Benchmarking - motivation and tools (1/2)
One of the main motivations for this project was to leverage the computational
power of a Graphics Processing Unit (GPU) to perform faster computations with
QuTiP. However, even though GPUs are generally rated with higher floating point
operations per second (FLOPS) than CPUs, it is not straightforward to use this
computational power to our advantage. In particular, GPUs require highly
parallelizable operations. This is why during the first weeks of the project I
focused on preparing a set of benchmarks that will help us understand when
and how to make use of GPUs. I was especially interested in seeing whether my own
hardware could benefit from using a GPU and how the hardware provided by Colab
compares with it. The final goal is to provide a simple function, qutip_tensorflow.benchmarks(),
that allows users to test qutip-tensorflow’s performance on their own hardware.
In this post, I will explain which tools are available for benchmarking in
Python, and in the next post I will show some of the results obtained in the
benchmarks.
There are several approaches that can be followed to write benchmarks in Python.
The simplest one would be to use the time module. This module is a great tool
for quick benchmarking of a function, but using it would require a lot of
boilerplate code for both parametrizing the benchmarks and saving the results.
Note that I am mostly interested in comparing the performance of a function
for several matrix sizes and different data representations (for example,
NumPy’s ndarray, QuTiP’s new Dense representation or TensorFlow’s Tensors).
Being able to seamlessly parametrize the benchmarks would greatly simplify the
writing process. This is why I decided to use more sophisticated tools such
as pytest-benchmark or asv. Both of these tools automatically store the
results in JSON format and include an easy way to parametrize the
benchmarks, which I will now explain in a bit more detail.
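Before that, to make the boilerplate concern concrete, here is a minimal sketch of what a hand-rolled benchmark with the time module could look like. It is an illustration only: the helper names (add_matrices, random_matrix, run_manual_benchmark) are made up for this example, and plain lists of lists stand in for real matrices so that the snippet needs nothing beyond the standard library.

```python
import json
import random
import statistics
import time


def add_matrices(a, b):
    """Element-wise sum of two matrices stored as lists of lists."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def random_matrix(size):
    """Build a size x size matrix of random floats."""
    return [[random.random() for _ in range(size)] for _ in range(size)]


def run_manual_benchmark(sizes, repeats=5):
    """Time add_matrices for each size and return one record per size."""
    results = []
    for size in sizes:  # hand-written parametrization
        a = random_matrix(size)
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            add_matrices(a, a)
            timings.append(time.perf_counter() - start)
        results.append({
            "size": size,
            "min": min(timings),
            "mean": statistics.mean(timings),
        })
    return results


results = run_manual_benchmark([10, 30, 100])
# Hand-written serialization of the results
serialized = json.dumps(results, indent=2)
```

Even for a single function and a single parameter, the loop, the statistics and the serialization all have to be written by hand, and every extra parameter (such as the data representation) would add another nested loop.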
Pytest-benchmark
pytest is a popular Python package for testing, for which
several plugins are available. One of these is pytest-benchmark, which provides
benchmarking functionality. What I found most interesting about pytest is that
a test (or a benchmark in pytest-benchmark) can be parametrized using
decorators. An example of this is:
import numpy as np
import pytest

@pytest.mark.parametrize("size", np.logspace(1, 3, 5, dtype=int).tolist())
def test_add(benchmark, size):
    # Create a random matrix
    a = np.random.random((size, size))
    # benchmark a+a
    benchmark(a.__add__, a)
In the above code we parametrize the benchmark so that it measures the addition
of two NumPy matrices as a function of the matrix size. This is achieved with the
decorator @pytest.mark.parametrize("size", np.logspace(1, 3, 5, dtype=int).tolist()),
which runs the benchmark once for each of five logarithmically spaced sizes
between 10 and 1000.
Airspeed velocity (asv)
asv defines itself as “a tool for benchmarking Python packages over their
lifetime”. Indeed, it includes very useful tools to test a package for
regressions, such as running the benchmarks over a range of commits with a single
command (for example, asv run master..mybranch would run the benchmarks for
the commits since branching off master). It also provides parametrization of the
benchmarks. The asv equivalent of the above code would be:
import numpy as np

class TimeLA:
    """
    Minimal linear algebra benchmark.
    """
    params = np.logspace(1, 3, 5, dtype=int).tolist()  # Matrix sizes
    param_names = ["size"]

    # Run this before benchmarking
    def setup(self, size):
        # Create a random matrix
        self.a = np.random.random((size, size))

    # benchmark a+a; asv passes the parameter to the benchmark method too
    def time_add(self, size):
        a = self.a
        _ = a + a
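asv can also sweep over more than one parameter when params is given as a collection of lists, one per parameter, with param_names labelling them. The sketch below is a hypothetical variation of the class above (the class name and the dtype parameter are my own additions for illustration), in which both setup and the benchmark method receive one value per parameter.

```python
import numpy as np


class TimeAddByDtype:
    """Hypothetical asv benchmark parametrized over size and dtype."""

    # One list per parameter; asv runs every combination
    params = ([10, 100], ["float32", "float64"])
    param_names = ["size", "dtype"]

    def setup(self, size, dtype):
        # Create a random matrix with the requested dtype
        self.a = np.random.random((size, size)).astype(dtype)

    def time_add(self, size, dtype):
        # Only this method body is timed
        _ = self.a + self.a
```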
Another useful feature of asv is that it can easily generate an HTML page with
the benchmark results. You can see an example of such a page
here for NumPy.
Conclusions
Both asv and pytest-benchmark are great tools that would fit my requirements
for writing benchmarks: they can be easily parametrized and store the results in
JSON format. asv has the advantage of providing extra tools that facilitate
benchmarking code over its lifetime. However, I am mostly interested in
comparing the performance of different functions for different input sizes.
Hence, I decided to use pytest-benchmark, as I am already used to using pytest
for testing purposes. However, if the goal of the project had been to ensure that no
regressions occur in the future of this package, I would have chosen asv as the
tool for benchmarking.
In the next post I will show a comparison of the benchmark results. Hope to see you there!