Benchmarking - motivation and tools (1/2)
One of the main motivations for this project was to leverage the computational
power of a Graphics Processing Unit (GPU) to perform faster computations with
QuTiP. However, even though GPUs are generally rated at more floating point
operations per second (FLOPS) than CPUs, it is not straightforward to use this
computational power to our advantage. In particular, GPUs require highly
parallelizable operations. This is why, during the first weeks of the project, I
focused on preparing a set of benchmarks that will help us understand when
and how to make use of GPUs. I was especially interested in seeing whether my own
hardware could benefit from a GPU and how the hardware provided by Colab
compares with it. The final goal is to provide an easy-to-use function
that allows users to test qutip-tensorflow’s performance on their own hardware.
In this post, I will explain which tools are available for benchmarking in
Python, and in the next post I will show some of the results obtained.
There are several approaches to writing benchmarks in Python.
The simplest one is to use the
time module. This module is a great tool
for quickly benchmarking a function, but using it requires a lot of
boilerplate code for both parametrizing the benchmarks and saving the results.
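To make the boilerplate concrete, here is a minimal sketch of what a hand-rolled benchmark with the time module could look like. The helper name time_add, the matrix sizes and the number of repeats are assumptions picked for illustration; they do not come from the qutip-tensorflow benchmarks.

```python
import json
import time

import numpy as np


def time_add(size, repeats=5):
    """Return the best wall-clock time (in seconds) of a + a for a size x size matrix."""
    a = np.random.random((size, size))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a + a
        best = min(best, time.perf_counter() - start)
    return best


# Hand-written parametrization: we loop over the matrix sizes ourselves.
sizes = [10, 100, 1000]
results = {str(size): time_add(size) for size in sizes}

# Hand-written persistence: we serialize the timings ourselves.
report = json.dumps(results, indent=2)
print(report)
```

Every extra axis (data representation, operation, device) would add another loop and more bookkeeping, which is the part the tools below take care of.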
Notice that I am mostly interested in comparing the performance of a function
for several matrix sizes and different data representations (for example,
NumPy’s ndarray, QuTiP’s new
Dense representation or TensorFlow’s
Tensors). Being able to seamlessly parametrize the benchmarks would
greatly simplify the writing process. This is why I decided to use more sophisticated tools such
as pytest-benchmark and asv. Both of these tools automatically store the
results in JSON format and include an easy way to parametrize the
benchmarks, which I will now explain in a little more detail.
pytest is a popular Python package for testing, for which
several plugins are available. One of these is
pytest-benchmark, which provides
benchmarking functionality. What I found most interesting about
pytest is that a
test (or a benchmark in
pytest-benchmark) can be parametrized using
decorators. An example of this is:
    import numpy as np
    import pytest


    @pytest.mark.parametrize("size", np.logspace(1, 3, 5, dtype=int).tolist())
    def test_add(benchmark, size):
        # Create a random matrix
        a = np.random.random((size, size))

        # Benchmark a + a
        benchmark(a.__add__, a)
In the above code, we parametrize the function to benchmark the addition of
two NumPy matrices as a function of the matrix size. This is achieved with the
decorator @pytest.mark.parametrize("size", np.logspace(1, 3, 5, dtype=int).tolist()),
which runs the benchmark once for each of five logarithmically spaced sizes
between 10 and 1000.
Airspeed velocity (asv)
asv defines itself as “a tool for benchmarking Python packages over their
lifetime”. Indeed, it includes very useful tools to test a package for
regression, such as running the benchmarks over a range of commits with a single
command (for example,
asv run master..mybranch would run the benchmarks for
the commits since branching off master). It also provides parametrization of the
benchmarks. An example of the above code in
asv would be:
    import numpy as np


    class TimeLA:
        """Minimal linear algebra benchmark."""

        # Matrix sizes
        params = np.logspace(1, 3, 5, dtype=int).tolist()
        param_names = ["size"]

        # Run this before benchmarking
        def setup(self, size):
            # Create a random matrix
            self.a = np.random.random((size, size))

        # Benchmark a + a; asv passes the parameter to the benchmark as well
        def time_add(self, size):
            a = self.a
            _ = a + a
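asv can also sweep over more than one parameter by giving params one list of values per parameter. The sketch below is an assumption-laden illustration rather than a benchmark taken from this project: the class name TimeAdd and the dtype axis are hypothetical.

```python
import numpy as np


class TimeAdd:
    """Sketch of an asv benchmark parametrized over matrix size and dtype."""

    # One list of values per parameter; asv benchmarks every combination.
    params = ([10, 100], ["float64", "complex128"])
    param_names = ["size", "dtype"]

    def setup(self, size, dtype):
        # Create a random matrix with the requested dtype
        self.a = np.random.random((size, size)).astype(dtype)

    def time_add(self, size, dtype):
        # Only the body of this method is timed; setup is excluded
        self.a + self.a
```

Because the class is plain Python, it can also be instantiated and called by hand for a quick sanity check before handing it to asv.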
Another useful feature of
asv is that it can easily generate an HTML page with
the benchmark results. You can see an example of such a page
here for NumPy.
Both asv and pytest-benchmark are great tools that fit my requirements
for writing benchmarks: they can be easily parametrized and store the results in
JSON format. asv has the advantage of providing extra tools that facilitate
benchmarking code over its lifetime. However, I am mostly interested in
comparing the performance of different functions for different input sizes.
Hence, I decided to use
pytest-benchmark, as I am already used to using pytest
for testing purposes. However, if the goal of the project had been to ensure that no
regressions occur in the future of this package, I would have chosen
asv as the
tool for benchmarking.
In the next post I will show a comparison of the benchmark results. Hope to see you there!