Introduced in Catch2 2.9.0.
Writing benchmarks is not easy. Catch simplifies certain aspects, but you’ll always need to take care of various details yourself. Understanding a few things about the way Catch runs your code will be very helpful when writing your benchmarks.
First off, let’s go over some terminology that will be used throughout this guide.

- User code: the code provided by the user to be measured.
- Run: one execution of the user code.
- Sample: one data point obtained by measuring the time it takes to perform a certain number of runs. One sample may consist of more than one run if the clock does not have enough resolution to accurately measure a single run; all samples for a given benchmark are obtained with the same number of runs.

Now I can explain how a benchmark is executed in Catch. There are three main steps, though the first does not need to be repeated for every benchmark.
1. Environmental probe: before any benchmarks can be executed, the clock’s resolution is estimated. A few other environmental artifacts are also estimated at this point, like the cost of calling the clock function, but they almost never have any impact on the results.
2. Estimation: the user code is executed a few times to obtain an estimate of the number of runs that should be in each sample. This also has the potential effect of bringing relevant code and data into the caches before the actual measurement starts.
3. Measurement: all the samples are collected sequentially by performing the number of runs estimated in the previous step for each sample.
This already gives us one important rule for writing benchmarks for Catch: the benchmarks must be repeatable. The user code will be executed several times, and the number of times it will be executed during the estimation step cannot be known beforehand since it depends on the time it takes to execute the code. User code that cannot be executed repeatedly will lead to bogus results or crashes.
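For example (a hypothetical sketch, using the `BENCHMARK` macro introduced below; `make_work_queue` and `process` stand in for your own code), the following benchmark consumes shared state and is therefore not repeatable:

```cpp
// Hypothetical sketch: this benchmark is NOT repeatable. Every execution pops
// from the same queue, so once the estimation and measurement steps have
// drained it, later runs read from an empty queue and the results, or the
// program itself, break.
std::queue<int> work = make_work_queue(); // assumed helper that fills the queue

BENCHMARK("process one item") {
    int item = work.front(); // undefined behaviour once the queue is empty
    work.pop();
    return process(item);    // assumed function under test
};
```

State like this has to be rebuilt so that every run sees the same input, for example with the per-run set-up techniques described later in this guide.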
Benchmarks can be specified anywhere inside a Catch test case. There is a simple and a slightly more advanced version of the `BENCHMARK` macro.
Let’s have a look at how a naive Fibonacci implementation could be benchmarked:
```cpp
std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}
```
Now the most straightforward way to benchmark this function is to just add a `BENCHMARK` macro to our test case:
("Fibonacci") {
TEST_CASE(Fibonacci(0) == 1);
CHECK// some more asserts..
(Fibonacci(5) == 8);
CHECK// some more asserts..
// now let's benchmark:
("Fibonacci 20") {
BENCHMARKreturn Fibonacci(20);
};
("Fibonacci 25") {
BENCHMARKreturn Fibonacci(25);
};
("Fibonacci 30") {
BENCHMARKreturn Fibonacci(30);
};
("Fibonacci 35") {
BENCHMARKreturn Fibonacci(35);
};
}
There are a few things to note:

- As `BENCHMARK` expands to a lambda expression, it is necessary to add a semicolon after the closing brace (as opposed to the first experimental version).
- The `return` is a handy way to avoid the compiler optimizing away the benchmark code.
Running this already runs the benchmarks and outputs something similar to:
```
-------------------------------------------------------------------------------
Fibonacci
-------------------------------------------------------------------------------
C:\path\to\Catch2\Benchmark.tests.cpp(10)
...............................................................................

benchmark name                            samples    iterations    est run time
                                          mean       low mean      high mean
                                          std dev    low std dev   high std dev
-------------------------------------------------------------------------------
Fibonacci 20                                  100        416439      83.2878 ms
                                             2 ns          2 ns          2 ns
                                             0 ns          0 ns          0 ns

Fibonacci 25                                  100        400776      80.1552 ms
                                             3 ns          3 ns          3 ns
                                             0 ns          0 ns          0 ns

Fibonacci 30                                  100        396873      79.3746 ms
                                            17 ns         17 ns         17 ns
                                             0 ns          0 ns          0 ns

Fibonacci 35                                  100        145169      87.1014 ms
                                           468 ns        464 ns        473 ns
                                            21 ns         15 ns         34 ns
```
The simplest use case shown above takes no arguments and just runs the user code that needs to be measured. However, if you use the `BENCHMARK_ADVANCED` macro and add a `Catch::Benchmark::Chronometer` argument after the macro, some advanced features become available. The contents of the simple benchmarks are invoked once per run, while the blocks of the advanced benchmarks are invoked exactly twice: once during the estimation phase, and another time during the execution phase.
("simple"){ return long_computation(); };
BENCHMARK
("advanced")(Catch::Benchmark::Chronometer meter) {
BENCHMARK_ADVANCED();
set_up.measure([] { return long_computation(); });
meter};
These advanced benchmarks no longer consist entirely of user code to be measured. In these cases, the code to be measured is provided via the `Catch::Benchmark::Chronometer::measure` member function. This allows you to set up any kind of state that might be required for the benchmark but is not to be included in the measurements, like making a vector of random integers to feed to a sorting algorithm (a sketch of this follows below).

A single call to `Catch::Benchmark::Chronometer::measure` performs the actual measurements by invoking the callable object passed in as many times as necessary. Anything that needs to be done outside the measurement can be done outside the call to `measure`.
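For instance, here is a minimal sketch of the sorting scenario mentioned above; the size and seed are arbitrary, and the usual headers (`<vector>`, `<random>`, `<algorithm>`) are assumed to be included:

```cpp
BENCHMARK_ADVANCED("sort 1000 random ints")(Catch::Benchmark::Chronometer meter) {
    // Set-up: build the input data. This happens outside measure(), so it is
    // not included in the measured time.
    std::vector<int> data(1000);
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::generate(data.begin(), data.end(), [&] { return dist(rng); });

    // Only the code inside measure() is timed. Each run sorts a fresh copy so
    // that the benchmark stays repeatable; note that the copy itself is part
    // of the measurement here.
    meter.measure([&data] {
        std::vector<int> copy = data;
        std::sort(copy.begin(), copy.end());
        return copy.front(); // return a value so the work is not optimized away
    });
};
```

If the copy must not be measured, the per-run instance technique described next (one pre-built container per run) avoids it.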
The callable object passed in to `measure` can optionally accept an `int` parameter.
```cpp
meter.measure([](int i) { return long_computation(i); });
```
If it accepts an `int` parameter, the sequence number of each run will be passed in, starting with 0. This is useful if you want to measure some mutating code, for example. The number of runs can be known beforehand by calling `Catch::Benchmark::Chronometer::runs`; with this one can set up a different instance to be mutated by each run.
```cpp
std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
meter.measure([&v](int i) { in_place_escape(v[i]); });
```
Note that it is not possible to simply use the same instance for different runs and reset it between each run, since that would pollute the measurements with the resetting code.
It is also possible to just provide an argument name to the simple `BENCHMARK` macro to get the same semantics as providing a callable to `meter.measure` with an `int` argument:
("indexed", i){ return long_computation(i); }; BENCHMARK
All of these tools give you a lot of mileage, but there are two things that still need special handling: constructors and destructors. The problem is that if you use automatic objects, they get destroyed at the end of the scope, so you end up measuring the time for construction and destruction together. And if you use dynamic allocation instead, you end up including the time to allocate memory in the measurements.
To solve this conundrum, Catch provides class templates that let you manually construct and destroy objects without dynamic allocation and in a way that lets you measure construction and destruction separately.
("construct")(Catch::Benchmark::Chronometer meter) {
BENCHMARK_ADVANCEDstd::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
.measure([&](int i) { storage[i].construct("thing"); });
meter};
("destroy")(Catch::Benchmark::Chronometer meter) {
BENCHMARK_ADVANCEDstd::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs());
for(auto&& o : storage)
.construct("thing");
o.measure([&](int i) { storage[i].destruct(); });
meter};
`Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage suitable for `T` objects. You can use the `Catch::Benchmark::storage_for::construct` member function to call a constructor and create an object in that storage. So if you want to measure the time it takes for a certain constructor to run, you can just measure the time it takes to run this function.

When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, if an actual object was constructed there it will be automatically destroyed, so nothing leaks.
If you want to measure a destructor, though, you need to use `Catch::Benchmark::destructable_object<T>`. These objects are similar to `Catch::Benchmark::storage_for<T>` in that construction of the `T` object is manual, but they do not destroy anything automatically. Instead, you are required to call the `Catch::Benchmark::destructable_object::destruct` member function, which is what you can use to measure the destruction time.
Sometimes the optimizer will optimize away the very code that you want to measure. There are several ways to use results that will prevent the optimizer from removing them. You can use the `volatile` keyword, or you can output the value to standard output or to a file, both of which force the program to actually generate the value somehow.
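As a minimal sketch of the `volatile` technique (the names are illustrative; `long_calculation` stands in for your own code, as in the example further below, and its result type is assumed to be `std::uint64_t`):

```cpp
// A write to a volatile object is an observable side effect, so the compiler
// cannot discard the computation that produces the value, even though the
// benchmark body returns nothing.
volatile std::uint64_t sink = 0;

BENCHMARK("volatile sink") {
    sink = long_calculation();
};
```

With Catch, returning the value, as described next, is usually the more convenient option.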
Catch adds a third option. The values returned by any function provided as user code are guaranteed to be evaluated and not optimized out. This means that if your user code consists of computing a certain value, you don’t need to bother with using `volatile` or forcing output. Just `return` it from the function. That helps keep the code natural.
Here’s an example:
```cpp
// may measure nothing at all by skipping the long calculation since its
// result is not used
BENCHMARK("no return"){ long_calculation(); };

// the result of long_calculation() is guaranteed to be computed somehow
BENCHMARK("with return"){ return long_calculation(); };
```
However, there’s no other form of control over the optimizer whatsoever. It is up to you to write a benchmark that actually measures what you want and doesn’t just measure the time to do a whole bunch of nothing.
To sum up, there are two simple rules: whatever you would do in handwritten code to control optimization still works in Catch; and Catch makes return values from user code into observable effects that can’t be optimized away.
Adapted from nonius’ documentation.