Know what you are going to benchmark

In the previous post I described how I benchmarked my library and how I stumbled upon an interesting issue: code with a subtraction was slower. I even opened an issue. And @mikedn reminded me of a very important and interesting thing: know what you are going to benchmark. The bigger and more complex your benchmark is, the more likely it is that you'll measure something other than what you wanted. E.g. people in this issue mostly measure the .NET GC, not the actual code of the driver.

And I got bitten by almost the same thing. You can see the results of a better-designed benchmark below:

Method	Mean	Error	StdDev	Q3	Scaled	ScaledSD
Span	378.8 ns	6.778 ns	6.340 ns	385.2 ns	2.01	0.04
SpanConst	198.7 ns	1.255 ns	1.174 ns	199.5 ns	1.06	0.02
Pointer	235.5 ns	4.583 ns	4.501 ns	237.6 ns	1.25	0.03
C	188.1 ns	2.714 ns	2.538 ns	190.3 ns	1.00	0.00
Cpp	189.5 ns	2.896 ns	2.709 ns	191.5 ns	1.01	0.02

Here the Span and SpanConst benchmarks serialize one hundred ints using the same code, based on Span<T>; the former uses different integers and the latter uses a constant. The others retain their names from the previous post.

Let's set SpanConst aside for a moment. The Span result became a little slower, but the others became a lot slower, around two times. Why is that? The previous benchmark serialized some num, where num ∈ [1<<30 - 100, 1<<30]. The new one takes numbers from 99000 down to 0 with a step of 1000. Now let's look at the mp_encode_uint method of msgpuck. The methods in other libraries look similar.

MP_IMPL char *
mp_encode_uint(char *data, uint64_t num)
{
    if (num <= 0x7f) {
        return mp_store_u8(data, num);
    } else if (num <= UINT8_MAX) {
        data = mp_store_u8(data, 0xcc);
        return mp_store_u8(data, num);
    } else if (num <= UINT16_MAX) {
        data = mp_store_u8(data, 0xcd);
        return mp_store_u16(data, num);
    } else if (num <= UINT32_MAX) {
        data = mp_store_u8(data, 0xce);
        return mp_store_u32(data, num);
    } else {
        data = mp_store_u8(data, 0xcf);
        return mp_store_u64(data, num);
    }
}
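To see which branches each input set exercises, here is a minimal sketch that mirrors only the branch structure of mp_encode_uint and reports the encoded size. The helper names encoded_size and count_with_size are hypothetical and are not part of msgpuck.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes that the encoder above produces for num.
 * Each distinct return value corresponds to one branch. */
size_t encoded_size(uint64_t num)
{
    if (num <= 0x7f)       return 1; /* value stored in a single byte */
    if (num <= UINT8_MAX)  return 2; /* 0xcc marker + 1 byte  */
    if (num <= UINT16_MAX) return 3; /* 0xcd marker + 2 bytes */
    if (num <= UINT32_MAX) return 5; /* 0xce marker + 4 bytes */
    return 9;                        /* 0xcf marker + 8 bytes */
}

/* Count how many of the values first, first + step, ... (count values)
 * end up in the branch that produces `size` bytes. */
unsigned count_with_size(uint64_t first, uint64_t step,
                         unsigned count, size_t size)
{
    unsigned hits = 0;
    for (unsigned i = 0; i < count; i++) {
        if (encoded_size(first + i * step) == size)
            hits++;
    }
    return hits;
}
```

For the new inputs (0, 1000, ..., 99000), count_with_size(0, 1000, 100, size) gives 1 value in the one-byte branch, 65 in the 0xcd branch and 34 in the 0xce branch. For one hundred values starting at (1<<30) - 100, every value lands in the 0xce branch, so only one branch is ever needed.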

In the old benchmark GCC 6.0 was able to figure out that num ∈ [1<<30 - 100, 1<<30] even for a non-constant integer and eliminated all the code except one branch. The .NET Core JIT was able to do the same, but only for a constant integer. In the other cases the JIT generated the full code with all branches. So the “slow” code was the actual code, and the “fast” one was an anomaly tied to the compilers' ability to eliminate dead code. A developer should take this into account when designing a benchmark. E.g. SpanConst is here to illustrate code elimination and to confirm the hypothesis that the JIT and GCC eliminate code here.

Conclusion: benchmarking is hard. You need to know what you're measuring and to test the corner cases.
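One common way to keep a C compiler from narrowing the input range is to pass every input through a volatile object, so the value has to be loaded at run time. The sketch below is not the code from my benchmarks, only an illustration under that assumption; encode_len, opaque and bench_body are hypothetical names, and encode_len is a stand-in that mirrors the branches of mp_encode_uint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the encoder under test: same branches as
 * mp_encode_uint, but it only returns the number of bytes written. */
size_t encode_len(uint64_t num)
{
    if (num <= 0x7f)       return 1;
    if (num <= UINT8_MAX)  return 2;
    if (num <= UINT16_MAX) return 3;
    if (num <= UINT32_MAX) return 5;
    return 9;
}

/* Store the value in a volatile object and read it back. The compiler
 * must perform the read and may not assume what it returns, so it
 * cannot use the known range of the input to drop branches. */
uint64_t opaque(uint64_t value)
{
    volatile uint64_t slot = value;
    return slot;
}

/* Benchmark body: encode `count` values starting at `first` and return
 * the total encoded size, so the result is used and the loop stays. */
size_t bench_body(uint64_t first, uint64_t step, unsigned count)
{
    size_t total = 0;
    for (unsigned i = 0; i < count; i++)
        total += encode_len(opaque(first + i * step));
    return total;
}
```

With the inputs from this post, bench_body(0, 1000, 100) returns 366 bytes (1 + 65 * 3 + 34 * 5). The extra store and load add a small cost of their own, so this is a trade-off and not a free fix.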
