In previous post I decsribed process of benchmarking of my library and how I stumbled upon interesting issue about slowness of code with subtraction. I even opened an issue. And @mikedn reminded me very about very important and interesting thing: know what are you going to benchmark. The bigger and more complex your benchmark is, it becomes more and more likely that you’ll measure something else, not what you wanted. E.g. people measure .net GC in this issue mostly, not the actual code of driver.
And I got bitten by almost same thing. Almost the same. You can see results of better designed benchmark below:
Method | Mean | Error | StdDev | Q3 | Scaled | ScaledSD | Allocated |
---|---|---|---|---|---|---|---|
Span | 378.8 ns | 6.778 ns | 6.340 ns | 385.2 ns | 2.01 | 0.04 | 0 B |
SpanConst | 198.7 ns | 1.255 ns | 1.174 ns | 199.5 ns | 1.06 | 0.02 | 0 B |
Pointer | 235.5 ns | 4.583 ns | 4.501 ns | 237.6 ns | 1.25 | 0.03 | 0 B |
C | 188.1 ns | 2.714 ns | 2.538 ns | 190.3 ns | 1.00 | 0.00 | 0 B |
Cpp | 189.5 ns | 2.896 ns | 2.709 ns | 191.5 ns | 1.01 | 0.02 | 0 B |
Here Span
and SpanConst
benchmark serialize one hundred ints using same code, based on Span<T>
, former is using some diffrents integers and latter is using constant. Others retain their names from previous post.
Lets set aside SpanConst
for a moment. Span
result became slower a little, but other became slower a lot, around two times. Why is that? Previous benchmark was about serializing some num
, where num ∈ [1<<30 - 100, 1<<30]
. And new one takes numbers from 99000 till 0 with 1000 step. Now we’re going to look onto mp_encode_uint
method of msgpuck. Methods in other libraries look similar.
In old benchmark GCC 6.0 was able to find out that num ∈ [1<<30 - 100, 1<<30]
even for non-const integer and to eliminate all code, except one branch. In .net core JIT was able to do the same, but only for constant integer. In other cases JIT generated full code with all branches. So, “slow code” was actual code” and “fast” one was an anomaly tied to an ability of compilers eliminate dead code. Developer should take this into account during designing of benchmark. E.g. SpanConst
here is for illustrating elimination of code and proving hypothesis that jit and gcc eliminate code here.
Conclusion: benchmarking is hard, you need to know what you’re measuring and to test that corner cases.
comments powered by Disqus