Cachegrind: a cache and branch-prediction profiler
• It doesn’t account for virtual-to-physical address mappings. Hence the simulation is not a true representation of
what’s happening in the cache. Most caches and branch predictors are physically indexed, but Cachegrind simulates
caches using virtual addresses.
• It doesn’t account for cache misses not visible at the instruction level, e.g. those arising from TLB misses, or
speculative execution.
• Valgrind will schedule threads differently from how they would be when running natively. This could warp the
results for threaded programs.
• The x86/amd64 instructions bts, btr and btc will incorrectly be counted as doing a data read if both the
arguments are registers, e.g.:
    btsl %eax, %edx
This should only happen rarely.
• x86/amd64 FPU instructions with data sizes of 28 and 108 bytes (e.g. fsave) are treated as though they only
access 16 bytes. These instructions seem to be rare so hopefully this won’t affect accuracy much.
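To illustrate the first point in the list above, here is a minimal sketch of how a set-associative cache might pick a set
from an address. It is not Cachegrind’s implementation. The function set_index, the 4 KiB page size, the cache
geometries and the two addresses are all assumptions chosen for illustration. The two addresses share a page offset but
have different page numbers, as a virtual address and its physical counterpart could.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def set_index(addr, cache_size, assoc, line_size):
    """Return the set an address maps to in a set-associative cache."""
    num_sets = cache_size // (assoc * line_size)
    return (addr // line_size) % num_sets


# Hypothetical mapping: same page offset (0x040), different page number.
virtual_addr = 0x5040
physical_addr = 0x2040
assert virtual_addr % PAGE_SIZE == physical_addr % PAGE_SIZE

# Compare the set chosen for each address in a small and a larger cache.
for name, size in (("32 KiB", 32 * 1024), ("256 KiB", 256 * 1024)):
    v = set_index(virtual_addr, size, assoc=8, line_size=64)
    p = set_index(physical_addr, size, assoc=8, line_size=64)
    print(f"{name}: virtual set {v}, physical set {p}")
```

With these assumed numbers, the 32 KiB cache selects its set using only page-offset bits, so both addresses land in the
same set. The 256 KiB cache also uses bits from the page number, so the two addresses land in different sets. A
simulation driven by virtual addresses can therefore see different conflicts from a physically indexed cache.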
Another thing worth noting is that results are very sensitive. Changing the size of the executable being profiled, or
the sizes of any of the shared libraries it uses, or even the length of their file names, can perturb the results. Variations
will be small, but don’t expect perfectly repeatable results if your program changes at all.
More recent GNU/Linux distributions do address space randomisation, in which identical runs of the same program
have their shared libraries loaded at different locations, as a security measure. This also perturbs the results.
While these factors mean you shouldn’t trust the results to be super-accurate, they should be close enough to be useful.
5.8. Implementation Details
This section talks about details you don’t need to know about in order to use Cachegrind, but may be of interest to
some people.
5.8.1. How Cachegrind Works
The best reference for understanding how Cachegrind works is chapter 3 of "Dynamic Binary Analysis and
Instrumentation", by Nicholas Nethercote. It is available on the Valgrind publications page.
5.8.2. Cachegrind Output File Format
The file format is fairly straightforward, basically giving the cost centre for every line, grouped by files and functions.
It’s also totally generic and self-describing, in the sense that it can be used for any events that can be counted on a
line-by-line basis, not just cache and branch predictor events. For example, earlier versions of Cachegrind didn’t have
a branch predictor simulation. When this was added, the file format didn’t need to change at all. So the format (and
consequently, cg_annotate) could be used by other tools.
The file format: