
Callgrind: a call-graph generating cache and branch prediction profiler

Use --auto=yes to get annotated source code for all relevant functions for which the source can be found. In addition to source annotation as produced by cg_annotate, you will see the annotated call sites with call counts.

For all other options, consult the Cachegrind documentation for cg_annotate.

For a better call graph browsing experience, it is highly recommended to use KCachegrind. If your code spends a significant fraction of its cost in cycles (sets of functions calling each other in a recursive manner), you have to use KCachegrind, because callgrind_annotate currently does not do any cycle detection, which is needed to get correct results in this case.
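To make the notion of a cycle concrete, here is a minimal C sketch of two mutually recursive functions. The names is_even and is_odd are hypothetical and not part of Callgrind; a program that spends most of its time in a pair like this is the case where KCachegrind's cycle detection matters.

```c
#include <stdbool.h>

/* Hypothetical example: is_even() and is_odd() call each other, so together
 * they form a cycle in the call graph. callgrind_annotate performs no cycle
 * detection, so its results for code like this can be misleading. */
bool is_odd(unsigned n);

bool is_even(unsigned n)
{
    /* For n > 0 the call goes through is_odd(), which calls back here. */
    return n == 0 ? true : is_odd(n - 1);
}

bool is_odd(unsigned n)
{
    return n == 0 ? false : is_even(n - 1);
}
```

After profiling a program built around these (for example valgrind --tool=callgrind ./prog, where prog is your program), the textual view would come from callgrind_annotate --auto=yes callgrind.out.pid, while KCachegrind is the viewer that takes the cycle into account. Compile without heavy optimization (e.g. -O0) for such an experiment, since an optimizing compiler may turn these tail calls into jumps and remove the recursion altogether.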

If you are also interested in measuring the cache behavior of your program, use Callgrind with the option --cache-sim=yes. For branch prediction simulation, use --branch-sim=yes. Expect a further slowdown of approximately a factor of 2.
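A classic pattern worth running with --cache-sim=yes is traversing a two-dimensional array in row-major versus column-major order. The sketch below is hypothetical (grid, N and the function names are made up for illustration): both functions compute the same sum, but the column-wise version accesses memory with a large stride and would typically show many more simulated data cache misses.

```c
#include <stddef.h>

#define N 512 /* hypothetical matrix dimension, chosen only for illustration */

static long grid[N][N];

/* Fill the grid with deterministic values so both sums are comparable. */
void fill_grid(void)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            grid[i][j] = (long)(i + j);
}

/* Row-major traversal: consecutive iterations touch adjacent memory. */
long sum_rows(void)
{
    long s = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += grid[i][j];
    return s;
}

/* Column-major traversal: consecutive iterations are N * sizeof(long) bytes
 * apart, so far fewer accesses hit a cache line that is already loaded. */
long sum_cols(void)
{
    long s = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += grid[i][j];
    return s;
}
```

Running a program that calls these under valgrind --tool=callgrind --cache-sim=yes ./prog should attribute most of the simulated data cache misses to sum_cols; adding --branch-sim=yes would additionally simulate the predictions for the loop branches.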

If the program section you want to profile is somewhere in the middle of the run, it is beneficial to fast forward to this section without any profiling, and then enable profiling. This is achieved by using the command line option --instr-atstart=no and running, in a shell:

callgrind_control -i on

just before the interesting code section is executed. To specify exactly the code position where profiling should start, use the client request CALLGRIND_START_INSTRUMENTATION.
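As a sketch of the client request approach, the C fragment below includes valgrind/callgrind.h and brackets the interesting section with CALLGRIND_START_INSTRUMENTATION and the matching CALLGRIND_STOP_INSTRUMENTATION request from the same header. The functions setup, hot_section and run are hypothetical stand-ins, and the no-op fallback macros are an assumption added only so the sketch compiles where the Valgrind headers are not installed (the header check relies on __has_include, which GCC and Clang support).

```c
/* Use the real Callgrind client requests when the header is available. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef HAVE_CALLGRIND_H
/* Assumption: no Valgrind headers installed; these no-op stand-ins keep the
 * sketch compilable but request nothing from Callgrind. */
#  define CALLGRIND_START_INSTRUMENTATION do { } while (0)
#  define CALLGRIND_STOP_INSTRUMENTATION  do { } while (0)
#endif

/* Hypothetical setup phase we do not want to profile: sum of 1..n. */
static unsigned long setup(unsigned long n)
{
    unsigned long s = 0;
    for (unsigned long i = 1; i <= n; i++)
        s += i;
    return s;
}

/* Hypothetical hot section we do want to profile: sum of i*i for i < m. */
static unsigned long hot_section(unsigned long m)
{
    unsigned long s = 0;
    for (unsigned long i = 0; i < m; i++)
        s += i * i;
    return s;
}

unsigned long run(unsigned long n)
{
    /* With --instr-atstart=no this part runs without instrumentation. */
    unsigned long seed = setup(n);

    CALLGRIND_START_INSTRUMENTATION; /* profiling starts exactly here */
    unsigned long result = hot_section(seed);
    CALLGRIND_STOP_INSTRUMENTATION;  /* and stops again here */

    return result;
}
```

Run the program with valgrind --tool=callgrind --instr-atstart=no ./prog; the collected costs then cover only what executes between the two requests. Outside Valgrind the client requests have no effect, so the same binary also runs normally.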

If you want to see assembly code level annotation, specify --dump-instr=yes. This will produce profile data at instruction granularity. Note that the resulting profile data can only be viewed with KCachegrind. For assembly annotation, it is also interesting to see more details of the control flow inside functions, i.e. (conditional) jumps. These will be collected by additionally specifying --collect-jumps=yes.
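For instance, the loop below contains a data-dependent conditional, the kind of control flow that --dump-instr=yes together with --collect-jumps=yes lets you inspect instruction by instruction in KCachegrind. The function count_above is a hypothetical example; at higher optimization levels the compiler may replace the branch with a branch-free conditional move, in which case there is no conditional jump left to observe.

```c
#include <stddef.h>

/* Hypothetical example: counts the elements greater than threshold. The `if`
 * inside the loop normally compiles to a conditional jump, and with jump
 * collection enabled its execution and jump counts become visible. */
size_t count_above(const int *values, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] > threshold)
            count++;
    }
    return count;
}
```

A matching run would be valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes ./prog, followed by opening the resulting callgrind.out.pid file in KCachegrind.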

6.2. Advanced Usage

6.2.1. Multiple profiling dumps from one program run

Sometimes you are not interested in characteristics of a full program run, but only of a small part of it, for example execution of one algorithm. If there are multiple algorithms, or one algorithm running with different input data, it may even be useful to get different profile information for different parts of a single program run.
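One way to get a separate profile per algorithm is to trigger dumps from inside the program with client requests from valgrind/callgrind.h: CALLGRIND_ZERO_STATS zeroes the cost counters, and CALLGRIND_DUMP_STATS_AT(pos_str) writes a dump described by the given string and zeroes the counters afterwards. In the sketch below, sum_loop, sum_formula and compare_algorithms are hypothetical, and the no-op fallback macros are an assumption so it compiles without the Valgrind headers.

```c
/* Use the real Callgrind client requests when the header is available. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef HAVE_CALLGRIND_H
/* Assumption: no Valgrind headers installed; no-op stand-ins. */
#  define CALLGRIND_ZERO_STATS do { } while (0)
#  define CALLGRIND_DUMP_STATS_AT(pos_str) do { (void)(pos_str); } while (0)
#endif

/* Two hypothetical algorithms computing 1 + 2 + ... + n. */
static unsigned long sum_loop(unsigned long n)
{
    unsigned long s = 0;
    for (unsigned long i = 1; i <= n; i++)
        s += i;
    return s;
}

static unsigned long sum_formula(unsigned long n)
{
    return n * (n + 1) / 2;
}

unsigned long compare_algorithms(unsigned long n)
{
    CALLGRIND_ZERO_STATS;                   /* forget costs from before this point */

    unsigned long a = sum_loop(n);
    CALLGRIND_DUMP_STATS_AT("sum_loop");    /* dump covering only sum_loop's part */

    unsigned long b = sum_formula(n);
    CALLGRIND_DUMP_STATS_AT("sum_formula"); /* dump covering only sum_formula's part */

    return a == b ? a : 0;                  /* 0 signals a mismatch */
}
```

Under valgrind --tool=callgrind ./prog, each CALLGRIND_DUMP_STATS_AT call would produce its own profile data file, numbered according to the naming scheme for profile data files.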

Profile data files have names of the form callgrind.out.pid.part-threadID, where pid is the PID of the running program, part is a number incremented on each dump (".part" is skipped for the dump at program termination), and threadID is a thread identification ("-threadID" is only used if you request dumps of individual threads with --separate-threads=yes).
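The naming rule can be sketched as a small C helper. callgrind_out_name is hypothetical, not a Callgrind function, and it uses its own convention that a part or thread_id of 0 means the suffix is omitted; Callgrind's exact number formatting (for example any zero padding of the thread number) is not specified here, so treat the output as an approximation of real file names.

```c
#include <stdio.h>

/* Hypothetical helper: builds "callgrind.out.<pid>[.<part>][-<threadID>]".
 * part == 0 stands for the dump at program termination (no ".part");
 * thread_id == 0 stands for "threads not dumped separately" (no "-threadID").
 * Returns the length of the name, or -1 if buf is too small. */
int callgrind_out_name(char *buf, size_t size, long pid, int part, int thread_id)
{
    char part_str[16] = "";
    char thread_str[16] = "";

    if (part > 0)
        snprintf(part_str, sizeof part_str, ".%d", part);
    if (thread_id > 0)
        snprintf(thread_str, sizeof thread_str, "-%d", thread_id);

    int n = snprintf(buf, size, "callgrind.out.%ld%s%s", pid, part_str, thread_str);
    if (n < 0 || (size_t)n >= size)
        return -1; /* output was truncated, or an encoding error occurred */
    return n;
}
```

For example, pid 1234 with part 2 and thread 3 yields callgrind.out.1234.2-3, and the same process's final dump without separate threads yields callgrind.out.1234.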

There are different ways to generate multiple profile dumps while a program is running under Callgrind’s supervision. However, all methods trigger the same action, which is "dump all profile information since the last dump or program start, and zero cost counters afterwards".
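The semantics of this dump action can be modeled in a few lines of C. This is a hypothetical model of the documented behavior, not Callgrind's implementation; struct cost_counter and its functions are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of one cost counter (e.g. instructions executed). */
struct cost_counter {
    uint64_t events; /* cost accumulated since the last dump or program start */
};

/* Program execution adds cost to the counter. */
void counter_add(struct cost_counter *c, uint64_t events)
{
    c->events += events;
}

/* The dump action: report everything since the last dump, then zero. */
uint64_t counter_dump(struct cost_counter *c)
{
    uint64_t dumped = c->events;
    c->events = 0;
    return dumped;
}
```

In this model consecutive dumps never overlap: if the program accumulates a cost of 100, is dumped, and then accumulates 30 and 12, the two dumps report 100 and 42, which together account for the whole cost of 142.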

To allow for zeroing cost counters without dumping, there is a second action, "zero all cost counters now". The different methods are:

• Dump on program termination. This method is the standard way and doesn’t need any special action on your part.