Using and understanding the Valgrind core
On Linux, Valgrind also supports direct use of the clone system call, futex and so on. clone is supported where
either everything is shared (a thread) or nothing is shared (fork-like); partial sharing will fail.
2.7.1. Scheduling and Multi-Thread Performance
A thread executes code only when it holds the abovementioned lock. After executing some number of instructions,
the running thread will release the lock. All threads ready to run will then compete to acquire the lock.
The --fair-sched option controls the locking mechanism used to serialise thread execution.
The default pipe based locking mechanism (--fair-sched=no) is available on all platforms. Pipe based locking
does not guarantee fairness between threads: it is quite likely that a thread that has just released the lock reacquires it
immediately, even though other threads are ready to run. When using pipe based locking, different runs of the same
multithreaded application might give very different thread scheduling.
An alternative locking mechanism, based on futexes, is available on some platforms. If available, it is activated by
--fair-sched=yes or --fair-sched=try.
Futex based locking ensures fairness (round-robin scheduling)
between threads: if multiple threads are ready to run, the lock will be given to the thread which first requested the
lock. Note that a thread which is blocked in a system call (e.g. in a blocking read system call) has not (yet) requested
the lock: such a thread requests the lock only after the system call is finished.
The fairness of the futex based locking produces better reproducibility of thread scheduling for different executions of
a multithreaded application. This better reproducibility is particularly helpful when using Helgrind or DRD.
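As a minimal sketch of how the option might be used, the following runs a program under Helgrind and requests the
futex based lock. The program name ./my_threaded_app is a hypothetical placeholder, and the guard around the
invocation is only there so the snippet is harmless on a machine without Valgrind or the program.

```shell
# Hypothetical program name; replace with your own multithreaded binary.
prog=./my_threaded_app

# --fair-sched=try selects futex based locking where the platform supports it
# and otherwise falls back to the default pipe based locking.
fair_cmd="valgrind --tool=helgrind --fair-sched=try $prog"

# Run the command only if both Valgrind and the program are present.
if command -v valgrind >/dev/null 2>&1 && [ -x "$prog" ]; then
    $fair_cmd
else
    echo "would run: $fair_cmd"
fi
```

Using try rather than yes keeps the same command line usable on platforms where futex based locking is not
available.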
Valgrind’s use of thread serialisation implies that only one thread at a time may run. On a multiprocessor/multicore
system, the running thread is assigned to one of the CPUs by the OS kernel scheduler. When a thread acquires the
lock, sometimes the thread will be assigned to the same CPU as the thread that just released the lock. Sometimes, the
thread will be assigned to another CPU. When using pipe based locking, the thread that just acquired the lock will
usually be scheduled on the same CPU as the thread that just released the lock. With the futex based mechanism, the
thread that just acquired the lock will more often be scheduled on another CPU.
Valgrind’s thread serialisation and CPU assignment by the OS kernel scheduler can interact badly with the CPU
frequency scaling available on many modern CPUs.
To decrease power consumption, the frequency of a CPU or
core is automatically decreased if the CPU/core has not been used recently. If the OS kernel often assigns the thread
which just acquired the lock to another CPU/core, it is quite likely that this CPU/core is currently at a low frequency.
The frequency of this CPU will be increased after some time. However, during this time, the (only) running thread
will have run at the low frequency. Once this thread has run for some time, it will release the lock. Another thread
will acquire this lock, and might be scheduled again on another CPU whose clock frequency was decreased in the
meantime.
The futex based locking causes threads to change CPUs/cores more often. So, if CPU frequency scaling is activated,
futex based locking might significantly decrease the performance of a multithreaded application running under
Valgrind. Performance losses of up to 50% have been observed, as compared to running on a machine for which
CPU frequency scaling has been disabled. The pipe based locking scheme also interacts badly with CPU
frequency scaling, with performance losses in the range 10..20% having been observed.
To avoid such performance degradation, you should indicate to the kernel that all CPUs/cores should always run at
maximum clock speed. Depending on your Linux distribution, CPU frequency scaling may be controlled using a
graphical interface or using command line tools such as cpufreq-selector or cpufreq-set.
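Before changing anything, it can help to see which frequency governor is currently active. The sketch below reads
the usual Linux sysfs file for cpu0; it assumes the kernel exposes the cpufreq interface at that path and reports
"unknown" otherwise. The commented cpufreq-set line shows one typical way to request the performance governor
with cpufrequtils; it needs root and would normally be repeated for each CPU.

```shell
# Usual Linux sysfs location for the cpu0 governor; present only when the
# kernel exposes a cpufreq interface (an assumption about the system).
gov_file=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

if [ -r "$gov_file" ]; then
    gov_msg="cpu0 governor: $(cat "$gov_file")"
else
    gov_msg="cpu0 governor: unknown (no cpufreq interface found)"
fi
echo "$gov_msg"

# To request maximum clock speed for cpu0 with cpufrequtils (needs root):
#   cpufreq-set -c 0 -g performance
```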
An alternative way to avoid these problems is to tell the OS scheduler to tie a Valgrind process to a specific (fixed)
CPU using the taskset command. This should ensure that the selected CPU does not fall below its maximum
frequency setting so long as any thread of the program has work to do.
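A minimal sketch of such an invocation follows. The program name ./my_threaded_app and the choice of CPU 0 are
illustrative assumptions; any CPU the process is allowed to run on would do. The guard only prevents the snippet
from failing where taskset, Valgrind or the program is missing.

```shell
# Hypothetical program name and CPU number; adjust both for your system.
prog=./my_threaded_app
cpu=0

# taskset -c restricts the launched process (and its threads) to the listed CPU.
pinned_cmd="taskset -c $cpu valgrind $prog"

if command -v taskset >/dev/null 2>&1 \
   && command -v valgrind >/dev/null 2>&1 \
   && [ -x "$prog" ]; then
    $pinned_cmd
else
    echo "would run: $pinned_cmd"
fi
```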