Helgrind: a thread error detector
In this section, and in general, to "acquire" a lock simply means to lock that lock, and to "release" a lock means to
unlock it.
Helgrind monitors the order in which threads acquire locks. This allows it to detect potential deadlocks which could
arise from the formation of cycles of locks. Detecting such inconsistencies is useful because, whilst actual deadlocks
are fairly obvious, potential deadlocks may never be discovered during testing and could later lead to hard-to-diagnose
in-service failures.
The simplest example of such a problem is as follows.
• Imagine some shared resource R, which, for whatever reason, is guarded by two locks, L1 and L2, which must both
be held when R is accessed.
• Suppose a thread acquires L1, then L2, and proceeds to access R. The implication of this is that all threads in the
program must acquire the two locks in the order first L1 then L2. Not doing so risks deadlock.
• The deadlock could happen if two threads -- call them T1 and T2 -- both want to access R. Suppose T1 acquires
L1 first, and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries to acquire L1, but those locks are both
already held. So T1 and T2 become deadlocked.
Helgrind builds a directed graph indicating the order in which locks have been acquired in the past. When a thread
acquires a new lock, the graph is updated, and then checked to see if it now contains a cycle. The presence of a cycle
indicates a potential deadlock involving the locks in the cycle.
In general, Helgrind will choose two locks involved in the cycle and show you how their acquisition ordering has
become inconsistent. It does this by showing the program points that first defined the ordering, and the program points
which later violated it. Here is a simple example involving just two locks:
Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated
Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
by 0x400825: main (tc13_laog1.c:23)
followed by a later acquisition of lock at 0x7FF0006D0
at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
by 0x400853: main (tc13_laog1.c:24)
Required order was established by acquisition of lock at 0x7FF0006D0
at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
by 0x40076D: main (tc13_laog1.c:17)
followed by a later acquisition of lock at 0x7FF0006A0
at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
by 0x40079B: main (tc13_laog1.c:18)
When there are more than two locks in the cycle, the error is equally serious. However, at present Helgrind does not
show the locks involved, sometimes because it that information is not available, but also so as to avoid flooding you
with information. For example, here is an example involving a cycle of five locks from a naive implementation the
famous Dining Philosophers problem (see
helgrind/tests/tc14_laog_dinphils.c
). In this case Helgrind
has detected that all 5 philosophers could simultaneously pick up their left fork and then deadlock whilst waiting to
pick up their right forks.
108