Helgrind: a thread error detector
Parent thread:
Child thread:
int var;
// create child thread
pthread_create(...)
var = 20;
// send message to child
// wait for message to arrive
var = 10;
exit
// wait for child
pthread_join(...)
printf("%d\n", var);
Now the program reliably prints "10", regardless of the speed of the threads. Why? Because the child’s assignment
cannot happen until after it receives the message. And the message is not sent until after the parent’s assignment is
done.
The message transmission creates a "happens-before" dependency between the two assignments:
var = 20;
must
now happen-before
var = 10;
. And so there is no longer a race on
var
.
Note that it’s not significant that the parent sends a message to the child.
Sending a message from the child (after
its assignment) to the parent (before its assignment) would also fix the problem, causing the program to reliably print
"20".
Helgrind’s algorithm is (conceptually) very simple. It monitors all accesses to memory locations. If a location -- in
this example,
var
, is accessed by two different threads, Helgrind checks to see if the two accesses are ordered by the
happens-before relation. If so, that’s fine; if not, it reports a race.
It is important to understand that the happens-before relation creates only a partial ordering, not a total ordering. An
example of a total ordering is comparison of numbers: for any two numbers
x
and
y
, either
x
is less than, equal to,
or greater than
y
. A partial ordering is like a total ordering, but it can also express the concept that two elements are
neither equal, less or greater, but merely unordered with respect to each other.
In the fixed example above, we say that
var = 20;
"happens-before"
var = 10;
.
But in the original version,
they are unordered: we cannot say that either happens-before the other.
What does it mean to say that two accesses from different threads are ordered by the happens-before relation?
It
means that there is some chain of inter-thread synchronisation operations which cause those accesses to happen in a
particular order, irrespective of the actual rates of progress of the individual threads. This is a required property for a
reliable threaded program, which is why Helgrind checks for it.
The happens-before relations created by standard threading primitives are as follows:
• When a mutex is unlocked by thread T1 and later (or immediately) locked by thread T2, then the memory accesses
in T1 prior to the unlock must happen-before those in T2 after it acquires the lock.
• The same idea applies to reader-writer locks, although with some complication so as to allow correct handling of
reads vs writes.
112