Using and understanding the Valgrind core
• Valgrind has the following limitations in its implementation of x86/AMD64 SSE2 FP arithmetic, relative to
IEEE754.
Essentially the same: no exceptions, and limited observance of rounding mode. Also, SSE2 has control bits which
make it treat denormalised numbers as zero (DAZ) and a related action, flush denormals to zero (FTZ). Both of
these cause SSE2 arithmetic to be less accurate than IEEE requires. Valgrind detects, ignores, and can warn about,
attempts to enable either mode.
• Valgrind has the following limitations in its implementation of ARM VFPv3 arithmetic, relative to IEEE754.
Essentially the same: no exceptions, and limited observance of rounding mode. Also, switching the VFP unit into
vector mode will cause Valgrind to abort the program -- it has no way to emulate vector uses of VFP at a reasonable
performance level. This is no big deal given that non-scalar uses of VFP instructions are in any case deprecated.
• Valgrind has the following limitations in its implementation of PPC32 and PPC64 floating point arithmetic, relative
to IEEE754.
Scalar (non-Altivec): Valgrind provides a bit-exact emulation of all floating point instructions, except for "fre" and
"fres", which are done more precisely than required by the PowerPC architecture specification. All floating point
operations observe the current rounding mode.
However, fpscr[FPRF] is not set after each operation. That could be done but would give measurable performance
overheads, and so far no need for it has been found.
As on x86/AMD64, IEEE754 exceptions are not supported: all floating point exceptions are handled using the
default IEEE fixup actions. Valgrind detects, ignores, and can warn about, attempts to unmask the 5 IEEE FP
exception kinds by writing to the floating-point status and control register (fpscr).
Vector (Altivec, VMX): essentially as with x86/AMD64 SSE/SSE2: no exceptions, and limited observance of
rounding mode. For Altivec, FP arithmetic is done in IEEE/Java mode, which is more accurate than the Linux
default setting. "More accurate" means that denormals are handled properly, rather than simply being flushed to
zero.
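The denormal behaviour described above (DAZ and FTZ on SSE2, IEEE/Java mode on Altivec) can be probed from inside a program. The following is a minimal sketch in portable C99, not part of Valgrind: the function names are hypothetical and chosen only for illustration. It assumes the program has not itself enabled a flush-to-zero mode and was not built with options such as -ffast-math that may do so at startup.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if a result smaller than DBL_MIN is kept
   as a nonzero denormal, 0 if it is flushed to zero (FTZ-like behaviour). */
int denormal_results_preserved(void)
{
    /* volatile stops the compiler from folding the division at build time */
    volatile double smallest_normal = DBL_MIN;
    volatile double halved = smallest_normal / 2.0;

    return halved > 0.0 && halved < DBL_MIN;
}

/* Hypothetical helper: returns 1 if a denormal input is used at its true
   value, 0 if it is treated as zero (DAZ-like behaviour). */
int denormal_inputs_honoured(void)
{
    /* 0x1p-1023 is exactly DBL_MIN / 2, loaded as a constant so that no
       arithmetic is needed to produce the denormal operand */
    volatile double denormal = 0x1p-1023;
    volatile double doubled = denormal * 2.0;

    return doubled == DBL_MIN;
}

/* Hypothetical helper: prints both results in readable form. */
void report_denormal_handling(void)
{
    printf("denormal results preserved: %s\n",
           denormal_results_preserved() ? "yes" : "no");
    printf("denormal inputs honoured:   %s\n",
           denormal_inputs_honoured() ? "yes" : "no");
}
```

Calling report_denormal_handling() from main and comparing a native run with a run under Valgrind is one way to see whether the two agree. Given that Valgrind ignores attempts to enable DAZ or FTZ, a program that does enable them natively may report "no" natively and "yes" under Valgrind.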
Programs which are known not to work are:
• emacs starts up but immediately concludes it is out of memory and aborts. It may be that Memcheck does not
provide a good enough emulation of the mallinfo function. Emacs works fine if you build it to use the standard
malloc/free routines.
2.12. An Example Run
This is the log for a run of a small program using Memcheck. The program is in fact correct, and the reported error is
the result of a potentially serious code generation bug in GNU g++ (snapshot 20010527).