General Optimization Guidelines
executing SSE/SSE2/SSE3 instructions and when speed is more
important than compliance with the IEEE standard. The following paragraphs
give recommendations on how to optimize your code to reduce
performance degradation related to floating-point exceptions.
Dealing with floating-point exceptions in x87 FPU code
Every special situation listed in the “Floating-point Exceptions” section
is costly in terms of performance. For that reason, x87 FPU code should
be written to avoid these situations.
There are three basic ways to reduce the impact of overflow and
underflow situations in x87 FPU code:
• Choose floating-point data types that are large enough to
accommodate results without generating arithmetic overflow and
underflow exceptions.
• Scale the range of operands and results to minimize the number of
arithmetic overflow/underflow situations.
• Keep intermediate results on the x87 FPU register stack until the
final results have been computed and stored to memory. Overflow
or underflow is less likely to occur when intermediate results stay
on the x87 FPU stack, because data on the stack is held in double
extended-precision format, whose exponent range is much wider than
that of single or double precision, and overflow/underflow conditions
are detected against that wider range.
Denormalized floating-point constants (which are read-only and
therefore never change) should be avoided and, where possible,
replaced with zeros of the same sign.
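A minimal sketch of the constant replacement in C, assuming IEEE 754 single precision. The helper `denormal_to_signed_zero` and the two constants are hypothetical; in practice the simplest fix is to write the signed-zero literal directly, and a helper like this is mainly useful for auditing an existing table of constants.

```c
#include <math.h>

/* Hypothetical helper: returns a zero with the same sign as c when c is
   denormalized (subnormal), and c unchanged otherwise. */
float denormal_to_signed_zero(float c)
{
    if (fpclassify(c) == FP_SUBNORMAL)
        return copysignf(0.0f, c);   /* +0.0f or -0.0f */
    return c;
}

/* Hypothetical constants: -1.0e-40f lies below FLT_MIN (about 1.18e-38f),
   so it is stored as a denormal; -0.0f is its same-sign replacement. */
static const float kCoeffDenormal = -1.0e-40f;
static const float kCoeffReplaced = -0.0f;
```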
Dealing with floating-point exceptions in SSE and SSE2 code
Most special situations that involve masked floating-point exceptions
are handled efficiently on the Pentium 4 processor. When a masked
overflow exception occurs while executing SSE or SSE2 code, the
Pentium 4 processor handles it without a performance penalty.
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006