
IA-32 Intel® Architecture Optimization
Floating-Point Stalls
Floating-point instructions have a latency of at least two cycles.
Because Pentium II and subsequent processors execute out of order,
however, stalls do not necessarily occur on an instruction or µop
basis. If an instruction has a very long latency, such as fdiv,
scheduling independent instructions around it can improve the
throughput of the overall application.
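As a minimal sketch of this kind of scheduling at the source level, the C
function below issues the long-latency divide first and only consumes its
result after a loop of independent additions. The function name scaled_sum
and its parameters are hypothetical, chosen for illustration. A compiler or
the out-of-order core may already reorder the code this way, so treat this
as an illustration of the idea rather than a guaranteed gain.

```c
#include <stddef.h>

/* Hypothetical helper: returns (x[0] + ... + x[n-1]) / d. */
double scaled_sum(const double *x, size_t n, double d)
{
    /* Start the long-latency divide early. Nothing below depends on
       inv until the final multiply, so the additions can overlap
       with the divide. */
    double inv = 1.0 / d;

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];            /* independent of inv */

    return sum * inv;           /* first use of the divide result */
}
```

Note that multiplying by a reciprocal can round differently from dividing
directly, so this form suits code that tolerates that difference.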
x87 Floating-point Operations with Integer Operands
On the Pentium 4 processor, floating-point operations that take 16-bit
integer operands (fiadd, fisub, fimul, and fidiv) are more efficient
when split into two instructions: fild followed by a floating-point
operation. With 32-bit integer operands, however, fiadd, fisub, fimul,
and fidiv are as efficient as the separate instructions.
Assembly/Compiler Coding Rule 36. (M impact, L generality) Try to use
32-bit operands rather than 16-bit operands for fild. However, do not do so
at the expense of introducing a store forwarding problem by writing the two
halves of the 32-bit memory operand separately.
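The rule above can be sketched at the C level as follows. The helper
sum_int16 is a hypothetical name. It widens each 16-bit value to a single
32-bit integer before converting it, and keeps the conversion and the add
as separate steps. Whether a compiler emits fild with a 32-bit operand for
this code depends on the target and its options, so this shows the intent
only and is not a guaranteed instruction sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums 16-bit integers into a double. */
double sum_int16(const int16_t *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* Widen to one 32-bit value. It is produced as a whole, never
           assembled from two separate 16-bit stores, which is the
           pattern the rule warns about. */
        int32_t wide = v[i];

        /* Convert, then add: two steps, mirroring "fild followed by
           a floating-point operation". */
        double f = (double)wide;
        acc += f;
    }
    return acc;
}
```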
x87 Floating-point Comparison Instructions
On Pentium II and subsequent processors, use the fcomi and fcmov
instructions when performing floating-point comparisons. The fcom,
fcomp, and fcompp instructions typically require an additional
instruction such as fstsw. This alternative causes more µops to be
decoded and should be avoided.
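For illustration, a compare-and-select written in C is the kind of code
that a compiler generating x87 code for these processors can lower to a
comparison that sets EFLAGS directly plus a conditional move, with no
status-word transfer. The function select_max is a hypothetical name, and
the instructions actually emitted depend on the compiler and its target
options.

```c
/* Hypothetical helper: returns the larger of a and b.
   The comparison feeds a select instead of a branch on the x87 status
   word, so it is a candidate for fcomi plus fcmov code generation. */
double select_max(double a, double b)
{
    return (a > b) ? a : b;
}
```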
Transcendental Functions
If an application needs to emulate math functions in software for
performance or other reasons (see the “Guidelines for Optimizing
Floating-point Code” section), it may be worthwhile to inline math
library calls, because the call and the prologue/epilogue code
associated with such calls can significantly affect the latency of
operations.
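A minimal sketch of an inlined software math routine in C follows. The
name sin_poly and the choice of a degree-7 Taylor polynomial are
assumptions made for illustration and do not come from the manual. The
approximation is only reasonable for small arguments (roughly |x| <= 0.5,
where the truncation error is below 1e-8) and performs no range reduction.

```c
/* Hypothetical inline sine approximation for small |x|.
   Degree-7 Taylor polynomial evaluated in Horner form:
   x - x^3/6 + x^5/120 - x^7/5040.
   Declared static inline so the compiler can expand it at the call
   site and avoid call and prologue/epilogue overhead. */
static inline double sin_poly(double x)
{
    double x2 = x * x;
    return x * (1.0 + x2 * (-1.0 / 6.0
                   + x2 * (1.0 / 120.0
                   + x2 * (-1.0 / 5040.0))));
}
```

Inlining is a request, not a guarantee. Compilers decide whether to expand
the body based on their own heuristics and optimization level.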
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006