
Optimizing for SIMD Floating-point Applications
• Is the data arranged for efficient utilization of the SIMD floating-point registers?
• Is this application targeted for processors without SIMD floating-point instructions?
For more details, see the section on “Considerations for Code
Conversion to SIMD Programming” in Chapter 3.
Using SIMD Floating-point with x87 Floating-point
Because the XMM registers used for SIMD floating-point computations
are separate registers that are not mapped onto the x87 floating-point
register stack, SIMD floating-point code can be mixed with either x87
floating-point code or 64-bit SIMD integer (MMX technology) code.
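As a rough illustration of mixing the two register sets, the following C sketch interleaves SSE intrinsics with `long double` arithmetic. It assumes GCC or Clang on x86-64 Linux, where `long double` is the 80-bit x87 format and is computed on the x87 register stack; with other compilers (for example MSVC, where `long double` is the same as `double`) the second function would not use x87 at all. The function names are hypothetical. Note that no EMMS or other state transition is placed between the two kinds of code, unlike the EMMS that is required when switching from MMX technology code to x87 code.

```c
#include <assert.h>
#include <xmmintrin.h>   /* SSE intrinsics */

/* Horizontal sum of four floats, computed in XMM registers. */
float sse_sum4(const float *p)
{
    assert(p);
    __m128 v  = _mm_loadu_ps(p);                 /* {p0, p1, p2, p3}       */
    __m128 hi = _mm_movehl_ps(v, v);             /* {p2, p3, p2, p3}       */
    __m128 s  = _mm_add_ps(v, hi);               /* {p0+p2, p1+p3, ...}    */
    __m128 t  = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)); /* lane 1 in every lane */
    return _mm_cvtss_f32(_mm_add_ss(s, t));      /* (p0+p2) + (p1+p3)      */
}

/* The same sum in long double. Assumption: with GCC or Clang on x86-64
 * Linux, long double is the 80-bit x87 format, so this loop runs on the
 * x87 register stack rather than in XMM registers. */
long double x87_sum4(const float *p)
{
    assert(p);
    long double acc = 0.0L;
    for (int i = 0; i < 4; ++i)
        acc += p[i];
    return acc;
}

/* Interleaves SSE and x87 work with no state switch between them,
 * because the XMM registers are not aliased onto the x87 stack. */
long double mixed_sum8(const float *p)
{
    float first = sse_sum4(p);               /* XMM registers */
    long double second = x87_sum4(p + 4);    /* x87 stack     */
    return first + second;
}
```

For example, with `float d[8] = {1, 2, 3, 4, 5, 6, 7, 8}`, `mixed_sum8(d)` should return 36.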
Scalar Floating-point Code
Some SIMD floating-point instructions operate only on the
least-significant data element in the XMM register. These instructions
are known as scalar instructions. They allow the XMM registers to be
used for general-purpose floating-point computations.
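A minimal sketch in C with SSE intrinsics from `<xmmintrin.h>`, assuming an x86 compiler such as GCC, Clang, or MSVC. `scalar_vs_packed` contrasts the scalar `_mm_add_ss` (ADDSS), which changes only the least-significant element and passes the upper three through from the first operand, with the packed `_mm_add_ps` (ADDPS). `dot_scalar_sse` shows the XMM registers used for an ordinary scalar loop. Both function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>      /* size_t */
#include <xmmintrin.h>   /* SSE intrinsics */

/* ADDSS adds only lane 0; lanes 1..3 of the result come from the first
 * operand. ADDPS adds all four lanes. */
void scalar_vs_packed(float scalar_out[4], float packed_out[4])
{
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);      /* lanes 0..3 */
    __m128 b = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
    _mm_storeu_ps(scalar_out, _mm_add_ss(a, b));  /* {11, 2, 3, 4}    */
    _mm_storeu_ps(packed_out, _mm_add_ps(a, b));  /* {11, 22, 33, 44} */
}

/* General-purpose scalar computation in XMM registers: a dot product
 * written with scalar (_ss) operations only. */
float dot_scalar_sse(const float *x, const float *y, size_t n)
{
    assert(n == 0 || (x && y));
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; ++i) {
        __m128 xi = _mm_load_ss(&x[i]);   /* x[i] in lane 0, upper lanes zeroed */
        __m128 yi = _mm_load_ss(&y[i]);
        acc = _mm_add_ss(acc, _mm_mul_ss(xi, yi));
    }
    return _mm_cvtss_f32(acc);            /* read lane 0 */
}
```

Because the accumulator lives in an ordinary register rather than on top of a stack, the loop needs no register exchanges between iterations.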
In terms of performance, scalar floating-point code can match or exceed
x87 floating-point code, and it has the following advantages:
• SIMD floating-point code uses a flat register model, whereas x87
floating-point code uses a stack model. Using scalar floating-point
code eliminates the need for fxch instructions, which have some
performance limitations on the Intel Pentium 4 processor.
• Mixing with MMX technology code without penalty.
• Flush-to-zero mode.
• Shorter latencies than x87 floating-point.
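The flush-to-zero advantage can be sketched by toggling the FTZ bit in MXCSR around a single MULSS whose exact result (about 1e-40) is a denormal `float`. This assumes the default MXCSR setting, in which the underflow exception is masked; `mul_ss_with_ftz` is a hypothetical helper. The `volatile` locals are there only to keep the compiler from evaluating the multiply at compile time or moving it across the MXCSR writes. FTZ gives up strict IEEE 754 gradual underflow in exchange for avoiding slow denormal results, so it suits code that can tolerate tiny results being returned as zero.

```c
#include <assert.h>
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE */

/* Multiplies a * b with MULSS under the requested flush-to-zero setting,
 * then restores the caller's MXCSR. */
float mul_ss_with_ftz(float a, float b, int ftz_on)
{
    assert(ftz_on == 0 || ftz_on == 1);
    volatile float va = a, vb = b;   /* volatile: keep the multiply at run time */
    volatile float r;
    unsigned int saved = _mm_getcsr();

    _MM_SET_FLUSH_ZERO_MODE(ftz_on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    r = _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(va), _mm_set_ss(vb)));  /* MULSS */
    _mm_setcsr(saved);               /* restore the caller's MXCSR */
    return r;
}
```

With FTZ on, `mul_ss_with_ftz(1e-30f, 1e-10f, 1)` is expected to return 0.0f; with FTZ off, the same inputs produce a nonzero denormal.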
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006