
5 Optimizing for SIMD Floating-point Applications
This chapter discusses general rules for optimizing code that uses the
single-instruction, multiple-data (SIMD) floating-point instructions
available in Streaming SIMD Extensions (SSE), Streaming SIMD
Extensions 2 (SSE2), and Streaming SIMD Extensions 3 (SSE3). This
chapter also provides examples that illustrate the optimization
techniques for single-precision and double-precision SIMD
floating-point applications.
General Rules for SIMD Floating-point Code
The rules and suggestions listed in this section help optimize
floating-point code containing SIMD floating-point instructions.
Generally, it is important to understand and balance port utilization to
create efficient SIMD floating-point code. The basic rules and
suggestions include the following:
• Follow all guidelines in Chapter 2 and Chapter 3.
• Exceptions: mask exceptions to achieve higher performance. When
  exceptions are unmasked, software performance is slower.
• Utilize the flush-to-zero and denormals-are-zero modes for higher
  performance to avoid the penalty of dealing with denormals and
  underflows.
• Incorporate the prefetch instruction where appropriate (for details,
  refer to Chapter 6, “Optimizing Cache Usage”).
• Use MMX technology instructions and registers if the computations
  can be done in SIMD integer for shuffling data.
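
The exception-masking and flush-to-zero/denormals-are-zero rules above both come down to bits in the MXCSR control register. The following is a minimal sketch in C, not code from this manual, of one way to set those bits with the compiler intrinsics declared in xmmintrin.h and pmmintrin.h. The helper names simd_fp_fast_mode_enter and simd_fp_fast_mode_leave are hypothetical, and the sketch assumes a processor that supports the denormals-are-zero bit.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, exception-mask and flush-to-zero macros */
#include <pmmintrin.h>  /* denormals-are-zero macros */

/* Hypothetical helper (illustration only, not from the manual):
   mask all SIMD floating-point exceptions and enable the flush-to-zero
   (FTZ) and denormals-are-zero (DAZ) modes for the calling thread.
   Returns the previous MXCSR value so the caller can restore it.
   Assumption: the processor supports the DAZ bit. */
static unsigned int simd_fp_fast_mode_enter(void)
{
    unsigned int saved = _mm_getcsr();                   /* remember the current MXCSR */
    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);               /* mask all six SIMD FP exceptions */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          /* underflowing results become zero */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  /* denormal inputs are read as zero */
    return saved;
}

/* Hypothetical helper: restore the MXCSR value saved by
   simd_fp_fast_mode_enter. */
static void simd_fp_fast_mode_leave(unsigned int saved)
{
    _mm_setcsr(saved);
}
```

A caller would typically wrap a compute-heavy SIMD loop between the two calls. Both modes trade strict IEEE 754 behavior for speed, so this sketch is only appropriate where results that underflow can safely be treated as zero. The MXCSR is per-thread state, so each worker thread would need to make the same calls.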
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006