4 Optimizing for SIMD Integer Applications
The SIMD integer instructions provide performance improvements in
applications that are integer-intensive and can take advantage of the
SIMD architecture of Pentium 4, Intel Xeon, and Pentium M processors.
The guidelines for using these instructions, together with the guidelines
described in Chapter 2, will help you develop fast and efficient code that
scales well across all processors with MMX technology, processors that
use Streaming SIMD Extensions (SSE) SIMD integer instructions, and
processors with the SIMD integer instructions in SSE2 and SSE3.
For the sake of brevity, the collection of 64-bit and 128-bit SIMD
integer instructions supported by MMX technology, SSE, SSE2, and
SSE3 shall be referred to as SIMD integer instructions.
Unless otherwise noted, the following sequences are written for the
64-bit integer registers. They can easily be adapted to the 128-bit SIMD
integer form available with SSE2 by replacing references to mm0-mm7
with references to xmm0-xmm7 and by arranging beforehand for data to be
aligned on a 16-byte boundary, which some instructions that load or
store 16 bytes of data require.
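The adaptation described above can be sketched in C with SSE2 intrinsics. This is a minimal illustration, not code from the manual: the helper name add_words_sse2 is hypothetical, and the sketch assumes a compiler that provides <emmintrin.h> and a target with SSE2. A 64-bit sequence such as "paddw mm0, mm1" adds four 16-bit words at a time; the 128-bit form adds eight, and its aligned load and store (movdqa) expect 16-byte-aligned addresses.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (__m128i, _mm_add_epi16, ...) */
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for eight 16-bit words.
 * Assumption: a, b, and out are all aligned on a 16-byte boundary,
 * because the aligned load/store forms used here require it. */
static void add_words_sse2(const int16_t *a, const int16_t *b, int16_t *out)
{
    __m128i va = _mm_load_si128((const __m128i *)a);  /* aligned 16-byte load  */
    __m128i vb = _mm_load_si128((const __m128i *)b);  /* aligned 16-byte load  */
    __m128i vs = _mm_add_epi16(va, vb);               /* packed add of 8 words (paddw on xmm) */
    _mm_store_si128((__m128i *)out, vs);              /* aligned 16-byte store */
}
```

A caller would typically declare its buffers with a 16-byte alignment specifier (for example _Alignas(16) in C11) before passing them in. When alignment cannot be guaranteed, the unaligned forms _mm_loadu_si128 and _mm_storeu_si128 are the usual alternative.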
This chapter contains several simple examples that will help you to get
started with coding your application. The goal is to provide simple,
low-level operations that are frequently used. The examples use the
minimum number of instructions necessary to achieve the best performance
on the current generation of IA-32 processors.
Each example includes a short description, sample code, and notes if
necessary. These examples do not address scheduling, because it is assumed
that they will be incorporated into longer code sequences.
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006