IA-32 Intel® Architecture Optimization
Stack and Data Alignment
To get the best performance from code written for SIMD technologies,
data should be formatted in memory according to the guidelines
described in this section. Assembly code that performs unaligned
accesses is much slower than code that performs aligned accesses.
Alignment and Contiguity of Data Access Patterns
The 64-bit packed data types defined by MMX technology, and the
128-bit packed data types for Streaming SIMD Extensions and
Streaming SIMD Extensions 2 create more potential for misaligned data
accesses. The data access patterns of many algorithms are inherently
misaligned when using MMX technology and Streaming SIMD
Extensions. Several techniques for improving data access, such as
padding and organizing data elements into arrays, are described below.
SSE3 provides a special-purpose instruction, LDDQU, that can avoid
cache line splits; it is discussed in “Supplemental Techniques for
Avoiding Cache Line Splits” in Chapter 4.
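As a minimal sketch of what "aligned" means for the 64-bit and 128-bit
packed types above, the following C fragment tests whether an address is
a multiple of a given alignment and declares a 16-byte-aligned buffer.
It assumes a C11 compiler (for alignas); the names is_aligned and
aligned_buf are hypothetical and do not come from the manual.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdalign.h>

/* Returns 1 if address p is a multiple of `alignment` bytes, else 0.
 * Hypothetical helper for illustration only. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* A buffer the compiler is asked to place on a 16-byte boundary,
 * the natural alignment for 128-bit packed data (assumes C11 alignas). */
alignas(16) float aligned_buf[8];
```

Calling is_aligned(aligned_buf, 16) should return 1, while an address 4
bytes into the buffer should fail the same 16-byte check.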
Using Padding to Align Data
When accessing SIMD data using SIMD operations, access to data can be
improved simply by a change in the declaration. For example, consider
the declaration of a structure that represents a point in space plus
an attribute.
typedef struct { short x, y, z; char a; } Point;
Point pt[N];
Assume we will be performing a number of computations on x, y, z in
three of the four elements of a SIMD word; see the “Data Structure
Layout” section for an example. Even if the first element in array pt
is aligned, the second element will start 7 bytes later and will not be
aligned (3 shorts at two bytes each plus a single byte = 7 bytes,
assuming the structure is stored without compiler-inserted padding).
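The 7-byte arithmetic above can be checked with a short C sketch. It
assumes a compiler that supports #pragma pack (GCC, Clang, and MSVC do)
and a 2-byte short. The names PackedPoint, PaddedPoint, and the helper
functions are hypothetical and not taken from the manual. Many compilers
would round the unpacked structure up to 8 bytes on their own, so the
packed variant is used here to reproduce the layout the text describes.
The explicit pad member is shown as one possible fix.

```c
#include <stddef.h>

/* Packed layout: 3 shorts (6 bytes) + 1 char = 7 bytes per element.
 * Assumption: #pragma pack is available and short is 2 bytes. */
#pragma pack(push, 1)
typedef struct { short x, y, z; char a; } PackedPoint;
#pragma pack(pop)

/* Same fields plus one explicit pad byte: 8 bytes per element. */
typedef struct { short x, y, z; char a; char pad; } PaddedPoint;

/* Byte offset of element i from the start of an array of each type. */
size_t packed_offset(size_t i) { return i * sizeof(PackedPoint); }
size_t padded_offset(size_t i) { return i * sizeof(PaddedPoint); }

/* Returns 1 if a byte offset falls on an 8-byte boundary, else 0. */
int on_8byte_boundary(size_t offset) { return (offset % 8) == 0; }
```

With these definitions, element 1 of a PackedPoint array sits at offset
7 and misses the 8-byte boundary, while every element of a PaddedPoint
array sits at a multiple of 8, provided the array itself starts on an
8-byte boundary.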
Source: IA-32 Intel Architecture Optimization Reference Manual, Order
Number 248966-013US, April 2006.