Coding for SIMD Architectures
3
Improving Memory Utilization
Memory performance can be improved by rearranging data and
algorithms for SSE2, SSE, and MMX technology intrinsics. The
methods for improving memory performance involve working with the
following:
• Data structure layout
• Strip-mining for vectorization and memory utilization
• Loop-blocking
Using the cacheability instructions, prefetch and streaming store, also
greatly enhances memory utilization. For these instructions, see
Chapter 6, “Optimizing Cache Usage.”
Data Structure Layout
For certain algorithms, like 3D transformations and lighting, there are
two basic ways of arranging the vertex data. The traditional method is
the array of structures (AoS) arrangement, with a structure for each
vertex (see Example 3-14). However, this method does not take full
advantage of the SIMD technology capabilities.
The best processing method for code using SIMD technology is to
arrange the data in an array for each coordinate (see Example 3-15).
This data arrangement is called structure of arrays (SoA).
Example 3-14 AoS Data Structure
    typedef struct {
        float x, y, z;
        int a, b, c;
        . . .
    } Vertex;
    Vertex Vertices[NumOfVertices];
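To make the contrast between the two layouts concrete, the following is a minimal sketch in C. It is not the manual's Example 3-15. The names VertexAoS, VerticesSoA, NUM_VERTICES, scale_x_aos, and scale_x_soa are hypothetical and chosen only for illustration, and the elided structure members are left out. The sketch shows why the SoA layout suits SIMD: all x values are contiguous in memory, so a loop over them reads unit-stride data that a vectorizing compiler or packed SSE loads can consume four floats at a time. In the AoS layout, consecutive x values are separated by the other members of each vertex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex count for illustration only. */
#define NUM_VERTICES 8

/* AoS: one structure per vertex; x, y, z of a vertex are adjacent,
 * but the x values of consecutive vertices are 24 bytes apart
 * (assuming 4-byte float and int with no padding). */
typedef struct {
    float x, y, z;
    int a, b, c;
} VertexAoS;

/* SoA: one array per member; all x values are contiguous. */
typedef struct {
    float x[NUM_VERTICES];
    float y[NUM_VERTICES];
    float z[NUM_VERTICES];
    int a[NUM_VERTICES];
    int b[NUM_VERTICES];
    int c[NUM_VERTICES];
} VerticesSoA;

/* Scale every x coordinate in the AoS layout: strided access. */
static void scale_x_aos(VertexAoS *v, size_t n, float s)
{
    for (size_t i = 0; i < n; i++) {
        v[i].x *= s;
    }
}

/* Scale every x coordinate in the SoA layout: unit-stride access
 * over a single float array, which maps directly onto packed
 * single-precision operations. */
static void scale_x_soa(VerticesSoA *v, size_t n, float s)
{
    assert(n <= NUM_VERTICES);
    for (size_t i = 0; i < n; i++) {
        v->x[i] *= s;
    }
}
```

Both functions compute the same result. The difference lies only in the memory access pattern, which is what determines how well the loop can use SIMD loads and stores.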