
General Optimization Guidelines
Memory Accesses
This section discusses guidelines for optimizing code and data memory
accesses. The most important recommendations are:
• align data, paying attention to data layout and stack alignment
• enable store forwarding
• place code and data on separate pages
• enhance data locality
• use prefetching and cacheability control instructions
• enhance code locality and align branch targets
• take advantage of write combining
Alignment and forwarding problems are among the most common
sources of large delays on the Pentium 4 processor.
Alignment
Alignment of data concerns all kinds of variables:
• dynamically allocated
• members of a data structure
• global or local variables
• parameters passed on the stack
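As a rough illustration of three of the kinds of variables listed above, the following C11 sketch requests explicit alignment for a structure member, a global buffer, and a dynamically allocated block. The names sample_record, global_buffer, is_aligned, and alloc_cache_aligned are hypothetical and chosen only for this example. It assumes a C11 toolchain that provides alignas and aligned_alloc; some compilers offer other mechanisms instead. Alignment of parameters passed on the stack depends on the compiler and calling convention and is not shown.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment target for this sketch: one 64-byte cache line. */
#define CACHE_LINE_SIZE 64

/* Member of a data structure: force the 16-byte vector onto a
 * 16-byte boundary inside the struct (hypothetical layout). */
struct sample_record {
    char tag;
    alignas(16) float vec[4];
};

/* Global variable: start the buffer on a cache line boundary. */
static alignas(CACHE_LINE_SIZE) unsigned char global_buffer[256];

/* Returns 1 if p is a multiple of alignment, 0 otherwise. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Dynamically allocated data: C11 aligned_alloc expects the size to be
 * a multiple of the alignment, so round the request up first.
 * The caller releases the block with free(). */
static void *alloc_cache_aligned(size_t bytes)
{
    if (bytes == 0)
        return NULL;
    size_t rounded = (bytes + CACHE_LINE_SIZE - 1)
                     / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
    return aligned_alloc(CACHE_LINE_SIZE, rounded);
}
```

A local variable can be declared the same way as the global one, for example alignas(16) float local_vec[4]; inside a function, provided the compiler keeps the stack sufficiently aligned.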
Misaligned data access can incur significant performance penalties. This
is particularly true for cache line splits. The size of a cache line is
64 bytes in the Pentium 4, Intel Xeon, and Pentium M processors.
On the Pentium 4 processor, an access to data that crosses a 64-byte
boundary leads to two memory accesses and requires several µops to be
executed (instead of one). Such accesses are likely to incur a large
performance penalty, because they are executed near retirement and can
incur stalls on the order of the depth of the pipeline.
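To make the cache line split condition concrete, the sketch below tests whether an access of a given size starting at a given address touches more than one 64-byte line. The helper name spans_cache_line is hypothetical; the 64-byte line size is the figure quoted above for the Pentium 4, Intel Xeon, and Pentium M processors, and other processors may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cache line size stated in the text for Pentium 4, Intel Xeon,
 * and Pentium M processors. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: returns 1 if an access of `size` bytes that
 * starts at `addr` touches more than one cache line, 0 otherwise. */
static int spans_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first_line = addr / CACHE_LINE_SIZE;              /* line of first byte */
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_SIZE; /* line of last byte  */
    return first_line != last_line;
}
```

For instance, an 8-byte access at address 60 covers bytes 60 through 67 and therefore splits across two lines, while the same access at address 56 stays within one line.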
Source: IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006.