General Optimization Guidelines
If for some reason it is not possible to align the stack to 64 bits, the
routine should access the parameter once and save it into a register or known
aligned storage, thus incurring the misalignment penalty only once.
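The advice above can be sketched in C. This is an illustrative sketch only: the function name repeated_sum and the byte-pointer parameter are hypothetical, chosen to model a 64-bit value that sits in storage whose alignment is not known. The value is read exactly once into a local, and all later uses go through that local, which the compiler can keep in a register or an aligned stack slot.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical routine: adds a 64-bit floating-point value to itself n times.
 * param_bytes points at storage that may not be 8-byte aligned. */
double repeated_sum(const unsigned char *param_bytes, int n)
{
    double x;   /* local copy: a register or an aligned stack slot */

    /* The only access to the possibly misaligned storage. */
    memcpy(&x, param_bytes, sizeof x);

    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += x;   /* every later use reads the local copy */
    return acc;
}
```

An optimizing compiler may perform this hoisting on its own; writing it explicitly makes the single access visible in the source.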
Capacity Limits and Aliasing in Caches
There are cases where addresses with a given stride will compete for
some resource in the memory hierarchy.
Typically, caches are implemented to have multiple ways of set
associativity, with each way consisting of multiple sets of cache lines (or
sectors in some cases). Multiple memory references that compete for the
same set of each way in a cache can cause a capacity issue. There are
aliasing conditions that apply to specific microarchitectures. Note that
first-level cache lines are 64 bytes. Thus the least significant 6 bits are
not considered in alias comparisons. For the Pentium 4 and Intel Xeon
processors, data is loaded into the second level cache in a sector of
128 bytes, so the least significant 7 bits are not considered in alias
comparisons.
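The bit arithmetic described above can be sketched as follows. The line and sector sizes come from the text; the helper names and the num_sets parameter are hypothetical, and set_index assumes a simple scheme in which the set is selected by the address bits directly above the line offset. Actual indexing and alias conditions depend on the specific microarchitecture.

```c
#include <assert.h>
#include <stdint.h>

enum {
    L1_LINE_BITS   = 6,  /* 64-byte first-level cache line    */
    L2_SECTOR_BITS = 7   /* 128-byte second-level cache sector */
};

/* Returns 1 if a and b differ only in the ignored low offset bits. */
int same_block(uint32_t a, uint32_t b, unsigned offset_bits)
{
    return (a >> offset_bits) == (b >> offset_bits);
}

/* Assumed indexing scheme: the bits above the offset select the set.
 * num_sets is a hypothetical parameter and must be a power of two. */
uint32_t set_index(uint32_t addr, unsigned offset_bits, uint32_t num_sets)
{
    assert(num_sets != 0 && (num_sets & (num_sets - 1)) == 0);
    return (addr >> offset_bits) & (num_sets - 1);
}
```

Under these assumptions, with 64 sets of 64-byte lines, addresses 4096 bytes apart yield the same set index, which is the kind of stride-dependent competition the text describes.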
Example 2-20 Dynamic Stack Alignment
prologue:
    subl    esp, 4              ; save frame ptr
    movl    [esp], ebp
    movl    ebp, esp            ; new frame pointer
    subl    ebp, 4              ; step below the saved frame ptr slot
    andl    ebp, 0xFFFFFFF8     ; aligned to 64 bits
    movl    [ebp], esp          ; save old stack ptr
    subl    esp, FRAMESIZE      ; allocate space
    ; ... callee saves, etc.
epilogue:
    ; ... callee restores, etc.
    movl    esp, [ebp]          ; restore stack ptr
    movl    ebp, [esp]          ; restore frame ptr
    addl    esp, 4
    ret
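The masking step in the prologue rounds an address down to an alignment boundary by clearing its low bits. A minimal C sketch of that arithmetic follows; align_down is a hypothetical helper, not part of the example. For an 8-byte (64-bit) boundary the mask is 0xFFFFFFF8, which clears the low three bits, while 0xFFFFFFFC clears only two bits and gives 4-byte alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr down to a multiple of alignment.
 * alignment must be a power of two (for example 4, 8, or 16). */
uint32_t align_down(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);   /* 8 gives mask 0xFFFFFFF8 */
}
```

For instance, align_down(0x0012FF74, 8) yields 0x0012FF70, whereas align_down(0x0012FF74, 4) leaves the address unchanged because it is already 4-byte aligned.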
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006