Eliminating Branches...................................................................................................... 2-15
Spin-Wait and Idle Loops................................................................................................ 2-18
Static Prediction.............................................................................................................. 2-19
Inlining, Calls and Returns ............................................................................................. 2-22
Branch Type Selection ................................................................................................... 2-23
Loop Unrolling ............................................................................................................... 2-26
Compiler Support for Branch Prediction ......................................................................... 2-28
Alignment ....................................................................................................................... 2-29
Store Forwarding ............................................................................................................ 2-32
Store-to-Load-Forwarding Restriction on Size and Alignment.................................. 2-33
Store-forwarding Restriction on Data Availability ...................................................... 2-38
Data Layout Optimizations ............................................................................................. 2-39
Stack Alignment.............................................................................................................. 2-42
Capacity Limits and Aliasing in Caches.......................................................................... 2-43
Capacity Limits in Set-Associative Caches............................................................... 2-44
Aliasing Cases in the Pentium 4 and Intel® Xeon® Processors ............................. 2-45
Write Combining ............................................................................................................. 2-48
Locality Enhancement .................................................................................................... 2-50
Minimizing Bus Latency.................................................................................................. 2-52
Non-Temporal Store Bus Traffic ..................................................................................... 2-53
Prefetching ..................................................................................................................... 2-55
Cacheability Instructions ................................................................................................ 2-56
Code Alignment .............................................................................................................. 2-57
Guidelines for Optimizing Floating-point Code ............................................................... 2-58
Floating-point Modes and Exceptions ............................................................................ 2-60
Floating-point Exceptions ......................................................................................... 2-60
Floating-point Modes ................................................................................................ 2-62
Improving Parallelism and the Use of FXCH .................................................................. 2-68
x87 vs. Scalar SIMD Floating-point Trade-offs ............................................................... 2-69
Scalar SSE/SSE2 Performance on Intel Core Solo and Intel Core Duo
Summary of Contents for ARCHITECTURE IA-32
Page 1: IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006