General Optimization Guidelines
Branch Prediction
Branch optimizations have a significant impact on performance.
Understanding the flow of branches and improving their predictability
can increase the speed of code significantly.
Optimizations that help branch prediction are:
• Keep code and data on separate pages (a very important item, see more details in the “Memory Accesses” section).
• Whenever possible, eliminate branches.
• Arrange code to be consistent with the static branch prediction algorithm.
• Use the pause instruction in spin-wait loops.
• Inline functions and pair up calls and returns.
• Unroll as necessary so that repeatedly-executed loops have sixteen or fewer iterations, unless this causes an excessive code size increase.
• Separate branches so that they occur no more frequently than every three μops where possible.
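The pause guideline in the list above can be illustrated with a spin-wait loop. The following is a minimal sketch in C, not code from this manual. The names cpu_relax, spin_lock, spin_lock_init, spin_lock_acquire, spin_lock_try_acquire and spin_lock_release are hypothetical, and the sketch assumes GCC or Clang, whose __builtin_ia32_pause() builtin emits the pause instruction on x86 targets. On any other compiler or target the helper expands to nothing.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* cpu_relax() is a hypothetical helper name. Assumption: GCC or Clang on an
   x86 target, where __builtin_ia32_pause() emits the pause instruction.
   Elsewhere the macro expands to a no-op so the code still compiles. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define cpu_relax() __builtin_ia32_pause()
#else
#define cpu_relax() ((void)0)
#endif

/* Hypothetical minimal spin lock: true means the lock is held. */
struct spin_lock {
    atomic_bool locked;
};

static void spin_lock_init(struct spin_lock *l)
{
    atomic_init(&l->locked, false);
}

/* Single attempt: returns true if the lock was free and is now held. */
static bool spin_lock_try_acquire(struct spin_lock *l)
{
    /* atomic_exchange returns the previous value of the flag. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void spin_lock_acquire(struct spin_lock *l)
{
    while (!spin_lock_try_acquire(l)) {
        /* Spin-wait loop: poll with plain loads and execute pause on
           every iteration until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static void spin_lock_release(struct spin_lock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

A caller would initialize the lock once with spin_lock_init, then bracket each critical section with spin_lock_acquire and spin_lock_release. The inner loop polls with loads only, so pause sits on the path that executes repeatedly while waiting.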
Eliminating Branches
Eliminating branches improves performance because it:
• reduces the possibility of mispredictions
• reduces the number of required branch target buffer (BTB) entries; conditional branches that are never taken do not consume BTB resources
There are four principal ways of eliminating branches:
• arrange code to make basic blocks contiguous
• unroll loops, as discussed in the “Loop Unrolling” section
• use the cmov instruction
• use the setcc instruction
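The cmov and setcc approaches listed above can be sketched at the C source level. This is an illustration under stated assumptions, not code from this manual. The function names is_less, select_min and count_below are hypothetical. Whether a compiler actually emits setcc or cmov for these patterns depends on the compiler, the optimization level and the target processor, so the generated assembly should be inspected before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: yields 1 when a < b, otherwise 0. Using the result
   of a comparison as a value is a pattern that compilers can typically
   lower to a compare followed by setcc instead of a conditional branch. */
static int32_t is_less(int32_t a, int32_t b)
{
    return a < b;
}

/* Hypothetical helper: picks the smaller of two values. Both operands are
   already computed and the selection has no side effects, which makes it
   a candidate for cmov when the target supports that instruction. */
static int32_t select_min(int32_t a, int32_t b)
{
    return (a < b) ? a : b;
}

/* Hypothetical helper: counts the elements of v that are below limit.
   The comparison result (0 or 1) is added directly, so the loop body
   needs no data-dependent conditional branch; only the loop-closing
   branch remains. */
static size_t count_below(const int32_t *v, size_t n, int32_t limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] < limit);
    return count;
}
```

For example, count_below applied to the five values 5, 1, 9, 3, 7 with a limit of 6 returns 3. The branching form of the same loop, with an if statement around the increment, produces the same result but leaves a data-dependent branch for the predictor to handle.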
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006