General Optimization Guidelines
• Avoid longer latency instructions: integer multiplies and divides.
  Replace them with alternate code sequences (e.g., use shifts instead
  of multiplies).
• Use the lea instruction and the full range of addressing modes to do
  address calculation.
• Some types of stores use more µops than others; try to use simpler
  store variants and/or reduce the number of stores.
• Avoid use of complex instructions that require more than 4 µops.
• Avoid instructions that unnecessarily introduce dependence-related
  stalls: inc and dec instructions, partial register operations
  (8/16-bit operands).
• Avoid use of ah, bh, and the other upper 8-bit halves of the 16-bit
  registers, because accessing them requires a shift operation
  internally.
• Use xor and pxor instructions to clear registers and break
  dependencies for integer operations; also use xorps and xorpd to
  clear XMM registers for floating-point operations.
• Use efficient approaches for performing comparisons.
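The first two guidelines in this list (shifts instead of multiplies, lea-style address calculation) can be sketched at the C level as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and an optimizing compiler will often perform these rewrites on its own, so the code mainly shows which arithmetic shapes correspond to the cheaper instruction sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by a power of two with a shift instead of an integer multiply. */
uint32_t times8(uint32_t x) {
    return x << 3;
}

/* x * 9 written as x + x*8. This has the base + index*scale shape that a
 * single lea can compute (an assumption about what the compiler emits). */
uint32_t times9(uint32_t x) {
    return x + (x << 3);
}

/* Unsigned divide by a power of two with a shift instead of a divide. */
uint32_t div16(uint32_t x) {
    return x >> 4;
}

/* Address calculation of the form base + index*4 + 8 bytes, which fits one
 * addressing-mode expression. Only the address is formed; nothing is read. */
const int32_t *field_addr(const int32_t *base, size_t i) {
    return &base[i + 2];
}
```

Because the operands are unsigned, the shifted forms give the same results as the multiply and divide they replace, including on wrap-around. Signed division by a power of two rounds differently from an arithmetic right shift, so that case needs more care.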
Optimize Instruction Scheduling
• Consider latencies and resource constraints.
• Calculate store addresses as early as possible.
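One way to picture the two scheduling guidelines above is the C sketch below. It is not taken from the manual: the function names are hypothetical, and the actual instruction order is decided by the compiler and the out-of-order hardware. The first pair of functions contrasts a single dependency chain with several independent chains, which gives the scheduler work to overlap with each add's latency. The last function forms the store address before the long computation so that it does not depend on the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add waits for the result of the previous add. */
uint32_t sum_serial(const uint32_t *a, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Four accumulators: four independent add chains that can overlap. */
uint32_t sum_interleaved(const uint32_t *a, size_t n) {
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)   /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}

/* The destination address is computed up front and is independent of the
 * dot-product loop, so the store only waits for its data. */
void dot_into(uint32_t *dst, size_t slot,
              const uint32_t *x, const uint32_t *y, size_t n) {
    uint32_t *out = dst + slot;
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    *out = acc;
}
```

Unsigned arithmetic is used so that both summation orders give identical results; with floating-point data, reassociating the sum this way can change rounding.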
Enable Vectorization
• Use the smallest possible data type. This enables more parallelism
  with the use of a longer vector.
• Arrange the nesting of loops so the innermost nesting level is free of
  inter-iteration dependencies. It is especially important to avoid the
  case where the store of data in an earlier iteration happens lexically
  after the load of that data in a future iteration (called a
  lexically-backward dependence).
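The two vectorization guidelines above can be illustrated with a small C sketch. The function names and the row-major layout are assumptions made for the example. Both functions compute the same running sum down the rows of a matrix of 8-bit elements; with 8-bit data, a 128-bit XMM register holds 16 elements, compared with 4 for 32-bit data. In the first function the innermost loop carries the dependence from row i-1 to row i. In the second, the dependence is carried by the outer loop, leaving the inner loop's iterations independent of one another.

```c
#include <stddef.h>
#include <stdint.h>

/* Row index innermost: each inner iteration reads the element that the
 * previous inner iteration just stored, so iterations cannot run in parallel. */
void running_sum_inner_i(uint8_t *a, const uint8_t *b,
                         size_t rows, size_t cols) {
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 1; i < rows; ++i)
            a[i * cols + j] = (uint8_t)(a[(i - 1) * cols + j] + b[i * cols + j]);
}

/* Column index innermost: within one row, every element depends only on the
 * already finished previous row, so the inner loop has no inter-iteration
 * dependence and is a candidate for vectorization. */
void running_sum_inner_j(uint8_t *a, const uint8_t *b,
                         size_t rows, size_t cols) {
    for (size_t i = 1; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            a[i * cols + j] = (uint8_t)(a[(i - 1) * cols + j] + b[i * cols + j]);
}
```

Whether a compiler actually vectorizes the second form also depends on factors outside this sketch, such as whether it can prove that a and b do not overlap.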
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006