General Optimization Guidelines
Using memory as a destination operand may further reduce register
pressure at the slight risk of making trace cache packing more difficult.
On the Pentium 4 processor, loading a value from memory into a register
and then adding that register to memory is faster than adding a value
from memory to a register and then storing the register to memory. The
first sequence also uses one fewer μop than the second.
Assembly/Compiler Coding Rule 59. (ML impact, M generality) Give
preference to adding a register to memory (memory is the destination) instead
of adding memory to a register. Also, give preference to adding a register to
memory over loading the memory, adding two registers and storing the result.
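As a rough illustration of Rule 59, the C sketch below shows a source pattern that maps naturally onto the memory-destination form. The function name and the instruction sequences in the comments are illustrative assumptions, not taken from the manual; the compiler, not the source text, decides which instructions are actually emitted.

```c
#include <stddef.h>

/*
 * Hypothetical helper: add each delta[i] into counters[i] in place.
 *
 * Each iteration can be lowered to the form Rule 59 prefers:
 *     mov eax, [delta + i*4]       ; load the value into a register
 *     add [counters + i*4], eax    ; add the register to memory
 * rather than the alternative:
 *     mov eax, [counters + i*4]
 *     add eax, [delta + i*4]       ; add memory to a register
 *     mov [counters + i*4], eax    ; store the result
 * Instruction selection is up to the compiler; this is only a sketch.
 */
void accumulate_in_place(int *counters, const int *delta, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        counters[i] += delta[i];
}
```

When writing assembly by hand, the same choice is made directly by picking the instruction form with memory as the destination operand.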
Assembly/Compiler Coding Rule 60. (M impact, M generality) When the
address of a store is unknown, subsequent loads cannot be scheduled to
execute out of order ahead of the store, limiting the out-of-order execution of
the processor. When the address of a store is computed by a potentially long
latency operation (such as a load that might miss the data cache), attempt to
reorder subsequent loads ahead of the store.
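The reordering that Rule 60 describes can be sketched in C as follows. The struct node type and both function names are hypothetical, chosen only for illustration. The hoisted version is equivalent to the original only if the store target does not alias the loaded locations, which is an assumption the programmer has to guarantee.

```c
#include <stddef.h>

/* Hypothetical type: the store address is itself loaded from memory,
 * so it may arrive late if that load misses the data cache. */
struct node {
    int *target;
};

/* Original order: the two loads follow a store whose address is not
 * yet known, so they may have to wait behind it. */
int update_then_sum(const struct node *n, int value,
                    const int *a, const int *b)
{
    *n->target = value;   /* store address depends on loading n->target */
    return *a + *b;       /* loads issued after the store */
}

/* Reordered: the independent loads are issued before the store.
 * Valid only when n->target does not alias a or b. */
int sum_then_update(const struct node *n, int value,
                    const int *a, const int *b)
{
    int x = *a;           /* loads hoisted ahead of the store */
    int y = *b;
    *n->target = value;
    return x + y;
}
```

An optimizing compiler usually cannot perform this hoisting itself unless it can prove the pointers do not alias, which is why the source-level order can matter.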
Instruction Scheduling
Ideally, scheduling or pipelining should be done in a way that optimizes
performance across all processor generations. This section presents
scheduling rules that can improve the performance of your code on the
Pentium 4 processor.
Latencies and Resource Constraints
Assembly/Compiler Coding Rule 61. (M impact, MH generality) Calculate
store addresses as early as possible to avoid having stores block loads.
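A minimal C sketch of Rule 61 follows, assuming a scatter loop in which the value to be stored takes a while to compute. The names slow_value, scatter_late and scatter_early are hypothetical. The source order is only a hint, since the compiler and the out-of-order core may reschedule the address computation.

```c
#include <stddef.h>

/* Hypothetical stand-in for any long-latency computation. */
static unsigned slow_value(unsigned x)
{
    unsigned v = x;
    for (int i = 0; i < 16; ++i)
        v = v * 1664525u + 1013904223u;
    return v;
}

/* Store address resolved late: slot[i] is read only after the value
 * has been computed, immediately before the store. */
void scatter_late(unsigned *out, const size_t *slot,
                  const unsigned *in, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned v = slow_value(in[i]);
        out[slot[i]] = v;
    }
}

/* Store address resolved early: the destination pointer is computed
 * first, so the address is known well before the data arrives. */
void scatter_early(unsigned *out, const size_t *slot,
                   const unsigned *in, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned *dst = &out[slot[i]];   /* address first */
        unsigned v = slow_value(in[i]);  /* long-latency work second */
        *dst = v;
    }
}
```

Both functions produce the same result; they differ only in where the store address becomes available relative to the long-latency work.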
Example 2-25 Recombining LOAD/OP Code into REG,MEM Form
LOAD reg1, mem1
... code that does not write to reg1...
OP reg2, reg1
... code that does not use reg1 ...
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006