Volume 1, Part 2: Memory Reference
3.5 Optimization of Memory References
Speculation can increase parallelism and help to hide latency by enabling more code
motion than can be performed on traditional architectures. It can also widen the
applicability of traditional loop optimizations such as invariant code motion and common
subexpression elimination. In addition, the Itanium architecture offers post-increment
loads and stores that improve instruction throughput without increasing code size.
Memory reference optimization should take several factors into account, including:
• Difference between the execution costs of speculative and non-speculative code.
• Code size.
• Interference probabilities and properties of the ALAT (for data speculation).
The remainder of this chapter discusses these factors and optimizations relating to
memory accesses.
3.5.1 Speculation Considerations
The use of data speculation requires more attention than the use of control speculation.
This is partly because one control speculative load cannot inadvertently cause another
control speculative load to fail. Such an effect is possible with data speculative loads
since the ALAT has limited capacity and the replacement policy of ALAT entries is
implementation dependent. For example, if an advanced load is issued and there are no
unused ALAT entries, the hardware may choose to invalidate an existing entry to make
room for the new one.
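The capacity effect described above can be illustrated with a toy software model. This
is only a sketch under stated assumptions, not a description of any real ALAT: the
two-entry size, the round-robin replacement, the exact-address match on stores, and all
names (alat, alat_advanced_load, alat_store, alat_check) are hypothetical choices made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Deliberately tiny so eviction is easy to trigger; real sizes are
 * implementation dependent. */
#define ALAT_ENTRIES 2

typedef struct {
    bool     valid;
    int      reg;   /* target register of the advanced load */
    uint64_t addr;  /* address the advanced load read from */
} alat_entry;

typedef struct {
    alat_entry entry[ALAT_ENTRIES];
    size_t     next_victim;  /* round-robin replacement: an assumption */
} alat;

void alat_init(alat *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < ALAT_ENTRIES; i++)
        t->entry[i].valid = false;
    t->next_victim = 0;
}

/* Models an advanced load: allocate an entry, evicting one if the table is full. */
void alat_advanced_load(alat *t, int reg, uint64_t addr)
{
    assert(t != NULL);
    size_t slot = ALAT_ENTRIES;  /* sentinel: no slot chosen yet */

    /* Drop any stale entry that names the same target register. */
    for (size_t i = 0; i < ALAT_ENTRIES; i++) {
        if (t->entry[i].valid && t->entry[i].reg == reg)
            t->entry[i].valid = false;
    }
    /* Prefer an unused entry. */
    for (size_t i = 0; i < ALAT_ENTRIES; i++) {
        if (!t->entry[i].valid) {
            slot = i;
            break;
        }
    }
    /* No unused entry: invalidate an existing one to make room. */
    if (slot == ALAT_ENTRIES) {
        slot = t->next_victim;
        t->next_victim = (t->next_victim + 1) % ALAT_ENTRIES;
    }
    t->entry[slot] = (alat_entry){ true, reg, addr };
}

/* Models a store: invalidate entries whose address matches (simplified to
 * exact equality; overlap and size are ignored in this sketch). */
void alat_store(alat *t, uint64_t addr)
{
    assert(t != NULL);
    for (size_t i = 0; i < ALAT_ENTRIES; i++) {
        if (t->entry[i].valid && t->entry[i].addr == addr)
            t->entry[i].valid = false;
    }
}

/* Models a check: true if the entry for reg is still present. */
bool alat_check(const alat *t, int reg)
{
    assert(t != NULL);
    for (size_t i = 0; i < ALAT_ENTRIES; i++) {
        if (t->entry[i].valid && t->entry[i].reg == reg)
            return true;
    }
    return false;
}
```

In this model, issuing a third advanced load into the full two-entry table makes the
check for the first load fail even though no store touched its address, which is the
kind of interaction that cannot occur between control speculative loads.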
Moreover, exceptions associated with control speculative calculations are uncommon in
correct code since they are related to events such as page faults and TLB misses.
However, excessive control speculation can be expensive as associated instructions fill
issue slots.
Although the static critical path of a program may be reduced by the use of data
speculation, the following factors determine its dynamic cost and net benefit:
• The probability that an intervening store will interfere with an advanced load.
• The cost of recovering from a failed advanced load.
• The specific microarchitectural implementation of the ALAT: its size, associativity,
and matching algorithm.
Determining interference probabilities can be difficult, but dynamic memory profiling
can help to predict how often ambiguous loads and stores will conflict.
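As a rough illustration of how profile data might be turned into an interference
estimate, the sketch below computes the fraction of profiled load/store pairs whose byte
ranges overlap. The profile_sample layout and the function names are hypothetical; a
real profiler would supply its own record format, and address wraparound near the top of
the address space is not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One dynamic instance of an ambiguous load/store pair (hypothetical format). */
typedef struct {
    uint64_t load_addr;
    unsigned load_size;   /* bytes read by the load */
    uint64_t store_addr;
    unsigned store_size;  /* bytes written by the intervening store */
} profile_sample;

/* True if the byte ranges [a, a+a_size) and [b, b+b_size) overlap. */
static int ranges_overlap(uint64_t a, unsigned a_size,
                          uint64_t b, unsigned b_size)
{
    return a < b + b_size && b < a + a_size;
}

/* Fraction of samples in which the store overlapped the load. */
double interference_rate(const profile_sample *s, size_t n)
{
    assert(s != NULL || n == 0);
    if (n == 0)
        return 0.0;

    size_t conflicts = 0;
    for (size_t i = 0; i < n; i++) {
        if (ranges_overlap(s[i].load_addr, s[i].load_size,
                           s[i].store_addr, s[i].store_size))
            conflicts++;
    }
    return (double)conflicts / (double)n;
}
```

The resulting rate is only as representative as the profiled input set, so it is best
treated as an estimate of how often an advanced load would fail its check.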
When using advanced loads, consider case by case whether advancing only a load and using
a ld.c might be preferable to advancing both a load and its uses, which would require the
potentially more expensive chk.a.
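One way to frame this case-by-case decision is a simple expected-value comparison. The
sketch below is not taken from the manual: the spec_option structure, the linear cost
model, and any cycle counts passed in are assumptions that would have to come from
measurement on a particular implementation.

```c
#include <assert.h>

/* Hypothetical description of one speculation strategy. */
typedef struct {
    double cycles_saved;     /* critical-path cycles saved by advancing */
    double failure_penalty;  /* extra cycles paid when the check fails */
} spec_option;

/* Expected net gain in cycles, given the probability that an intervening
 * store interferes with the advanced load (assumed linear model). */
double expected_net_gain(spec_option o, double p_interfere)
{
    assert(p_interfere >= 0.0 && p_interfere <= 1.0);
    return o.cycles_saved - p_interfere * o.failure_penalty;
}

/* Returns 1 if advancing only the load (checked with ld.c) is expected to do
 * at least as well as advancing the load and its uses (checked with chk.a). */
int prefer_ldc(spec_option ldc, spec_option chka, double p_interfere)
{
    return expected_net_gain(ldc, p_interfere) >=
           expected_net_gain(chka, p_interfere);
}
```

Under this model, advancing the uses as well saves more cycles when the check succeeds
but pays a larger penalty when it fails, so the preferred choice shifts toward ld.c as
the interference probability rises.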
Even when recovery code is not executed, its presence extends the lifetimes of
registers used in data and control speculation, thus increasing register pressure and
possibly the cost of register movement by the Register Stack Engine (RSE). See
for information on considerations for recovery code placement.