Volume 1, Part 2: Software Pipelining and Loop Support
1:197
5.5.5.2
Conflicts in the ALAT
Using an advanced load to remove a likely invariant load from a loop while advancing
another load inside the loop results in poor performance if the latter load targets a
rotating register. The advanced load that targets the rotating register will eventually
invalidate the ALAT entry for the loop invariant load. Thereafter, every execution of the
check load for the loop invariant load will cause an ALAT miss.
When more than one advanced load in the loop targets a rotating register, the registers
must be assigned and the register lifetimes controlled so that the check load for a
particular advanced load X is executed before any of the other advanced loads can
invalidate the entry allocated by load X. For example, the following loop successfully
targets rotating registers with two advanced loads without any ALAT misses because
the two advanced load – check load pairs never create more than 32 simultaneously
live ALAT entries:
L1:
(p16)
ld4.a
r32 = [r8]
(p31) ld4.c
r47 = [r8]
(p16) ld4.a
r48 = [r9]
(p31) ld4.c
r63 = [r9]
br.ctop L1;;
When the code cannot be arranged to avoid ALAT misses, it may be best to assign static
registers to the destinations of the advanced loads and unroll the loop to explicitly
rename the destinations of the advanced loads where necessary. The following
example shows how to unroll the loop to avoid the use of rotating registers. The loop
has an II equal to 1 and the check load is executed one cycle (and one rotation) after
the advanced load:
L1:
(p16)
ld4.a
r33 = [r8]
(p17)
ld4.c
r34 = [r8]
br.ctop L1;;
Static registers can be assigned to the destinations of the loads if the loop is unrolled
twice:
L1:
(p16)
ld4.a
r3 = [r8]
(p17) ld4.c
r4 = [r8]
br.cexit L2;;
(p16) ld4.a
r4 = [r8]
(p17) ld4.c
r3 = [r8]
br.ctop L1;;
L2: //
Rotating registers could still be used for the values that are not generated by advanced
loads. The effect of this unrolling on instruction cache performance must be considered
as part of the cost of advancing a load.
Содержание ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS VOLUME 3 REV 2.3
Страница 1: ......
Страница 11: ...x Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...
Страница 12: ...1 1 Intel Itanium Architecture Software Developer s Manual Rev 2 3 Part I Application Architecture Guide ...
Страница 13: ...1 2 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...
Страница 33: ...1 22 Volume 1 Part 1 Introduction to the Intel Itanium Architecture ...
Страница 57: ...1 46 Volume 1 Part 1 Execution Environment ...
Страница 147: ...1 136 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...
Страница 149: ...1 138 Volume 1 Part 2 About the Optimization Guide ...
Страница 191: ...1 180 Volume 1 Part 2 Predication Control Flow and Instruction Stream ...
Страница 230: ......
Страница 248: ...236 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...
Страница 249: ...2 1 Intel Itanium Architecture Software Developer s Manual Rev 2 3 Part I System Architecture Guide ...
Страница 250: ...2 2 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...
Страница 264: ...2 16 Volume 2 Part 1 Intel Itanium System Environment ...
Страница 380: ...2 132 Volume 2 Part 1 Interruptions ...
Страница 398: ...2 150 Volume 2 Part 1 Register Stack Engine ...
Страница 486: ...2 238 Volume 2 Part 1 IA 32 Interruption Vector Descriptions ...
Страница 749: ...2 501 Intel Itanium Architecture Software Developer s Manual Rev 2 3 Part II System Programmer s Guide ...
Страница 750: ...2 502 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...
Страница 754: ...2 506 Volume 2 Part 2 About the System Programmer s Guide ...
Страница 796: ...2 548 Volume 2 Part 2 Interruptions and Serialization ...
Страница 808: ...2 560 Volume 2 Part 2 Context Management ...
Страница 842: ...2 594 Volume 2 Part 2 Floating point System Software ...
Страница 850: ...2 602 Volume 2 Part 2 IA 32 Application Support ...
Страница 862: ...2 614 Volume 2 Part 2 External Interrupt Architecture ...
Страница 870: ...2 622 Volume 2 Part 2 Performance Monitoring Support ...
Страница 891: ......
Страница 941: ...3 42 Volume 3 Instruction Reference cmp illegal_operation_fault PR p1 0 PR p2 0 Interruptions Illegal Operation fault ...
Страница 1099: ...3 200 Volume 3 Instruction Reference padd Interruptions Illegal Operation fault ...
Страница 1191: ...3 292 Volume 3 Pseudo Code Functions Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...
Страница 1295: ...3 396 Volume 3 Resource and Dependency Semantics ...
Страница 1296: ......
Страница 1302: ...402 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...
Страница 1494: ...4 192 Volume 4 Base IA 32 Instruction Reference FWAIT Wait See entry for WAIT ...
Страница 1564: ...4 262 Volume 4 Base IA 32 Instruction Reference LES Load Full Pointer See entry for LDS LES LFS LGS LSS ...
Страница 1565: ...Volume 4 Base IA 32 Instruction Reference 4 263 LFS Load Full Pointer See entry for LDS LES LFS LGS LSS ...
Страница 1568: ...4 266 Volume 4 Base IA 32 Instruction Reference LGS Load Full Pointer See entry for LDS LES LFS LGS LSS ...
Страница 1583: ...Volume 4 Base IA 32 Instruction Reference 4 281 LSS Load Full Pointer See entry for LDS LES LFS LGS LSS ...
Страница 1647: ...Volume 4 Base IA 32 Instruction Reference 4 345 ROL ROR Rotate See entry for RCL RCR ROL ROR ...
Страница 1663: ...Volume 4 Base IA 32 Instruction Reference 4 361 SHL SHR Shift Instructions See entry for SAL SAR SHL SHR ...
Страница 1668: ...4 366 Volume 4 Base IA 32 Instruction Reference SIDT Store Interrupt Descriptor Table Register See entry for SGDT SIDT ...
Страница 1884: ...4 582 Volume 4 IA 32 SSE Instruction Reference ...
Страница 1885: ...Index Intel Itanium Architecture Software Developer s Manual Rev 2 3 Index ...
Страница 1886: ...Index Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...
Страница 1898: ...INDEX Index 12 Index for Volumes 1 2 3 and 4 ...