Volume 1, Part 2: Software Pipelining and Loop Support
5.5.5.2 Conflicts in the ALAT
Using an advanced load to hoist a likely loop-invariant load out of a loop, while also advancing another load inside the loop, results in poor performance if the in-loop advanced load targets a rotating register. Register rotation gives that destination a different physical register on every iteration, so each execution of the in-loop advanced load allocates a new ALAT entry. Because the ALAT has a limited number of entries, the in-loop advanced load eventually invalidates the ALAT entry for the loop-invariant load. Thereafter, every execution of the check load for the loop-invariant load causes an ALAT miss.
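The effect can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of any particular Itanium implementation: the ALAT is assumed to be fully associative with 32 entries and oldest-first replacement, entries are keyed only by physical register number (addresses and stores are not modeled), a check load that misses is assumed not to re-allocate an entry, and the rotating region is assumed to span r32..r127. The names `ALAT`, `physical`, and `invariant_check_hits` are hypothetical, and static r10 and r11 are arbitrary choices.

```python
from collections import OrderedDict

ALAT_ENTRIES = 32  # assumed capacity, matching the 32-entry bound used in this section
SOR = 96           # assumed size of the rotating region: r32..r127


class ALAT:
    """Toy fully associative ALAT keyed by physical register number.

    Modeling assumptions (not from the manual): oldest-first replacement,
    and a check load that misses does not allocate a new entry.
    """

    def __init__(self, capacity=ALAT_ENTRIES):
        self.capacity = capacity
        self.entries = OrderedDict()  # physical register -> None, oldest first

    def advanced_load(self, preg):
        # A new advanced load to a register replaces that register's old entry.
        self.entries.pop(preg, None)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # displace the oldest entry
        self.entries[preg] = None

    def check_load(self, preg):
        # True: ALAT hit, the advanced value is used; False: miss, the load is reissued.
        return preg in self.entries


def physical(logical, rrb, sor=SOR):
    """Map a logical GR to a physical one; a value in rN reads as rN+1 after one rotation."""
    if logical < 32:
        return logical  # static registers do not rotate
    return 32 + (logical - 32 + rrb) % sor


def invariant_check_hits(iterations, rotating):
    """Per-iteration hit (True) or miss (False) of the check load for a hoisted invariant load."""
    alat = ALAT()
    alat.advanced_load(10)  # invariant load hoisted above the loop into static r10
    hits = []
    rrb = 0
    for _ in range(iterations):
        dest = 32 if rotating else 11          # destination of the other, in-loop advanced load
        alat.advanced_load(physical(dest, rrb))
        hits.append(alat.check_load(10))       # check load for the invariant value
        rrb -= 1                               # br.ctop rotates the registers
    return hits


rot = invariant_check_hits(100, rotating=True)
print(rot.index(False))                                # 31
print(not any(rot[rot.index(False):]))                 # True: every later check misses
print(all(invariant_check_hits(100, rotating=False)))  # True: a static destination never evicts it
```

With a rotating destination, the in-loop load claims a fresh entry each iteration, so the invariant entry survives only until 31 newer entries have filled the assumed 32-entry ALAT. A static destination keeps reusing one entry and never displaces it.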
When more than one advanced load in the loop targets a rotating register, the registers must be assigned and the register lifetimes controlled so that the check load for a particular advanced load X executes before any of the other advanced loads can invalidate the entry allocated by X. For example, the following loop targets rotating registers with two advanced loads yet incurs no ALAT misses. Each value is checked 15 rotations after it is loaded (r32 becomes r47, and r48 becomes r63), so each advanced-load/check-load pair keeps at most 16 entries live, and the two pairs never create more than 32 simultaneously live ALAT entries:
L1:
(p16) ld4.a r32 = [r8]
(p31) ld4.c r47 = [r8]
(p16) ld4.a r48 = [r9]
(p31) ld4.c r63 = [r9]
      br.ctop L1;;
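The 32-entry bound for this loop can be checked with a small simulation of the pipelined kernel. It is a sketch under the same assumptions as a toy model: a fully associative 32-entry ALAT with oldest-first replacement, entries keyed only by physical register, and a rotating region of r32..r127. `distance=15` reproduces the loop above; `distance=16` is a hypothetical variant (ld4.a r32 / ld4.c r48 and ld4.a r49 / ld4.c r65) in which each pair keeps 17 entries live, 34 in total. The function name `pipelined_two_load_misses` is illustrative.

```python
from collections import OrderedDict

ALAT_ENTRIES = 32  # assumed ALAT capacity, matching the 32-entry bound quoted above
SOR = 96           # assumed rotating region: r32..r127


class ALAT:
    """Toy fully associative ALAT keyed by physical register (oldest-first replacement assumed)."""

    def __init__(self, capacity=ALAT_ENTRIES):
        self.capacity = capacity
        self.entries = OrderedDict()  # physical register -> None, oldest first

    def advanced_load(self, preg):
        self.entries.pop(preg, None)  # a new ld.a replaces the register's old entry
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # displace the oldest entry
        self.entries[preg] = None

    def check_load(self, preg):
        return preg in self.entries  # True = hit, False = ALAT miss


def physical(logical, rrb, sor=SOR):
    """Rotating GRs: a value written to rN is read as rN+1 after one rotation."""
    if logical < 32:
        return logical
    return 32 + (logical - 32 + rrb) % sor


def pipelined_two_load_misses(n, distance=15):
    """Count ALAT misses for two pipelined advanced loads, each checked `distance` rotations later.

    distance=15 models ld4.a r32 / ld4.c r47 and ld4.a r48 / ld4.c r63.
    Other distances use a hypothetical register assignment built the same way.
    """
    a_dst, a_chk = 32, 32 + distance
    b_dst = a_chk + 1
    b_chk = b_dst + distance
    alat = ALAT()
    misses = 0
    for k in range(n + distance):    # kernel iterations, including the epilogue
        rrb = -k                      # br.ctop has rotated the registers k times
        load_stage = k < n            # (p16): the advanced loads of a new iteration
        check_stage = k >= distance   # (p31) when distance=15: the matching check loads
        if load_stage:
            alat.advanced_load(physical(a_dst, rrb))
        if check_stage and not alat.check_load(physical(a_chk, rrb)):
            misses += 1
        if load_stage:
            alat.advanced_load(physical(b_dst, rrb))
        if check_stage and not alat.check_load(physical(b_chk, rrb)):
            misses += 1
    return misses


print(pipelined_two_load_misses(100))                   # 0
print(pipelined_two_load_misses(100, distance=16) > 0)  # True
```

Stretching each lifetime by a single rotation is enough to exceed the assumed capacity, which is why the register assignment and lifetimes have to be controlled together.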
When the code cannot be arranged to avoid ALAT misses, it may be best to assign static registers to the destinations of the advanced loads and to unroll the loop so that those destinations are renamed explicitly where necessary. Consider the following loop, which uses rotating registers, has an II equal to 1, and executes the check load one cycle (and one rotation) after the advanced load, so the value loaded into r33 is checked as r34:
L1:
(p16) ld4.a r33 = [r8]
(p17) ld4.c r34 = [r8]
      br.ctop L1;;
If the loop is unrolled by a factor of two, static registers can be assigned to the destinations of the loads, with r3 and r4 alternating between the advanced-load and check-load roles:
L1:
(p16) ld4.a r3 = [r8]
(p17) ld4.c r4 = [r8]
      br.cexit L2;;
(p16) ld4.a r4 = [r8]
(p17) ld4.c r3 = [r8]
      br.ctop L1;;
L2:
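The benefit of the unrolled form is its bounded ALAT footprint, which a toy simulation can make concrete. This is a sketch under assumed conditions (a fully associative 32-entry ALAT with oldest-first replacement, entries keyed only by physical register, a rotating region of r32..r127), and `ii1_loop` is a hypothetical name. Both versions of the II=1 loop hit on every check when run alone, but the rotating version claims a new entry every iteration until the whole assumed ALAT is occupied, which is what evicts entries belonging to other advanced loads. The unrolled version only ever holds entries for r3 and r4.

```python
from collections import OrderedDict

ALAT_ENTRIES = 32  # assumed ALAT capacity
SOR = 96           # assumed rotating region: r32..r127


class ALAT:
    """Toy fully associative ALAT keyed by physical register (oldest-first replacement assumed)."""

    def __init__(self, capacity=ALAT_ENTRIES):
        self.capacity = capacity
        self.entries = OrderedDict()  # physical register -> None, oldest first

    def advanced_load(self, preg):
        self.entries.pop(preg, None)  # a new ld.a replaces the register's old entry
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # displace the oldest entry
        self.entries[preg] = None

    def check_load(self, preg):
        return preg in self.entries  # True = hit, False = ALAT miss


def physical(logical, rrb, sor=SOR):
    """Rotating GRs: a value written to rN is read as rN+1 after one rotation."""
    if logical < 32:
        return logical  # static registers such as r3 and r4 do not rotate
    return 32 + (logical - 32 + rrb) % sor


def ii1_loop(n, unrolled):
    """Return (misses, peak ALAT occupancy) for the II=1 advance/check loop.

    unrolled=False: ld4.a r33 / ld4.c r34 with rotating registers.
    unrolled=True:  the twice-unrolled version using static r3 and r4.
    """
    alat = ALAT()
    misses = peak = 0
    for s in range(n + 1):                   # one extra step performs the last check
        rrb = -s                              # registers rotated s times
        if unrolled:
            ld_dst, chk_dst = (3, 4) if s % 2 == 0 else (4, 3)
        else:
            ld_dst, chk_dst = 33, 34
        if s < n:                             # (p16) ld4.a
            alat.advanced_load(physical(ld_dst, rrb))
        if s >= 1 and not alat.check_load(physical(chk_dst, rrb)):  # (p17) ld4.c
            misses += 1
        peak = max(peak, len(alat.entries))
    return misses, peak


print(ii1_loop(100, unrolled=False))  # (0, 32)
print(ii1_loop(100, unrolled=True))   # (0, 2)
```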
Rotating registers can still be used for values that are not produced by advanced loads. The effect of the larger, unrolled loop body on instruction cache performance must be counted as part of the cost of advancing a load.