Volume 1, Part 2: Software Pipelining and Loop Support
This section describes two general methods for overlapping loop iterations, both of
which result in code expansion on traditional architectures. The code expansion
problem is addressed by loop support features in the Itanium architecture that are
explored later in this chapter. The loop above will be used as a running example in the
next few sections.
5.3.1 Loop Unrolling
Loop unrolling is a technique that seeks to increase the available instruction level
parallelism by making and scheduling multiple copies of the loop body together. The
registers in each copy of the loop body are given different names to avoid unnecessary
WAW and WAR data dependencies. The code below shows the loop from our example
after unrolling twice (for a total of two copies of the original loop body) and
instruction scheduling, assuming two memory ports and a two cycle latency for loads.
For simplicity, assume that the loop trip count is a constant N that is a multiple of two,
so that no exit branch is required after the first copy of the loop body:
L1:   ld4       r4 = [r5],4;;      // Cycle 0
      ld4       r14 = [r5],4;;     // Cycle 1
      add       r7 = r4,r9;;       // Cycle 2
      add       r17 = r14,r9       // Cycle 3
      st4       [r6] = r7,4;;      // Cycle 3
      st4       [r6] = r17,4       // Cycle 4
      br.cloop  L1;;               // Cycle 4
The above code does not expose as much ILP as possible. The two loads are serialized
because they both use and update r5. Similarly, the two stores both use and update r6.
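The same serialization can be seen in a source-level model. The Python sketch below is only an illustration, not Itanium code: the function name and the variables `src`, `c`, `rd`, and `wr` are assumptions standing in for the source array, the value held in r9, and the addresses held in r5 and r6. Both copies of the loop body read and bump the same index, so the second access cannot form its address until the first has updated it.

```python
def add_const_unrolled_shared(src, c):
    """Add c to every element, unrolled by two with shared indices.

    Assumes len(src) is a multiple of two, mirroring the assumption
    that the trip count N is a multiple of two.
    """
    n = len(src)
    dst = [0] * n
    rd = 0  # models r5: the single load address
    wr = 0  # models r6: the single store address
    for _ in range(n // 2):
        t0 = src[rd]      # first load
        rd += 1           # post-increment of the shared load index
        t1 = src[rd]      # second load depends on the updated rd
        rd += 1
        s0 = t0 + c       # temporaries are renamed per copy (t0/t1, s0/s1)
        s1 = t1 + c
        dst[wr] = s0      # first store
        wr += 1           # post-increment of the shared store index
        dst[wr] = s1      # second store depends on the updated wr
        wr += 1
    return dst
```

Renaming the temporaries removes the false dependencies between the two copies, but the chain through `rd` and `wr` remains, which is the dependency the text attributes to r5 and r6.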
A variable which is incremented (or decremented) once each iteration by the same
amount is called an induction variable. The single induction variable r5 (and similarly
r6) can be expanded into two registers as shown in the code below:
      add       r15 = 4,r5
      add       r16 = 4,r6;;
L1:   ld4       r4 = [r5],8        // Cycle 0
      ld4       r14 = [r15],8;;    // Cycle 0
      add       r7 = r4,r9         // Cycle 2
      add       r17 = r14,r9;;     // Cycle 2
      st4       [r6] = r7,8        // Cycle 3
      st4       [r16] = r17,8      // Cycle 3
      br.cloop  L1;;               // Cycle 3
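As a rough source-level illustration of induction variable expansion, the Python sketch below gives each copy of the loop body its own index. It is a model under stated assumptions, not Itanium code: the function name and the variables `rd0`, `rd1`, `wr0`, and `wr1` are hypothetical stand-ins for r5, r15, r6, and r16, and a step of two elements corresponds to the 8-byte post-increment on 4-byte data.

```python
def add_const_unrolled_expanded(src, c):
    """Add c to every element, unrolled by two with expanded indices.

    Assumes len(src) is a multiple of two, mirroring the assumption
    that the trip count N is a multiple of two.
    """
    n = len(src)
    dst = [0] * n
    rd0, rd1 = 0, 1  # model r5 and r15 (r15 starts one element ahead)
    wr0, wr1 = 0, 1  # model r6 and r16 (r16 starts one element ahead)
    for _ in range(n // 2):
        t0 = src[rd0]        # the two loads use different indices,
        t1 = src[rd1]        # so neither waits on the other's update
        rd0 += 2             # each index now steps by two elements
        rd1 += 2
        dst[wr0] = t0 + c    # the two stores are likewise independent
        dst[wr1] = t1 + c
        wr0 += 2
        wr1 += 2
    return dst
```

Going by the cycle comments in the two listings, the version that shares r5 and r6 issues its last instructions in cycle 4, while the expanded version issues them in cycle 3, at the cost of two extra registers and two setup adds outside the loop.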
Compared to the original loop, twice as many functional units are
utilized and the code size is twice as large. However, no instructions are issued in cycle
1 and the functional units are still underutilized in the remaining cycles.