Volume 1, Part 2: Software Pipelining and Loop Support
utilization can be increased by unrolling the loop more times, but at the cost of further
code expansion. The loop below is unrolled four times (assuming the trip count is a
multiple of four):
        add     r15 = 4,r5
        add     r25 = 8,r5
        add     r35 = 12,r5
        add     r16 = 4,r6
        add     r26 = 8,r6
        add     r36 = 12,r6;;
L1:     ld4     r4 = [r5],16        // Cycle 0
        ld4     r14 = [r15],16;;    // Cycle 0
        ld4     r24 = [r25],16      // Cycle 1
        ld4     r34 = [r35],16;;    // Cycle 1
        add     r7 = r4,r9          // Cycle 2
        add     r17 = r14,r9;;      // Cycle 2
        st4     [r6] = r7,16        // Cycle 3
        st4     [r16] = r17,16      // Cycle 3
        add     r27 = r24,r9        // Cycle 3
        add     r37 = r34,r9;;      // Cycle 3
        st4     [r26] = r27,16      // Cycle 4
        st4     [r36] = r37,16      // Cycle 4
        br.cloop L1;;               // Cycle 4
The two memory ports are now utilized in every cycle except cycle 2. Four iterations are
now executed in five cycles, versus two iterations in four cycles for the previous
version of the loop.
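The comparison can be checked with a little arithmetic. The following Python sketch is not part of the manual: it takes the per-cycle memory operation counts from the cycle comments in the four-times-unrolled listing, and the cycle and iteration counts from the text. The function and variable names are illustrative only.

```python
# Memory operations (ld4/st4) issued in each cycle of the unrolled loop body,
# read off the "// Cycle n" comments in the four-times-unrolled listing.
MEM_OPS_PER_CYCLE = [2, 2, 0, 2, 2]
MEMORY_PORTS = 2  # the text states that there are two memory ports


def port_utilization(mem_ops_per_cycle, ports):
    """Fraction of memory-port slots that carry a load or a store."""
    used = sum(mem_ops_per_cycle)
    available = ports * len(mem_ops_per_cycle)
    return used / available


def cycles_per_iteration(cycles, iterations):
    """Average number of cycles spent per source-loop iteration."""
    return cycles / iterations


unrolled_by_four = cycles_per_iteration(5, 4)  # four iterations in five cycles
previous_version = cycles_per_iteration(4, 2)  # two iterations in four cycles
utilization = port_utilization(MEM_OPS_PER_CYCLE, MEMORY_PORTS)
```

Under these counts the four-times-unrolled loop averages 1.25 cycles per iteration against 2 for the previous version, and eight of the ten memory-port slots in the loop body are used.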
5.3.2 Software Pipelining
Software pipelining is a technique that seeks to overlap loop iterations in a manner that
is analogous to hardware pipelining of a functional unit. Each iteration is partitioned into
stages with zero or more instructions in each stage. A conceptual view of a single
pipelined iteration of the loop from
in which each stage is one cycle long is
shown below:
stage 1: ld4 r4 = [r5],4
stage 2: ---                // empty stage
stage 3: add r7 = r4,r9
stage 4: st4 [r6] = r7,4
The following is a conceptual view of five pipelined iterations:
1     2     3     4     5     Cycle
----------------------------------------------------
ld4                           X
      ld4                     X+1
add         ld4               X+2
st4   add         ld4         X+3
      st4   add         ld4   X+4
            st4   add         X+5
                  st4   add   X+6
                        st4   X+7
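The overlap shown in the diagram follows from a simple rule: each iteration starts a fixed number of cycles after the previous one, and its stages issue in order from there. The Python sketch below is an illustration, not something taken from the manual. It assumes one instruction per stage, issued in the first cycle of that stage, and the names `pipeline_schedule`, `format_schedule`, and `interval` are hypothetical.

```python
# One pipelined iteration, one entry per stage; None marks the empty stage.
STAGES = ["ld4", None, "add", "st4"]


def pipeline_schedule(stages, iterations, interval=1):
    """Return {cycle_offset: {iteration_number: opcode}} for overlapped iterations.

    Assumption: iteration k (1-based) starts `interval` cycles after
    iteration k - 1, and each stage is `interval` cycles long, so stage s
    (0-based) of iteration k issues at offset (k - 1 + s) * interval.
    """
    schedule = {}
    for k in range(1, iterations + 1):
        for s, op in enumerate(stages):
            if op is None:
                continue  # nothing issues in an empty stage
            cycle = (k - 1 + s) * interval
            schedule.setdefault(cycle, {})[k] = op
    return schedule


def format_schedule(schedule, iterations):
    """Render the schedule with one row per cycle and one column per iteration."""
    lines = []
    for cycle in sorted(schedule):
        cells = [schedule[cycle].get(k, "").ljust(6)
                 for k in range(1, iterations + 1)]
        label = "X" if cycle == 0 else f"X+{cycle}"
        lines.append("".join(cells) + label)
    return "\n".join(lines)


schedule = pipeline_schedule(STAGES, iterations=5)
print(format_schedule(schedule, iterations=5))
```

With five iterations and an interval of one cycle, the entry for offset 3 holds the st4 of iteration 1, the add of iteration 2, and the ld4 of iteration 4, which matches the X+3 row of the diagram.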
The number of cycles between the start of successive iterations is called the initiation
interval (II). In the above example, the II is one. Each stage of a pipelined iteration is II
cycles long. Most of the examples in this chapter utilize modulo scheduling, which is a
particular form of software pipelining in which the II is a constant and every iteration of