
1:184
Volume 1, Part 2: Software Pipelining and Loop Support
the loop has the same schedule. It is likely that software pipelining algorithms other
than modulo scheduling could benefit from the loop support features. Therefore, the
examples in this chapter are discussed in terms of software pipelining rather than
modulo scheduling.
Software pipelined loops have three phases: prolog, kernel, and epilog, as shown
below:
 1      2      3      4      5      Phase
--------------------------------------------
 ld4
        ld4                         Prolog
 add           ld4
--------------------------------------------
 st4    add           ld4           Kernel
        st4    add           ld4
--------------------------------------------
               st4    add
                      st4    add    Epilog
                             st4
During the prolog phase, a new loop iteration is started every II cycles (II is the
initiation interval; it is one cycle in the above example) to fill the pipeline. During
the first cycle of the prolog, stage 1 of
the first iteration executes. During the second cycle, stage 1 of the second iteration and
stage 2 of the first iteration execute, etc. By the start of the kernel phase, the pipeline
is full. Stage 1 of the fourth iteration, stage 2 of the third iteration, stage 3 of the
second iteration, and stage 4 of the first iteration execute. During the kernel phase, a
new loop iteration is started and another is completed every II cycles. During the
epilog phase, no new iterations are started, but the iterations already in progress are
completed, draining the pipeline. In the above example, iterations 3-5 are completed
during the epilog phase.
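The phase boundaries described above follow mechanically from the stage structure. The Python sketch below replays the example under stated assumptions: time is counted in slots of II cycles (one cycle here), iteration i starts in slot i, and the stage contents (ld4 in stage 1, nothing in stage 2, add in stage 3, st4 in stage 4) are read off the diagram. The names `STAGES`, `phase_of`, and `schedule` are hypothetical and used only for illustration.

```python
from typing import List, Optional, Tuple

# Stage contents as read off the diagram: stage 1 issues ld4, no instruction
# appears in stage 2, stage 3 issues add, and stage 4 issues st4.
STAGES: List[Optional[str]] = ["ld4", None, "add", "st4"]

Row = Tuple[str, List[Tuple[int, str]]]


def phase_of(slot: int, n_iters: int, n_stages: int) -> str:
    """Classify a 1-based slot (one II-cycle window) as Prolog, Kernel, or Epilog."""
    if n_stages <= slot <= n_iters:
        return "Kernel"   # a new iteration starts and the pipeline is full
    if slot <= n_iters:
        return "Prolog"   # new iterations start, but the pipeline is still filling
    return "Epilog"       # no new iterations start; the pipeline drains


def schedule(n_iters: int, stages: List[Optional[str]]) -> List[Row]:
    """For each slot, return its phase and the (iteration, op) pairs issued in it."""
    n_stages = len(stages)
    rows: List[Row] = []
    for slot in range(1, n_iters + n_stages):      # N + S - 1 slots in total
        ops: List[Tuple[int, str]] = []
        for it in range(1, n_iters + 1):
            stage = slot - it + 1                  # iteration `it` starts in slot `it`
            if 1 <= stage <= n_stages:
                op = stages[stage - 1]
                if op is not None:
                    ops.append((it, op))
        rows.append((phase_of(slot, n_iters, n_stages), ops))
    return rows


if __name__ == "__main__":
    for slot, (phase, ops) in enumerate(schedule(5, STAGES), start=1):
        print(slot, phase, ops)
```

Running it for five iterations reproduces the diagram row by row: three prolog slots, two kernel slots, and three epilog slots in which the stores of iterations 3-5 complete. More generally, with S stages and N >= S iterations, the prolog and epilog each last S - 1 slots and the kernel lasts N - S + 1 slots.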
The software pipeline is coded as a loop that is very different from the original source
code loop. To avoid confusion when discussing loops and loop iterations, we use the
terms source loop and source iteration to refer back to the original source code loop,
and the terms kernel loop and kernel iteration to refer to the loop that implements the
software pipeline.
In the above example, the load from the second source iteration is issued before the
result of the first load is consumed. Thus, in many cases, loads from successive iterations
of the loop must target different registers to avoid overwriting existing live values. In
traditional architectures, this requires unrolling the kernel loop and renaming the
registers in software, resulting in code expansion. Furthermore, in traditional
architectures, separate blocks of code are generated for the prolog, kernel, and epilog
phases, resulting in additional code expansion.
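The register-overwrite problem can be checked with a toy Python model of the example loop. It assumes the source loop computes b[i] = a[i] + k (the diagram shows only ld4, add, and st4, so the exact operation is an assumption), that II = 1, that the stages are ld4, an empty stage, add, and st4, and that within a slot every operation reads its operands before any result of that slot is written. The hypothetical `load_reg` hook chooses the register each iteration's load targets: reusing one register lets the load of iteration i+1 overwrite iteration i's value before the add consumes it, while alternating between two registers, the effect that unrolling the kernel twice and renaming by hand would achieve, gives correct results.

```python
from typing import Callable, Dict, List, Tuple


def run_pipelined(a: List[int], k: int, load_reg: Callable[[int], str]) -> List[int]:
    """Toy model of the example pipeline computing b[i] = a[i] + k.

    Assumptions (illustrative, not from the manual): II = 1, iteration i starts
    in slot i, stages are ld4 / (empty) / add / st4, and within a slot every
    operation reads its operands before any result of that slot is written.
    load_reg(i) is a hypothetical hook naming the register iteration i loads into.
    """
    n, n_stages = len(a), 4
    regs: Dict[str, int] = {}
    b = [0] * n
    for slot in range(n + n_stages - 1):
        writes: List[Tuple[str, int]] = []
        for i in range(n):
            stage = slot - i                      # 0-based stage of iteration i
            if stage == 0:                        # ld4: load a[i]
                writes.append((load_reg(i), a[i]))
            elif stage == 2:                      # add: consume the loaded value
                writes.append(("sum", regs[load_reg(i)] + k))
            elif stage == 3:                      # st4: store the sum
                b[i] = regs["sum"]
        for reg, value in writes:                 # results land at the end of the slot
            regs[reg] = value
    return b


a = [10, 20, 30, 40, 50]
one_register = run_pipelined(a, 1, lambda i: "r")         # every load reuses "r"
renamed = run_pipelined(a, 1, lambda i: f"r{i % 2}")      # alternate "r0" / "r1"
print(one_register, renamed)
```

With a single register the model returns [21, 31, 41, 51, 51] instead of [11, 21, 31, 41, 51]. In this model only the loaded value needs renaming: it stays live for two slots, longer than II, whereas the add result is consumed by the store in the very next slot, so one "sum" register suffices.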
5.4 Loop Support Features in the Intel® Itanium® Architecture
The code expansion that results from loop optimizations (such as software pipelining
and loop unrolling) on traditional architectures can increase the number of instruction
cache misses, thus reducing overall performance. The loop support features in the