Volume 1, Part 2: Software Pipelining and Loop Support
5.5.6 Loop Unrolling Prior to Software Pipelining
In some cases, higher performance can be achieved by unrolling the loop prior to
software pipelining. A resource-constrained loop can be improved by unrolling it so
that the limiting resource is more fully utilized. In the following example, if we
assume the target processor has only two memory units, the loop performance is
bound by the number of memory units:
L1: ld4      r4 = [r5],4      // Cycle 0
    ld4      r9 = [r8],4;;    // Cycle 0
    add      r7 = r4,r9;;     // Cycle 2
    st4      [r6] = r7,4      // Cycle 3
    br.cloop L1;;             // Cycle 3
A pipelined version of this loop must have an II of at least two because there are three
memory instructions, but only two memory units. If the loop is unrolled by a factor of
two prior to software pipelining, and assuming the store is independent of the loads,
the new loop contains six memory instructions, so an II of 3 can be achieved for it.
This is an effective II of 1.5 for the original source loop.
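The arithmetic behind these II values can be checked with a minimal sketch. It assumes the memory units are the only limiting resource and ignores recurrences and other functional units; the function names `resource_min_ii` and `effective_ii` are hypothetical helpers introduced for illustration, not part of any tool.

```python
from fractions import Fraction
from math import ceil


def resource_min_ii(mem_ops: int, mem_units: int) -> int:
    """Lower bound on II imposed by the memory units alone (assumption:
    each memory unit issues at most one memory instruction per cycle)."""
    return ceil(mem_ops / mem_units)


def effective_ii(ii: int, unroll_factor: int) -> Fraction:
    """Cycles per original source iteration after unrolling."""
    return Fraction(ii, unroll_factor)


MEM_OPS_PER_SOURCE_ITER = 3  # two ld4 and one st4 in the example loop
MEM_UNITS = 2                # assumed by the text for this example

original_ii = resource_min_ii(MEM_OPS_PER_SOURCE_ITER, MEM_UNITS)
unrolled_ii = resource_min_ii(2 * MEM_OPS_PER_SOURCE_ITER, MEM_UNITS)

print(original_ii)                           # 2
print(unrolled_ii)                           # 3
print(float(effective_ii(unrolled_ii, 2)))   # 1.5
```

Without unrolling, one of the four memory-unit slots available in two cycles goes unused; with two copies of the body, all six slots in three cycles are filled.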
Below is a possible pipeline for the unrolled loop:
stage 1: (p16) ld4 r4 = [r5],8        // odd iteration
         (p16) ld4 r9 = [r8],8;;      // odd iteration
stage 2: (p16) ld4 r14 = [r15],8      // even iteration
         (p16) ld4 r19 = [r18],8;;    // even iteration
                                      // --- empty cycle
stage 3: (p18) add r7 = r4,r9         // odd iteration
         (p17) add r17 = r14,r19;;    // even iteration
stage 4:                              // --- empty cycle
         (p19) st4 [r6] = r7,8        // odd iteration
         (p18) st4 [r16] = r17,8;;    // even iteration
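At the source level, the unrolled loop can be pictured roughly as follows. This is a sketch under stated assumptions: the example loop is read as an element-wise sum `c[i] = a[i] + b[i]`, the function names are hypothetical, and a remainder step is added for odd trip counts, which the assembly above does not show.

```python
def add_arrays(a, b):
    """Original source loop: two loads, one add, one store per iteration."""
    c = [0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c


def add_arrays_unrolled2(a, b):
    """Same loop unrolled by two: each trip handles two source iterations."""
    n = len(a)
    c = [0] * n
    i = 0
    while i + 1 < n:
        # First copy of the body (the "odd" source iterations, counting from 1).
        x0, y0 = a[i], b[i]
        # Second copy of the body (the "even" source iterations).
        x1, y1 = a[i + 1], b[i + 1]
        c[i] = x0 + y0
        c[i + 1] = x1 + y1
        i += 2  # two elements per trip, matching the 8-byte post-increments
    if i < n:
        # Remainder for an odd trip count (assumption, not in the listing).
        c[i] = a[i] + b[i]
    return c


print(add_arrays_unrolled2([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

Each copy uses its own temporaries, just as the assembly uses separate registers (`r4`, `r9`, `r7` versus `r14`, `r19`, `r17`) and separate address registers for the two copies.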
The unrolled loop contains two copies of the source loop body, one that corresponds to
the odd source iterations and one that corresponds to the even source iterations. The
assignment of stage predicates must take this into account. Recall that each one
written to p16 sequentially enables all the stages for a new source iteration. During
stage one of the above pipeline, the stage predicate for the odd iteration is in p16. The
stage predicate for the even iteration does not exist yet. During stage two of the above
pipeline, the stage predicate for the odd iteration is in p17 and the new stage predicate
for the even iteration is in p16. Thus within the same pipeline stage, if the stage