Volume 1, Part 2: Software Pipelining and Loop Support
Notice that the load for the second source iteration is executed before the compare and
branch of the first source iteration. That is, the load (and the update of r5) is
speculative. The loop condition is not computed until cycle X+2, but in order to
maximize the use of resources, it is desirable to start the second source iteration at
cycle X+1. Without the support for control speculation in the Itanium architecture, the
second source iteration could not be started until cycle X+3.
The computation of the loop condition for while loops is very different from that of
counted loops. In counted loops, it is possible to compute the loop condition in one
cycle using a counted loop branch. This is what a br.ctop instruction does when it sets
p16. In while loops, a compare must compute the loop condition and set the stage
predicates. The stages prior to the one containing the compare are called the
speculative stages of the pipeline, because it is not possible for the compare to
completely control the execution of these stages. Therefore, the stage predicate set by
the compare is used (after rotation) to control the first non-speculative stage of the
pipeline.
The pipelined version of the while loop is shown below. A check for the speculative
load is included:
        mov     ec = 2
        mov     pr.rot = 1 << 16 ;;     // PR16 = 1, rest = 0
L1:
        ld4.s   r32 = [r5], 4           // Cycle 0
(p18)   chk.s   r34, recovery           // Cycle 0
(p18)   cmp.ne  p17, p0 = r34, r0       // Cycle 0
(p18)   st4     [r6] = r34, 4           // Cycle 0
(p17)   br.wtop.sptk L1 ;;              // Cycle 0
L2:
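Read at the source level, the kernel above appears to copy 32-bit words from the
address in r5 to the address in r6, stopping after it has stored the first zero word.
The following Python sketch states that reading directly. The function name
copy_through_zero is hypothetical, memory is modeled as a Python list, and the sketch
assumes the input contains a zero word (the assembly loop has no other exit).

```python
def copy_through_zero(src):
    """Copy words from src up to and including the first zero word.

    Hypothetical source-level model of the kernel; lists stand in for memory.
    """
    dst = []
    for word in src:          # ld4.s r32 = [r5], 4
        dst.append(word)      # st4 [r6] = r34, 4
        if word == 0:         # cmp.ne p17, p0 = r34, r0 yields 0: leave the loop
            break
    return dst


if __name__ == "__main__":
    print(copy_through_zero([5, 7, 0, 9]))
```

The word following the zero is never stored, although the pipelined kernel has
already loaded it speculatively by the time the compare resolves.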
To explain why the kernel loop is programmed the way it is, it is helpful to examine a
trace of the execution of the loop (assume there are 200 source iterations) shown in
Table 5-2.
There is no stage predicate assigned to the load because it is speculative. The compare
sets p17. This is the branch predicate for the current iteration and, after rotation, the
stage predicate for the first non-speculative stage (stage three) of the next source
iteration. During the prolog, the compare cannot produce its first valid result until cycle
two. The initialization of the predicates provides a pipeline that disables the compare
until the first source iteration reaches stage two in cycle two. At that point the
compare starts generating stage predicates to control the non-speculative stages of the
pipeline. Notice that the compare is conditional. If it were unconditional, it would
always write a zero to p17 and the pipeline would not get started correctly.
Table 5-2. wtop Loop Trace

         Port/Instructions                       State before br.wtop
Cycle    M       I      I      M      B          p16   p17   p18   EC
0        ld4.s                        br.wtop    1     0     0     2
1        ld4.s                        br.wtop    0     1     0     1
2        ld4.s   cmp    chk    st4    br.wtop    0     1     1     1
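The predicate and EC columns of Table 5-2 can be reproduced with a small simulation.
The Python sketch below is a simplified model, not a definitive one. It assumes the
usual br.wtop rules: if the branch predicate is 1, the branch is taken and EC is
unchanged; if it is 0 and EC > 1, the branch is taken and EC is decremented; if it is 0
and EC = 1, EC is decremented and the loop falls through; in all three cases p63 is
cleared and the registers rotate. The name run_wtop_kernel, the eight-register rotating
region, and the treatment of loads past the end of the input as zero are assumptions
made for illustration. The chk.s recovery path is omitted.

```python
def run_wtop_kernel(src):
    """Simulate the kernel on a list of words; return (trace, dst).

    Each trace entry is (cycle, p16, p17, p18, EC) as seen just before br.wtop.
    """
    preds = [1] + [0] * 47   # p16..p63 after: mov pr.rot = 1 << 16
    regs = [0] * 8           # r32..r39; region size is an assumption
    ec = 2                   # mov ec = 2
    pos = 0                  # stands in for the post-incremented r5
    dst = []                 # stands in for memory written through r6
    trace = []
    cycle = 0
    while True:
        # ld4.s r32 = [r5], 4 (speculative, so it has no stage predicate).
        # Loads past the end of src return 0 here; this is a simplification.
        regs[0] = src[pos] if pos < len(src) else 0
        pos += 1
        if preds[2]:                        # (p18) guards cmp.ne and st4
            preds[1] = int(regs[2] != 0)    # cmp.ne p17, p0 = r34, r0
            dst.append(regs[2])             # st4 [r6] = r34, 4
        trace.append((cycle, preds[0], preds[1], preds[2], ec))

        # (p17) br.wtop.sptk L1
        qp = preds[1]
        taken = qp == 1 or ec > 1
        rotate = qp == 1 or ec >= 1
        if qp == 0 and ec >= 1:
            ec -= 1
        preds[47] = 0                       # p63 is cleared before rotation
        if rotate:
            preds = [preds[-1]] + preds[:-1]   # p16 -> p17, p17 -> p18, ...
            regs = [regs[-1]] + regs[:-1]      # r32 -> r33, r33 -> r34, ...
        if not taken:
            break
        cycle += 1
    return trace, dst


if __name__ == "__main__":
    trace, dst = run_wtop_kernel([5, 7, 0])
    for row in trace:
        print(row)
    print(dst)
```

With a three-word input the first three trace entries match cycles 0 through 2 of
Table 5-2. The compare first executes in cycle 2, once the initial 1 in p16 has rotated
into p18, which is why the predicate initialization is enough to start the pipeline.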