Volume 1, Part 2: Software Pipelining and Loop Support
Notice that the load for the second source iteration is executed before the compare and
branch of the first source iteration. That is, the load (and the update of r5) is
speculative. The loop condition is not computed until cycle X+2, but in order to
maximize the use of resources, it is desirable to start the second source iteration at
cycle X+1. Without the support for control speculation in the Itanium architecture, the
second source iteration could not be started until cycle X+3.
The computation of the loop condition for while loops is very different from that of
counted loops. In counted loops, it is possible to compute the loop condition in one
cycle using a counted loop branch. This is what a br.ctop instruction does when it sets
p16. In while loops, a compare must compute the loop condition and set the stage
predicates. The stages prior to the one containing the compare are called the
speculative stages of the pipeline, because it is not possible for the compare to
completely control the execution of these stages. Therefore, the stage predicate set by
the compare is used (after rotation) to control the first non-speculative stage of the
pipeline.
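The difference between the two branch types can be sketched in a few lines of Python. This is a simplified model and not the architectural definition: the function names and the dictionary result are hypothetical, only the value rotated into p16 is tracked, and `None` is used to mean "no rotation, p16 unchanged".

```python
def br_ctop(lc, ec):
    """Counted-loop branch: the branch itself derives the next p16 from LC."""
    if lc != 0:
        # Another source iteration starts, so a 1 is rotated into p16.
        return {"taken": True, "p16": 1, "lc": lc - 1, "ec": ec}
    if ec != 0:
        # Epilog: drain the pipeline, no new source iteration starts.
        return {"taken": ec > 1, "p16": 0, "lc": lc, "ec": ec - 1}
    return {"taken": False, "p16": None, "lc": lc, "ec": ec}  # no rotation


def br_wtop(qp, ec):
    """While-loop branch: the branch never rotates a 1 into p16."""
    if qp:
        # Branch predicate (set by a compare) says the loop continues.
        return {"taken": True, "p16": 0, "ec": ec}
    if ec != 0:
        return {"taken": ec > 1, "p16": 0, "ec": ec - 1}
    return {"taken": False, "p16": None, "ec": ec}  # no rotation


print(br_ctop(lc=3, ec=2))  # p16 is 1: the branch produced the stage predicate
print(br_wtop(qp=1, ec=2))  # p16 is 0: a compare must supply the stage predicate
```

In this model br.wtop only ever clears p16, which is why a while loop needs a compare to produce the predicate that enables its non-speculative stages.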
The pipelined version of the while loop is shown below. A check for the speculative
load is included:
          mov     ec = 2
          mov     pr.rot = 1 << 16 ;;     // PR16 = 1, rest = 0
    L1:
          ld4.s   r32 = [r5],4            // Cycle 0
    (p18) chk.s   r34, recovery           // Cycle 0
    (p18) cmp.ne  p17,p0 = r34,r0         // Cycle 0
    (p18) st4     [r6] = r34,4            // Cycle 0
    (p17) br.wtop.sptk L1 ;;              // Cycle 0
    L2:
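To follow the data flow through the kernel above, the loop can be simulated in Python. This is a minimal sketch under stated assumptions: memory is a Python list indexed by element instead of by byte address, a load past the end of the list returns a stand-in NaT token, the recovery code is replaced by an exception, and the names `copy_until_zero`, `NAT`, `regs` and `index` are hypothetical.

```python
NAT = object()            # stand-in for a NaT (deferred exception) token
P16, P17, P18 = 0, 1, 2   # list positions of p16, p17, p18 in the rotating predicates


def copy_until_zero(src):
    """Simulate the br.wtop kernel; return (stored values, cycles executed)."""
    pr = [1] + [0] * 47       # mov pr.rot = 1 << 16
    ec = 2                    # mov ec = 2
    regs = [NAT, NAT, NAT]    # r32, r33, r34 (initial contents are an assumption)
    dst = []
    index = 0                 # element index standing in for r5
    cycles = 0
    while True:
        # ld4.s r32 = [r5],4 : no stage predicate, so it runs every cycle
        regs[0] = src[index] if index < len(src) else NAT
        index += 1
        if pr[P18]:
            if regs[2] is NAT:            # chk.s r34, recovery
                raise RuntimeError("speculation failed: recovery code needed")
            pr[P17] = int(regs[2] != 0)   # cmp.ne p17,p0 = r34,r0
            dst.append(regs[2])           # st4 [r6] = r34,4
        # (p17) br.wtop.sptk L1
        qp = pr[P17]
        taken = bool(qp) or ec > 1
        if qp or ec != 0:
            if not qp:
                ec -= 1
            pr = [0] + pr[:-1]            # predicate rotation, 0 enters p16
            regs = [NAT] + regs[:-1]      # register rotation: r32 becomes r33
        cycles += 1
        if not taken:
            return dst, cycles


print(copy_until_zero([5, 7, 9, 0]))  # prints ([5, 7, 9, 0], 6)
```

Four source iterations finish in six cycles: one cycle per iteration plus the two speculative stages that fill the pipeline. The two loads issued past the terminating zero never reach a stage with p18 set, so their results are never checked or stored.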
To explain why the kernel loop is programmed the way it is, it is helpful to examine a
trace of the execution of the loop (assume there are 200 source iterations) shown in
Table 5-2.
There is no stage predicate assigned to the load because it is speculative. The compare
sets p17. This is the branch predicate for the current iteration and, after rotation, the
stage predicate for the first non-speculative stage (stage three) of the next source
iteration. During the prolog, the compare cannot produce its first valid result until cycle
two. The initialization of the predicates provides a pipeline that disables the compare
until the first source iteration reaches stage two in cycle two. At that point the
compare starts generating stage predicates to control the non-speculative stages of the
pipeline. Notice that the compare is conditional. If it were unconditional, it would
always write a zero to p17 and the pipeline would not get started correctly.
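The effect of the conditional compare can be checked with a small predicate-only model. The sketch below is illustrative and rests on several assumptions: only p16 to p18 and EC are tracked, the loaded data is taken to be nonzero so an enabled compare always writes 1, an unconditional compare is modeled as writing 0 to p17 whenever p18 is 0, and the function name `trace_state` is hypothetical.

```python
def trace_state(unconditional, cycles=3):
    """Return (p16, p17, p18, EC) sampled just before br.wtop in each cycle."""
    p16, p17, p18, ec = 1, 0, 0, 2   # mov pr.rot = 1 << 16 ; mov ec = 2
    rows = []
    for _ in range(cycles):
        if p18:
            p17 = 1                  # compare enabled, data assumed nonzero
        elif unconditional:
            p17 = 0                  # unconditional compare clears its target
        rows.append((p16, p17, p18, ec))
        # br.wtop with p17 as the branch predicate
        taken = bool(p17) or ec > 1
        if p17 or ec != 0:
            if not p17:
                ec -= 1
            p16, p17, p18 = 0, p16, p17   # rotation, 0 enters p16
        if not taken:
            break
    return rows


print(trace_state(unconditional=False))  # [(1, 0, 0, 2), (0, 1, 0, 1), (0, 1, 1, 1)]
print(trace_state(unconditional=True))   # [(1, 0, 0, 2), (0, 0, 0, 1)]
```

With the conditional compare the model reproduces the first three rows of Table 5-2. With the unconditional variant, p17 is cleared in cycle one before the first source iteration reaches the compare, EC runs out, and the loop falls through after two cycles without storing anything.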
Table 5-2. wtop Loop Trace

            Port/Instructions                           State before br.wtop
    Cycle   M       I       I       M       B           p16   p17   p18   EC
    0       ld4.s                           br.wtop     1     0     0     2
    1       ld4.s                           br.wtop     0     1     0     1
    2       ld4.s   cmp     chk     st4     br.wtop     0     1     1     1