1:186
Volume 1, Part 2: Software Pipelining and Loop Support
for the same source iteration. Each 1 written to p16 sequentially enables all the stages for a new source iteration. This behavior is used to enable or disable the execution of the stages of the pipelined loop during the prolog, kernel, and epilog phases, as described in the next section.
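The following Python sketch models the shift behavior described above. It rests on simplifying assumptions: stage k of the pipelined loop is guarded by rotating predicate p(16+k), p17 and above start at 0, and exactly one value is in p16 during each kernel pass (the first from initialization, later ones delivered by the loop branch). The function name stage_enables is hypothetical.

```python
def stage_enables(p16_values, num_stages):
    """Predicates guarding stages 0..num_stages-1 in each kernel pass.

    p16_values[j] is the value in p16 during pass j. Assumes stage k is
    guarded by p(16+k) and that p17 and above start at 0.
    """
    preds = [0] * num_stages  # preds[k] models p(16+k)
    passes = []
    for value in p16_values:
        # Rotation: the incoming value lands in p16, older values move up one stage.
        preds = [value] + preds[:-1]
        passes.append(preds)
    return passes


# Three source iterations through a three-stage pipeline: three 1s, then 0s to drain.
for pass_no, preds in enumerate(stage_enables([1, 1, 1, 0, 0], 3)):
    print(pass_no, preds)
```

This prints [1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 1, 1], [0, 0, 1] for passes 0 through 4. Each 1 moves diagonally through the stages, so one source iteration's stages run in consecutive passes, while each 0 disables the stages that would belong to a non-existent iteration.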
5.4.2 Note on Initializing Rotating Predicates
In this chapter, the instruction mov pr.rot = immed is used to initialize rotating predicates. This instruction ignores the value of CFM.rrb.pr. Thus, the examples in this chapter are written assuming that CFM.rrb.pr is always zero prior to the initialization of predicate registers using mov pr.rot = immed.
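A minimal Python sketch of why the examples need CFM.rrb.pr to be zero. It assumes that mov pr.rot writes the sign-extended 44-bit immediate, shifted left by 16, into physical predicates p16 through p63 without consulting CFM.rrb.pr, and that a logical rotating predicate is renamed to physical 16 + ((logical - 16 + rrb.pr) mod 48). The helpers mov_pr_rot and read_logical are hypothetical names for illustration only.

```python
NUM_ROTATING = 48  # p16..p63 form the rotating predicate region


def mov_pr_rot(imm44: int) -> list:
    """Model of `mov pr.rot = imm44` (assumed semantics, see lead-in).

    Returns the physical rotating predicates p16..p63 as 0/1 values
    (index 0 is physical p16). CFM.rrb.pr plays no part in the write.
    """
    imm44 &= (1 << 44) - 1
    if imm44 & (1 << 43):  # sign-extend the 44-bit immediate
        imm44 -= 1 << 44
    value = imm44 << 16    # bit i of the result goes to predicate p<i>
    return [(value >> (16 + i)) & 1 for i in range(NUM_ROTATING)]


def read_logical(phys: list, logical: int, rrb_pr: int) -> int:
    """Read logical predicate p<logical> (16..63) through register rotation.

    Assumed renaming rule: physical = 16 + (logical - 16 + rrb_pr) mod 48.
    """
    return phys[(logical - 16 + rrb_pr) % NUM_ROTATING]


phys = mov_pr_rot(1)              # intent: "set p16, clear p17..p63"
print(read_logical(phys, 16, 0))  # rrb.pr == 0: logical p16 is 1
print(read_logical(phys, 16, 5))  # rrb.pr == 5: logical p16 is 0
print(read_logical(phys, 59, 5))  # the 1 now appears as logical p59
```

Under these assumptions, the immediate lands in the intended logical predicates only when rrb.pr is zero; with any other value the same bit pattern appears shifted around the rotating region.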
5.4.3 Software-pipelined Loop Branches
The special software-pipelined loop branches allow the compiler to generate very
compact code for software-pipelined loops by supporting register rotation and by
controlling the filling and draining of the software pipeline during the prolog and epilog
phases. Generally speaking, each time a software-pipelined loop branch is executed,
the following actions take place:
1. A decision is made on whether or not to continue kernel loop execution.
2. p16 is set to a value that controls execution of the stages of the software pipeline (the branch writes p63, and after rotation this value is in p16).
3. The registers are rotated (the rrb fields in CFM are decremented).
4. The Loop Count (LC) and/or the Epilog Count (EC) application registers are selectively decremented.
There are two types of software-pipelined loop branches: counted and while.
5.4.3.1 Counted Loop Branches
shows a flowchart for modulo-scheduled counted loop branches.
During the prolog and kernel phases, a decision to continue kernel loop execution means that a new source iteration is started. Register rotation must occur so that the new source iteration does not overwrite registers that are in use by prior source iterations that are still in the pipeline. p16 is set to 1 to enable the stages of the new source iteration. LC is decremented to update the count of remaining source iterations. EC is not modified.
During the epilog phase, the decision to continue loop execution means that the software pipeline has not yet been fully drained and execution of the source iterations in progress must continue. Register rotation must continue because the remaining source iterations are still writing results and the consumers of those results expect rotation to occur. p16 is now set to 0 because there are no more new source iterations, and the instructions that correspond to non-existent source iterations must be disabled. EC contains the count of the remaining execution stages for the last source iteration and is decremented during the epilog. For most loops, when a software-pipelined loop branch is executed with EC equal to 1, it indicates that the pipeline has been drained.
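The counted-loop behavior described above can be sketched as a small state machine in Python. In Itanium assembly the counted form of this branch is spelled br.ctop. The sketch assumes a common loop setup that this section does not spell out (LC = N - 1 and EC = S for N source iterations and S stages, p16 = 1 and p17 through p63 = 0 on entry, stage k guarded by p(16+k)); its handling of EC equal to 1 and EC equal to 0 is likewise an assumption. The names LoopState, br_ctop, and run_counted_loop are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    lc: int  # Loop Count application register
    ec: int  # Epilog Count application register
    preds: list = field(default_factory=lambda: [0] * 48)  # logical p16..p63


def br_ctop(s: LoopState) -> bool:
    """One execution of the counted loop branch; returns True if taken."""
    if s.lc != 0:      # prolog/kernel: start a new source iteration
        s.lc -= 1      # EC is not modified
        p63, taken = 1, True
    elif s.ec > 1:     # epilog: keep draining the pipeline
        s.ec -= 1
        p63, taken = 0, True
    elif s.ec == 1:    # assumed: final rotation, then fall through
        s.ec -= 1
        p63, taken = 0, False
    else:              # assumed: EC == 0 changes nothing and falls through
        return False
    # The branch writes p63; after rotation that value is in p16 and
    # every other rotating predicate has moved up by one.
    s.preds = [p63] + s.preds[:-1]
    return taken


def run_counted_loop(n_iters: int, n_stages: int):
    """Trace the (iteration, stage) pairs executed in each pass (n_iters >= 1)."""
    s = LoopState(lc=n_iters - 1, ec=n_stages)
    s.preds[0] = 1  # as if initialized with mov pr.rot so that only p16 is set
    trace, pass_no = [], 0
    while True:
        trace.append([(pass_no - k, k) for k in range(n_stages) if s.preds[k]])
        pass_no += 1
        if not br_ctop(s):
            return trace, s


trace, final = run_counted_loop(3, 3)
for step in trace:
    print(step)
```

Under these assumptions, N source iterations through S stages execute the loop body N + S - 1 times. N - 1 passes decrement LC (prolog and kernel), S - 1 passes decrement EC (epilog), and a final pass with EC equal to 1 drains the last stage and falls through.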