Volume 1, Part 2: Software Pipelining and Loop Support
Itanium architecture allow some loops to be software pipelined without code expansion.
Register rotation provides a renaming mechanism that reduces the need for loop
unrolling and software renaming of registers. Special software pipelined loop branches
support register rotation and, combined with predication, reduce the need to generate
separate blocks of code for the prolog and epilog phases.
5.4.1 Register Rotation
Register rotation renames registers by adding the register number to the value of a
register rename base (rrb) register contained in the CFM. The rrb register is
decremented when certain special software pipelined loop branches are executed at the
end of each kernel iteration. Decrementing the rrb register makes the value in register
X appear to move to register X+1. If X is the highest numbered rotating register, its
value wraps to the lowest numbered rotating register.
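The renaming described above can be sketched as a small Python model. This is a toy illustration, not the architectural definition: the class name RotatingFile, its methods, and the choice of an 8-register rotating region starting at r32 are all assumptions made for the example, and rrb is assumed to be kept modulo the region size so that the wrap-around falls out of the arithmetic.

```python
class RotatingFile:
    """Toy model of one rotating register region (all names are illustrative)."""

    def __init__(self, base, size):
        self.base = base          # lowest-numbered rotating register
        self.size = size          # number of rotating registers
        self.rrb = 0              # register rename base, kept modulo size (assumption)
        self.phys = [0] * size    # underlying physical storage

    def _index(self, reg):
        # Renaming: add the register's offset to rrb, wrapping within the region.
        return (reg - self.base + self.rrb) % self.size

    def read(self, reg):
        return self.phys[self._index(reg)]

    def write(self, reg, value):
        self.phys[self._index(reg)] = value

    def rotate(self):
        # Models the rrb decrement performed by a software pipelined loop branch.
        self.rrb = (self.rrb - 1) % self.size


gr = RotatingFile(base=32, size=8)   # assume r32-r39 rotate in this sketch
gr.write(35, 111)
gr.rotate()
moved = gr.read(36)                  # the value written to r35 now appears as r36

gr.write(39, 222)                    # highest-numbered rotating register here
gr.rotate()
wrapped = gr.read(32)                # its value wraps to the lowest-numbered one
```

No data is copied during rotate(); only rrb changes, which is why the value merely appears to move from register X to register X+1.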
A fixed-sized area of the predicate and floating-point register files (p16-p63 and
f32-f127), and a programmable-sized area of the general register file are defined to
rotate. The size of the rotating area in the general register file is determined by an
immediate in the alloc instruction and must be either zero or a multiple of 8, up to a
maximum of 96 registers. The lowest numbered rotating register in the general register
file is r32. An rrb register is provided for each of the three rotating register files:
CFM.rrb.gr for the general registers; CFM.rrb.fr for the floating-point registers;
CFM.rrb.pr for the predicate registers. The software pipelined loop branches
decrement all the rrb registers simultaneously.
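The size constraints and the simultaneous decrement can be restated as a short Python sketch. The function names and the dictionary layout are hypothetical, chosen only for illustration, and the wrap of each rrb value modulo its region size is an assumption of the model.

```python
PR_ROTATING = range(16, 64)    # p16-p63: fixed region of 48 predicate registers
FR_ROTATING = range(32, 128)   # f32-f127: fixed region of 96 floating-point registers


def is_valid_gr_rotating_size(n):
    """Zero or a multiple of 8, up to a maximum of 96 registers."""
    return 0 <= n <= 96 and n % 8 == 0


def rotate_all(rrb, gr_size):
    """Decrement the three rrb values together (hypothetical helper).

    Each value is assumed to wrap modulo the size of its rotating region.
    """
    return {
        "gr": (rrb["gr"] - 1) % gr_size if gr_size else 0,
        "fr": (rrb["fr"] - 1) % len(FR_ROTATING),
        "pr": (rrb["pr"] - 1) % len(PR_ROTATING),
    }


after_one_branch = rotate_all({"gr": 0, "fr": 0, "pr": 0}, gr_size=16)
```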
Below is an example of register rotation. The swp_branch pseudo-instruction
represents a software pipelined loop branch:
L1: ld4 r35 = [r4],4    // post increment by 4
    st4 [r5] = r37,4    // post increment by 4
    swp_branch L1 ;;
The value that the load writes to r35 is read by the store two kernel iterations (and two
rotations) later as r37. In the meantime, two more instances of the load are executed.
Because of register rotation, those instances write their result to different registers and
do not modify the value needed by the store.
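The two-iteration distance between the load and the store can be checked with a small simulation of the example loop. This is a sketch under stated assumptions: run_kernel is a hypothetical helper, the rotating region is taken to be r32-r39, memory is replaced by a Python list, and no predicates are modeled, so the store in the first two kernel iterations reads a register that has not been loaded yet (shown as None).

```python
def run_kernel(src, base=32, size=8):
    """Simulate the ld4/st4/swp_branch loop and return what each store reads."""
    phys = [None] * size      # physical storage behind the rotating registers
    rrb = 0                   # register rename base, kept modulo size (assumption)
    stored = []
    for value in src:
        load_slot = (35 - base + rrb) % size    # where "r35" currently lives
        store_slot = (37 - base + rrb) % size   # where "r37" currently lives
        phys[load_slot] = value                 # ld4 r35 = [r4],4
        stored.append(phys[store_slot])         # st4 [r5] = r37,4
        rrb = (rrb - 1) % size                  # swp_branch L1: rotate
    return stored


trace = run_kernel([10, 20, 30, 40, 50])
# trace == [None, None, 10, 20, 30]
```

Each value reaches the store two kernel iterations after it was loaded, and the two loads issued in between land in different physical slots.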
The rotation of predicate registers serves two purposes. The first is to avoid
overwriting a predicate value that is still needed. The second purpose is to control the
filling and draining of the pipeline. To do this, a programmer assigns a predicate to each
stage of the software pipeline to control the execution of the instructions in that stage.
This predicate is called the stage predicate. For counted loops, p16 is architecturally
defined to be the predicate for the first stage, p17 is defined to be the predicate for the
second stage, etc. A conceptual view of a pipelined source iteration of the example
counted loop is shown below. Each stage is one cycle long and the stage predicates are
shown:
stage 1: (p16) ld4 r4 = [r5],4
stage 2: (p17) ---                // empty stage
stage 3: (p18) add r7 = r4,r9
stage 4: (p19) st4 [r6] = r7,4
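How stage predicates fill and drain the pipeline can be sketched in Python. This is a simplified model, not the architectural mechanism: pipeline_schedule is a hypothetical helper, only the four stage predicates p16-p19 are modeled, and the sketch assumes p16 is written with 1 once per remaining source iteration and 0 afterwards.

```python
def pipeline_schedule(n_iterations, n_stages=4):
    """Return, for each kernel iteration, the list of enabled stage numbers."""
    preds = [0] * n_stages        # preds[0] models p16, preds[1] models p17, ...
    schedule = []
    for k in range(n_iterations + n_stages - 1):
        # Assumption: p16 is 1 while source iterations remain, 0 while draining.
        preds[0] = 1 if k < n_iterations else 0
        schedule.append([s + 1 for s, p in enumerate(preds) if p])
        # Rotation: the value in p16 appears in p17, p17 in p18, and so on.
        preds = [0] + preds[:-1]
    return schedule


schedule = pipeline_schedule(3)
# schedule == [[1], [1, 2], [1, 2, 3], [2, 3, 4], [3, 4], [4]]
```

The first kernel iterations enable progressively more stages (filling), and the last ones enable progressively fewer (draining), all with the same kernel code.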
A register rotation takes place at the end of each stage (when the software-pipelined
loop branch is executed in the kernel loop). Thus a 1 written to p16 enables the first
stage and then is rotated to p17 at the end of the first stage to enable the second stage