
1:202
Volume 1, Part 2: Software Pipelining and Loop Support
Note that, in the code above, the ld4 and the add instructions in stage 2 have been
reordered. Register rotation has been used to eliminate the WAR register dependency
from the add to the ld4. The first two stages are speculative. The code to implement
the pipeline is shown below:
        ld4     r36 = [r5]
        mov     ec = 2
        mov     pr.rot = 1 << 16 ;;     // PR16 = 1, rest = 0
L1:
        ld4.s   r32 = [r8],4            // Cycle 0
        ld4.s   r34 = [r9],4            // Cycle 0
(p18)   and     r40 = 3,r39 ;;          // Cycle 0
        ld4.s   r36 = [r35]             // Cycle 1
        add     r38 = r37,r33           // Cycle 1
(p18)   chk.s   r40, recovery           // Cycle 1
(p18)   cmp.ne  p17,p0 = r40,r11        // Cycle 1
(p17)   br.wtop L1 ;;                   // Cycle 1
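The effect of the `mov pr.rot = 1 << 16` initialization can be illustrated with a small Python model. This is only a sketch under stated assumptions: the helper names `mov_pr_rot` and `rotate` are hypothetical, the model tracks nothing but the rotating predicates p16 through p63, and it ignores the epilog counter and the `cmp.ne` that writes p17. It shows that a bit placed in p16 appears as p18, the predicate guarding the third-stage instructions, after two rotations.

```python
# Simplified model of the rotating predicates p16..p63.
# Assumption: EC and the compare that writes p17 are not modeled.
FIRST, LAST = 16, 63


def mov_pr_rot(imm):
    """Return {predicate number: bit} for p16..p63, taken from bits 16..63 of imm."""
    return {n: (imm >> n) & 1 for n in range(FIRST, LAST + 1)}


def rotate(preds):
    """One rotation: the bit in pN becomes visible as pN+1 and p16 reads 0."""
    rotated = {n: preds[n - 1] for n in range(FIRST + 1, LAST + 1)}
    rotated[FIRST] = 0
    return rotated


preds = mov_pr_rot(1 << 16)  # PR16 = 1, rest = 0
history = []
for _ in range(3):
    # Record which rotating predicates are set in this kernel iteration.
    history.append([n for n, bit in preds.items() if bit])
    preds = rotate(preds)

print(history)
```

In this model the single set bit moves from p16 to p17 to p18 over three consecutive iterations, which is why the instructions qualified by p18 stay disabled during the first two kernel iterations.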
The problem with this pipelined loop is that the value written to r36 prior to the loop is
overwritten before it is used by the add. The value is overwritten by the load into r36
in the first kernel iteration. This load is in the second stage of the pipeline, but cannot
be controlled during the first kernel iteration because it is speculative and does not
have a stage predicate. This problem can be solved by peeling off one iteration of the
kernel and excluding from that copy any instructions that are not in the first stage of
the pipeline as shown below.
Note that the destination register numbers for the instructions in the explicit prolog
have been increased by one. This is to account for the fact that there is no rotation at
the end of the peeled kernel iteration.
        ld4     r37 = [r5]
        mov     ec = 1
        mov     pr.rot = 1<<17;;        // PR17 = 1, rest = 0
        ld4     r33 = [r8],4
        ld4     r35 = [r9],4
L1:
        ld4.s   r32 = [r8],4            // Cycle 0
        ld4.s   r34 = [r9],4            // Cycle 0
(p18)   and     r40 = 3,r39 ;;          // Cycle 0
        ld4.s   r36 = [r35]             // Cycle 1
        add     r38 = r37,r33           // Cycle 1
(p18)   chk.s   r40, recovery           // Cycle 1
(p18)   cmp.ne  p17,p0 = r40,r11        // Cycle 1
(p17)   br.wtop L1 ;;                   // Cycle 1
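The difference between the two versions can be traced with a minimal Python model of register rotation. This is a sketch, not a processor simulator: the class `RotatingRegs`, the functions `unpeeled` and `peeled`, and the 16-register rotating region are assumptions made for illustration, and only the writes and reads that touch r36 and r37 are modeled.

```python
class RotatingRegs:
    """Hypothetical model of a rotating region r32..r32+size-1."""

    def __init__(self, size=16):
        self.size = size
        self.phys = [None] * size
        self.rrb = 0  # rotating register base

    def _index(self, n):
        return (n - 32 + self.rrb) % self.size

    def write(self, n, value):
        self.phys[self._index(n)] = value

    def read(self, n):
        return self.phys[self._index(n)]

    def rotate(self):
        # After a rotation, the value written to rN is visible as rN+1.
        self.rrb = (self.rrb - 1) % self.size


def unpeeled():
    """First version: the value loaded before the loop goes to r36."""
    regs = RotatingRegs()
    regs.write(36, "pre-loop")     # ld4 r36 = [r5]
    regs.write(36, "speculative")  # kernel iteration 1: ld4.s r36 = [r35]
    regs.rotate()                  # br.wtop rotates the registers
    return regs.read(37)           # kernel iteration 2: add reads r37


def peeled():
    """Second version: destinations in the explicit prolog are one higher."""
    regs = RotatingRegs()
    regs.write(37, "pre-loop")     # ld4 r37 = [r5]
    # The peeled iteration ends without a rotation.
    regs.write(36, "speculative")  # kernel iteration 1: ld4.s r36 = [r35]
    return regs.read(37)           # kernel iteration 1: add reads r37


print(unpeeled(), peeled())
```

In the first version the add sees the speculatively loaded value instead of the one loaded before the loop. In the second version the load before the loop targets r37 directly, so the unpredicated load into r36 no longer disturbs it.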
In some cases, higher performance can be achieved by generating separate blocks of
code for all or part of the prolog and/or epilog phase. It is clear from the execution
trace of the pipelined counted loop from
that the functional units are