Volume 1, Part 2: Introduction to Programming for the Intel® Itanium® Architecture
• Using predication to reduce the number of branches in the code. This improves
instruction fetching because there are fewer control flow changes, decreases the
number of branch mispredicts because there are fewer branches, and increases the
branch prediction hit rate because there is less competition for prediction resources.
• Providing software hints for branches to improve hardware use of prediction and
prefetching resources.
• Supplying explicit support for software pipelining of loops and exit prediction of
counted loops.
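The first point, removing branches through predication, can be illustrated with a
minimal Python sketch. It is a model of if-conversion, not Itanium code: a compare
produces a pair of complementary predicates, both arms are issued, and each result is
committed only when its guarding predicate is true. The names compare,
predicated_move, and abs_diff_predicated are hypothetical, and the pseudo-assembly in
the comments is illustrative only.

```python
def compare(a, b):
    """Model a compare that writes two complementary predicates."""
    p_true = a > b
    return p_true, not p_true


def predicated_move(pred, old_value, new_value):
    """Commit new_value only if the guarding predicate is set.

    Python's conditional expression stands in for the hardware's
    per-instruction predicate check (an assumption of this model).
    """
    return new_value if pred else old_value


def abs_diff_branchy(a, b):
    """Reference version with an explicit control flow change."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    """If-converted version: both arms are issued, predicates pick the result."""
    p1, p2 = compare(a, b)
    result = 0
    result = predicated_move(p1, result, a - b)  # (p1) result = a - b
    result = predicated_move(p2, result, b - a)  # (p2) result = b - a
    return result
```

Both versions return the same value for every input. The predicated form has a single
straight-line path, so in this model there is no branch left to fetch across or to
mispredict.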
2.7.1 Branch Instructions
Branching in the Itanium architecture is largely expressed the same way as on other
microprocessors. The major difference is that branch triggers are controlled by
predicates rather than conditions encoded in branch instructions. The architecture also
provides a rich set of hints to control branch prediction strategy, prefetching, and
specific branch types like loops, exits, and branches associated with software pipelining.
Targets for indirect branches are placed in branch registers prior to branch instructions.
2.7.2 Loops and Software Pipelining
Compilers sometimes try to improve the performance of loops by using unrolling.
However, unrolling is not effective on all loops for the following reasons:
• Unrolling may not fully exploit the parallelism available.
• Unrolling is tailored for a statically defined number of loop iterations.
• Unrolling can increase code size.
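A short Python sketch of manual unrolling makes the last two limitations concrete. The
function names and the unroll factor of 4 are assumptions chosen for illustration; a
compiler applies the same transformation to machine code rather than to source.

```python
def sum_rolled(values):
    """Original loop: one loop branch per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same loop unrolled by a fixed factor of 4 (factor is an assumption)."""
    total = 0
    n = len(values)
    main_end = n - (n % 4)  # largest multiple of 4 not exceeding n
    i = 0
    while i < main_end:
        # Body duplicated four times: larger code, fewer loop branches.
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Cleanup loop for the leftover 0..3 iterations, needed because the
    # trip count is not known to be a multiple of the unroll factor.
    while i < n:
        total += values[i]
        i += 1
    return total
```

The unrolled body is roughly four times the size of the original, and the cleanup loop
exists only because the unroll factor was fixed before the iteration count was known.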
To maintain the advantages of loop unrolling while overcoming these limitations, the
Itanium architecture provides architectural support for software pipelining. Software
pipelining enables the compiler to interleave the execution of several loop iterations
without having to unroll a loop. Software pipelining is performed using:
• Loop-branch instructions.
• LC and EC application registers.
• Rotating registers and loop stage predicates.
• Branch hints that can assign a special prediction mechanism to important branches.
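The interleaving of iterations can be sketched in Python as a scheduling model. This
is not Itanium code, and it does not model LC, EC, or the loop branches themselves.
It assumes a hypothetical three-stage loop body (load, add, store) and shows how one
copy of the body, with each stage guarded by its own stage predicate, works on three
different source iterations at once. The name pipelined_increment and the trace format
are inventions for this illustration.

```python
STAGES = 3  # load, add, store (a hypothetical three-stage loop body)


def pipelined_increment(src):
    """Compute dst[i] = src[i] + 1 with the stages of different iterations overlapped.

    Returns (dst, trace), where trace[k] lists the (stage, iteration) pairs
    that were active during pass k over the single loop body.
    """
    n = len(src)
    dst = [None] * n
    loaded = {}  # values in flight between the load and add stages
    summed = {}  # values in flight between the add and store stages
    trace = []
    for k in range(n + STAGES - 1):
        active = []
        # Later stages run first so that each stage consumes a value that
        # was produced during an earlier pass, as in a real pipeline.
        i = k - 2
        if 0 <= i < n:  # stage predicate for the store stage
            dst[i] = summed.pop(i)
            active.append(("store", i))
        i = k - 1
        if 0 <= i < n:  # stage predicate for the add stage
            summed[i] = loaded.pop(i) + 1
            active.append(("add", i))
        i = k
        if 0 <= i < n:  # stage predicate for the load stage
            loaded[i] = src[i]
            active.append(("load", i))
        trace.append(active)
    return dst, trace
```

For a four-element input, the third pass stores iteration 0, adds for iteration 1, and
loads iteration 2. The first two passes fill the pipeline and the last two drain it,
with the stage predicates switching off the stages that have no work.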
In addition to software pipelined while and counted loops, the architecture provides
particular support for simple counted loops using the br.cloop instruction. The cloop
branch instruction uses the 64-bit Loop Count (LC) application register rather than a
qualifying predicate to determine the branch exit condition.
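The exit test of a counted loop can be modeled in a few lines of Python. The sketch
assumes the usual description of the cloop branch: if LC is nonzero, LC is decremented
and the branch is taken; if LC is zero, the branch falls through and LC is left
unchanged. The helper names br_cloop and run_counted_loop are hypothetical, and
loading LC with the trip count minus one is a convention assumed by this model.

```python
def br_cloop(lc):
    """Model one execution of the counted-loop branch.

    Returns (branch_taken, new_lc). No qualifying predicate is consulted;
    the decision depends on LC alone.
    """
    if lc != 0:
        return True, lc - 1
    return False, lc


def run_counted_loop(trip_count, body):
    """Run body() trip_count times (trip_count >= 1) and return the final LC."""
    lc = trip_count - 1  # assumed setup: LC holds the trip count minus one
    while True:
        body()
        taken, lc = br_cloop(lc)  # loop-closing branch at the bottom of the body
        if not taken:
            break
    return lc
```

Because the body runs once before the branch is first tested, an initial LC of n - 1
yields exactly n executions of the body.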
For a complete discussion of software pipelining support, see
2.7.3 Rotating Registers
Rotating registers enable succinct implementation of software pipelining with
predication. Rotating registers are rotated by one register position each time one of
the special loop branches is executed. Thus, after one rotation, the content of register
X will be found in register X+1 and the value of the highest numbered rotating register
will be found in the lowest numbered rotating register.
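Rotation can be sketched in Python as register renaming rather than data movement.
The model below assumes that a rotation only changes a rename base and that the
highest numbered rotating register wraps around to the lowest numbered one. The class
name RotatingRegisters and its methods are hypothetical and do not correspond to any
architectural interface.

```python
class RotatingRegisters:
    """Toy model of a non-empty rotating register region."""

    def __init__(self, values):
        self._phys = list(values)  # physical storage, never moved
        self._base = 0             # rename base, decremented on each rotation

    def _index(self, x):
        # Map logical register number x to a physical slot.
        return (x + self._base) % len(self._phys)

    def read(self, x):
        return self._phys[self._index(x)]

    def write(self, x, value):
        self._phys[self._index(x)] = value

    def rotate(self):
        """Model the effect of one special loop branch on register names."""
        self._base = (self._base - 1) % len(self._phys)


regs = RotatingRegisters([10, 20, 30, 40])
regs.rotate()
after_one_rotation = [regs.read(x) for x in range(4)]  # [40, 10, 20, 30]
```

After the rotation, the value written as register 0 is read as register 1, so a value
produced by one loop iteration can be picked up by a later stage of the same source
iteration under a different register name without any copy instruction.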