Volume 1, Part 1: Introduction to the Intel® Itanium® Architecture
2.8 Branching
In addition to removing branches through the use of predication, several mechanisms
are provided to decrease the branch misprediction rate and the cost of the remaining
mispredicted branches. These mechanisms provide ways for the compiler to
communicate information about branch conditions to the processor.
Branch predict instructions are provided which can be used to communicate an early
indication of the target address and the location of the branch. The compiler will try to
indicate whether a branch should be predicted dynamically or statically. The processor
can use this information to initialize branch prediction structures, enabling good
prediction even the first time a branch is encountered. This is beneficial for
unconditional branches or in situations where the compiler has information about likely
branch behavior.
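
To illustrate why a hint helps on the first encounter, the sketch below models a toy predictor built from 2-bit saturating counters. It is a simplified illustration, not a description of any Itanium implementation: the class `HintedPredictor`, its methods, and the choice of a 2-bit counter are all assumptions made for this example.

```python
class HintedPredictor:
    """Toy predictor (hypothetical) whose initial state can come from a compiler hint."""

    def __init__(self):
        self.counters = {}  # branch address -> saturating counter in 0..3
        self.static = {}    # branch address -> fixed prediction for static hints

    def hint(self, addr, likely_taken, dynamic):
        if dynamic:
            # Start in a strong state matching the hint instead of the default.
            self.counters[addr] = 3 if likely_taken else 0
        else:
            # Static hint: always predict the compiler's choice.
            self.static[addr] = likely_taken

    def predict(self, addr):
        if addr in self.static:
            return self.static[addr]
        # Unhinted branches start at 1 (weakly not taken).
        return self.counters.get(addr, 1) >= 2

    def update(self, addr, taken):
        if addr in self.static:
            return  # static predictions never change
        c = self.counters.get(addr, 1)
        self.counters[addr] = min(3, c + 1) if taken else max(0, c - 1)


def mispredictions(predictor, addr, outcomes):
    """Count wrong predictions for one branch over a sequence of outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(addr) != taken:
            misses += 1
        predictor.update(addr, taken)
    return misses
```

In this model, a branch that is always taken costs one misprediction when the predictor starts cold and none when a "likely taken" hint was supplied beforehand.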
For indirect branches, a branch register is used to hold the target address. Branch
predict instructions provide an indication of which register will be used in situations
when the target address can be computed early. A branch predict instruction can also
signal that an indirect branch is a procedure return, enabling the efficient use of
call/return stack prediction structures.
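
A call/return stack prediction structure can be pictured as a small stack of return addresses. The sketch below is a generic model under stated assumptions: the class name `ReturnStackPredictor`, the default depth of 8, and the overflow policy are illustrative and are not taken from the architecture.

```python
class ReturnStackPredictor:
    """Generic return-address stack model (hypothetical names and depth)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, return_addr):
        # Record where the matching return is expected to go.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # the oldest entry is lost on overflow
        self.stack.append(return_addr)

    def predict_return(self):
        # Only useful when the branch is known to be a procedure return;
        # returns None when the stack holds no prediction.
        return self.stack.pop() if self.stack else None
```

The model shows why it matters that an indirect branch is identified as a return: only then is popping the stack the right way to predict its target.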
Special loop-closing branches are provided to accelerate counted loops and
modulo-scheduled loops. These branches and their associated branch predict
instructions provide information that allows for perfect prediction of loop termination,
thereby eliminating costly mispredict penalties and reducing loop overhead.
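
The reason loop termination can be predicted perfectly is that the branch outcome depends only on a loop counter whose value is known ahead of the branch. The following is a minimal model, assuming a counted loop whose closing branch is taken while a loop count is nonzero and decrements that count each time it is taken; the function name `run_counted_loop` and the variable `lc` are introduced here for illustration.

```python
def run_counted_loop(trip_count, body):
    """Run body() trip_count times and return the loop-closing branch outcomes."""
    if trip_count < 1:
        raise ValueError("trip_count must be at least 1")
    lc = trip_count - 1  # loop count set up before entering the loop
    outcomes = []
    while True:
        body()
        if lc != 0:
            # Branch taken: decrement the count and start another iteration.
            lc -= 1
            outcomes.append(True)
        else:
            # Branch not taken: the loop exits, and this was knowable from lc alone.
            outcomes.append(False)
            break
    return outcomes
```

Because every outcome follows directly from `lc`, a predictor that can see the counter never has to guess at the final, not-taken iteration.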
2.9 Register Rotation
Modulo scheduling of a loop is analogous to hardware pipelining of a functional unit
since the next iteration of the loop starts before the previous iteration has finished. The
iteration is split into stages similar to the stages of an execution pipeline. Modulo
scheduling allows the compiler to execute loop iterations in parallel rather than
sequentially. The concurrent execution of multiple iterations traditionally requires
unrolling of the loop and software renaming of registers. The Itanium architecture
allows the renaming of registers, which provides every iteration with its own set of
registers and avoids the need for unrolling. This kind of register renaming is called
register rotation. The result is that software pipelining can be applied to a much wider
variety of loops, both small and large, with significantly reduced overhead.
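
Register rotation can be sketched as a register file addressed through a rotating base: after each rotation, the value written under one register name appears under the next higher name. The model below is a simplification under stated assumptions. The class `RotatingRegisters`, the four-entry file, and the two-stage loop `pipelined_double` are hypothetical and chosen only to keep the example short.

```python
class RotatingRegisters:
    """Toy rotating register file addressed through a rotating base (rrb)."""

    def __init__(self, size):
        self.phys = [None] * size
        self.rrb = 0

    def _index(self, logical):
        return (logical + self.rrb) % len(self.phys)

    def read(self, logical):
        return self.phys[self._index(logical)]

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def rotate(self):
        # After rotation, the value written as register N is read as register N + 1.
        self.rrb = (self.rrb - 1) % len(self.phys)


def pipelined_double(values):
    """Two-stage software pipeline: stage 1 loads a value, stage 2 doubles it."""
    regs = RotatingRegisters(4)
    out = []
    n = len(values)
    for step in range(n + 1):  # n kernel passes plus one pass to drain the pipeline
        if step >= 1:
            # Stage 2 of iteration step - 1: its operand has rotated into register 1.
            out.append(regs.read(1) * 2)
        if step < n:
            # Stage 1 of iteration step: always writes the same name, register 0.
            regs.write(0, values[step])
        regs.rotate()
    return out
```

Each pass of the loop body uses the same register names, yet successive iterations do not overwrite each other's values, which is what removes the need to unroll the loop and rename registers in software.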
2.10 Floating-point Architecture
The Itanium architecture defines a floating-point architecture with full IEEE support for
the single, double, and double-extended (80-bit) data types. Some extensions, such as
a fused multiply and add operation, minimum and maximum functions, and a register
file format with a larger range than the double-extended memory format, are also
included. 128 floating-point registers are defined. Of these, 96 registers are rotating
(not stacked) and can be used to modulo schedule loops compactly. Multiple
floating-point status registers are provided for speculation.
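
The benefit of a fused multiply and add is that the product is not rounded before the addition, so the result is rounded only once. The sketch below emulates that behavior for finite double-precision inputs using exact rational arithmetic. It is an illustration rather than the hardware's implementation, and the function names are assumptions made for this example.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round once to a double (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused_multiply_add(a, b, c):
    """Compute a*b + c with two roundings: one after the multiply, one after the add."""
    return a * b + c


# Inputs chosen so that the exact product, 1 - 2**-60, is not representable as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)      # keeps the small term: -2**-60
unfused = unfused_multiply_add(a, b, c)  # product rounds to 1.0 first, giving 0.0
```

With these inputs the unfused form loses the result entirely, while the fused form returns the exact value.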