Volume 1, Part 2: Predication, Control Flow, and Instruction Stream
The Itanium architecture allows multiple instructions to target the same register in the
same clock provided that only one of the instructions writing the target register is
predicated true in that clock. Similar capabilities exist for writing predicate registers, as
discussed in
.
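The rule above can be illustrated with a small model. This is a minimal sketch in Python rather than Itanium assembly; the function name `resolve_writes` and the tuple representation of an instruction are hypothetical and chosen only for illustration. It treats a list of predicated writes as issuing in the same clock and accepts them only if no register has more than one writer whose predicate is true.

```python
def resolve_writes(writes, predicates):
    """Model predicated writes issued in the same clock.

    writes: list of (predicate_name, target_register, value) tuples.
    predicates: dict mapping predicate name to True or False.
    Returns a dict of register -> value for the writes that take effect.
    Raises ValueError if two true-predicated writes target one register.
    """
    result = {}
    for pred, reg, value in writes:
        if not predicates[pred]:
            continue  # predicated false: the write has no effect
        if reg in result:
            raise ValueError(f"two true-predicated writers to {reg}")
        result[reg] = value
    return result


# Two writers to r1 in the same clock, guarded by p1 and p2.
same_clock = [("p1", "r1", 10), ("p2", "r1", 20)]
print(resolve_writes(same_clock, {"p1": True, "p2": False}))
```

In this model, calling the function with both `p1` and `p2` true raises an error, which corresponds to the case the text excludes.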
4.3.3.2 Reducing Register Usage
In some instances it is possible to use the same register for two separate computations
in the presence of predication. This technique is similar to the technique for allowing
multiple writers to store a value into the same register, although it is a register
allocation optimization rather than a critical path issue.
After if-conversion, it is particularly common for sequences of instructions to be
predicated with complementary predicates. The contrived sequence below shows
instructions predicated by p1 and p2, which are known by the compiler to be
complementary:
    (p1) add r1=r2,r3
    (p2) sub r5=r4,r56
    (p1) ld8 r7=[r2]
    (p2) ld8 r9=[r6];;
    (p1) a use of r1
    (p2) a use of r5
    (p1) a use of r7
    (p2) a use of r9
Assuming registers r1, r5, r7, and r9 are used for compiler temporaries, each of which
is live only until its next use, the preceding code segment can be rewritten as:
    (p1) add r1=r2,r3
    (p2) sub r1=r4,r56      // Reuse r1
    (p1) ld8 r7=[r2]
    (p2) ld8 r7=[r6];;      // Reuse r7
    (p1) a use of r1
    (p2) a use of r1
    (p1) a use of r7
    (p2) a use of r7
The new sequence uses two fewer registers. With the 128 general registers defined in
the architecture, this may not seem essential, but reducing register use can still reduce
program spills and fills and register stack engine spills and fills, which can be common
in code with high instruction-level parallelism.
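The equivalence of the two sequences can be checked with a small simulation. The following is a minimal sketch in Python, not Itanium code: the interpreter `run`, the helper `temporaries`, the tuple encoding of instructions, and the initial register and memory values are all assumptions made for illustration. Each "use" simply records the value it reads, and p2 is modeled as the complement of p1.

```python
def run(program, regs, mem, p1):
    """Execute predicated ops; return the values read by the 'use' ops."""
    preds = {"p1": p1, "p2": not p1}  # complementary predicates
    regs = dict(regs)                 # work on a copy of the register state
    uses = []
    for pred, op, dst, a, b in program:
        if not preds[pred]:
            continue                  # predicated false: no effect
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "sub":
            regs[dst] = regs[a] - regs[b]
        elif op == "ld8":
            regs[dst] = mem[regs[a]]  # load from the address held in a
        elif op == "use":
            uses.append(regs[a])
    return uses


def temporaries(program):
    """Set of registers written by the program."""
    return {dst for _, _, dst, _, _ in program if dst is not None}


def sequence(t1, t2, t3, t4):
    """Build the example sequence with the given temporary registers."""
    return [
        ("p1", "add", t1, "r2", "r3"),
        ("p2", "sub", t2, "r4", "r56"),
        ("p1", "ld8", t3, "r2", None),
        ("p2", "ld8", t4, "r6", None),
        ("p1", "use", None, t1, None),
        ("p2", "use", None, t2, None),
        ("p1", "use", None, t3, None),
        ("p2", "use", None, t4, None),
    ]


original = sequence("r1", "r5", "r7", "r9")
reused = sequence("r1", "r1", "r7", "r7")   # reuse r1 and r7

# Illustrative initial state (assumed values).
regs = {"r2": 8, "r3": 5, "r4": 30, "r56": 7, "r6": 16}
mem = {8: 111, 16: 222}

for p1 in (True, False):
    assert run(original, regs, mem, p1) == run(reused, regs, mem, p1)
print(len(temporaries(original)), len(temporaries(reused)))
```

Because only one of p1 and p2 is true on any execution, the writes that share a register never both take effect, so each use reads the same value in both versions.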
4.3.4 Improving Instruction Stream Fetching
Instructions flow through the pipeline most efficiently when they are executed in large
blocks with no taken branches. Whenever the instruction pointer needs to be changed,
the hardware may have to insert bubbles into the pipeline either while the target
prediction is taking place or because the target address is not computed until later in
the pipeline.