Volume 1, Part 2: Predication, Control Flow, and Instruction Stream
The process of predicating instructions in conditional blocks and removing branches is
referred to as if-conversion. Once if-conversion has been performed, instructions can
be scheduled more freely because there are fewer branches to limit code motion, and
there are fewer branches competing for issue slots.
In addition to removing branches, this transformation will make dynamic instruction
fetching more efficient since there are fewer possibilities for control flow changes.
Under more complex circumstances, several branches can be removed. The following C
code sequence:
    if (r1)
        r2 = r3 + r4;
    else
        r7 = r6 - r5;
can be rewritten in Itanium architecture-based assembly code without branches as:
    cmp.ne p1,p2 = r1,0;;
    (p1) add r2 = r3,r4
    (p2) sub r7 = r6,r5
Since instructions from opposite sides of the conditional are predicated with
complementary predicates, they are guaranteed not to conflict; hence the compiler has
more freedom when scheduling to make the best use of hardware resources. The
compiler could also try to schedule these statements with earlier or later code since
several branches and labels have been removed as part of if-conversion.
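The effect of the complementary predicates can be modeled in C. The following is a minimal sketch, not compiler output: the regs_t structure and the function names branchy and if_converted are hypothetical, introduced only for illustration, and unsigned 64-bit values are assumed so that computing both sides unconditionally is well defined. It shows that committing each result under its own predicate leaves the registers in the same state as the original if/else.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file holding only the registers used in the example. */
typedef struct {
    uint64_t r1, r2, r3, r4, r5, r6, r7;
} regs_t;

/* Branching form: the original if/else from the C sequence. */
void branchy(regs_t *r)
{
    if (r->r1)
        r->r2 = r->r3 + r->r4;
    else
        r->r7 = r->r6 - r->r5;
}

/* If-converted form: one compare produces two complementary predicates,
 * and each result is committed only when its predicate is true. */
void if_converted(regs_t *r)
{
    bool p1 = (r->r1 != 0);          /* models: cmp.ne p1,p2 = r1,0 */
    bool p2 = !p1;                   /* p2 is always the complement of p1 */

    uint64_t sum  = r->r3 + r->r4;   /* both sides are computed ... */
    uint64_t diff = r->r6 - r->r5;

    r->r2 = p1 ? sum  : r->r2;       /* models: (p1) add r2 = r3,r4 */
    r->r7 = p2 ? diff : r->r7;       /* models: (p2) sub r7 = r6,r5 */
}
```

Because p1 and p2 can never both be true, at most one of r2 and r7 changes on any call, which is why the two predicated instructions cannot conflict. A C compiler is free to lower the conditional expressions back into branches, so this models the semantics of predication only, not its performance.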
Since the branches have been removed, no branch misprediction is possible and there
will be no pipeline bubbles due to taken branches. Such effects are significant in many
large applications, and these transformations can greatly reduce branch-induced stalls
or flushes in the pipeline.
Thus, comparing the cost of the code above with the non-predicated version above
shows that:
• Non-predicated code consumes: 2 + (30% * 10 cycles) = 5 cycles.
• Predicated code consumes: 2 cycles.
In this case, predication saves an average of three cycles.
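The comparison above can be reproduced with a small expected-cost calculation. This is a sketch under stated assumptions: the 30% figure is read as the branch misprediction rate and the 10 cycles as the misprediction penalty, and the function name expected_cycles is hypothetical.

```c
/* Expected cost of a code sequence, in cycles.
 * base_cycles:     cycles needed when no misprediction occurs
 * mispredict_rate: assumed fraction of executions that mispredict (0.0 to 1.0)
 * penalty_cycles:  assumed cycles lost on each misprediction */
double expected_cycles(double base_cycles, double mispredict_rate,
                       double penalty_cycles)
{
    return base_cycles + mispredict_rate * penalty_cycles;
}
```

With these assumptions, expected_cycles(2.0, 0.30, 10.0) gives the 5-cycle figure for the non-predicated code, while expected_cycles(2.0, 0.0, 10.0) gives the 2-cycle figure for the predicated code, since a sequence without branches cannot mispredict. The difference is the three-cycle average saving.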
4.2.3.2 Off-path Predication
If a compiler has dynamic profile information, it is possible to form an instruction
schedule based on the control flow path that is most likely to execute – this path is
called the main trace. In some cases, execution paths not on the main trace are still
executed frequently, and thus it may be beneficial to use predication to minimize their
critical paths as well.
The main trace of a flow graph is highlighted in the accompanying figure. Although blocks A and B are
not on the main trace, suppose they are executed a significant number of times.