Volume 1, Part 2: Predication, Control Flow, and Instruction Stream
4.2.4.4 Case 3
Suppose the if-clause is executed 30% of the time and the branch mispredicts 30% of the time. The average number of clocks for:
• Unpredicated code is: (2 cycles * 30%) + (18 cycles * 70%) + (10 cycles * 30%) = 16.2 clocks
• Predicated code is: (5 cycles * 30%) + (18 cycles * 70%) = 14.1 clocks
In this case, if-conversion would decrease the execution cost by 2.1 clocks on average (16.2 - 14.1), which is more than two clocks.
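The averages above are expected values. Each path's cost is weighted by how often that path executes. The branching version also pays the misprediction penalty, weighted by the misprediction rate.

The Python sketch below reproduces that arithmetic. It assumes, as the weights suggest, the following meanings for the numbers:
• 2 and 5 cycles are the cost of the if-clause path without and with predication.
• 18 cycles is the cost of the other path.
• 10 cycles is the misprediction penalty.

The function and parameter names are illustrative, not taken from the manual.

```python
def expected_cycles_unpredicated(p_if, p_mispredict,
                                 if_cycles=2, other_cycles=18, penalty=10):
    """Branching version: each path's cost weighted by its frequency,
    plus the expected misprediction penalty (assumed meaning of the terms)."""
    return if_cycles * p_if + other_cycles * (1 - p_if) + penalty * p_mispredict


def expected_cycles_predicated(p_if, if_cycles=5, other_cycles=18):
    """If-converted version: no branch remains, so there is no penalty term."""
    return if_cycles * p_if + other_cycles * (1 - p_if)


unpred = expected_cycles_unpredicated(p_if=0.30, p_mispredict=0.30)
pred = expected_cycles_predicated(p_if=0.30)
print(f"unpredicated: {unpred:.1f} clocks")       # unpredicated: 16.2 clocks
print(f"predicated:   {pred:.1f} clocks")         # predicated:   14.1 clocks
print(f"savings:      {unpred - pred:.1f} clocks")  # savings:      2.1 clocks
```

Changing `p_if` or `p_mispredict` shows how sensitive the if-conversion decision is to branch behavior.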
4.2.4.5 Overlapping Resource Usage
Before performing if-conversion, the programmer must consider the execution resources consumed by predicated blocks as well as their flow-dependency height. The resource availability height of a set of instructions is the minimum number of cycles needed to execute them when only their execution-resource requirements are considered. The code below is derived from an if-then-else statement. Assume a generic machine model that has only two load/store (M) units. If a compiler predicates and combines these two blocks, the resource availability height of the combined block is four clocks. That is the minimum time needed to issue eight memory operations on two M units:

    then_clause:
        ld  r1=[r21]        // Cycle 0
        ld  r2=[r22]        // Cycle 0
        st  [r32]=r3        // Cycle 1
        st  [r33]=r4 ;;     // Cycle 1
        br  end_if
    else_clause:
        ld  r3=[r23]        // Cycle 0
        ld  r4=[r24]        // Cycle 0
        st  [r34]=r5        // Cycle 1
        st  [r35]=r6 ;;     // Cycle 1
    end_if:

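Resource availability height can be estimated in two steps. First, count how many instructions need each resource class and divide by the number of units of that class. Then take the worst class.

The Python sketch below does this under simplifying assumptions of our own:
• Every instruction occupies one unit of a single class for one cycle.
• Units within a class are interchangeable.
• Data dependencies and bundle templates are ignored.

The class label "M" and the two-unit count come from the generic machine model described above. The function name is hypothetical, and the `br` is left out because it does not use an M unit.

```python
import math
from collections import Counter


def resource_availability_height(instr_classes, units_per_class):
    """Minimum cycles to issue the instructions, considering only execution
    resources: for each class, ceil(demand / units), then take the maximum.
    Dependencies are deliberately ignored."""
    demand = Counter(instr_classes)
    return max(
        (math.ceil(n / units_per_class[cls]) for cls, n in demand.items()),
        default=0,
    )


# Each clause has two loads and two stores, all executing on M units.
then_clause = ["M"] * 4
else_clause = ["M"] * 4
machine = {"M": 2}  # generic model: only two load/store (M) units

print(resource_availability_height(then_clause, machine))                # 2
print(resource_availability_height(then_clause + else_clause, machine))  # 4
```

Each clause alone fits in two clocks. Combining both under predication doubles the M-unit demand, so the resource bound rises to four clocks.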
As with the example in the previous section, different misprediction rates and taken-branch penalties change the decision about when to predicate and when not to. One case is illustrated below.
4.2.4.6 Case 1
Suppose the branch condition mispredicts 10% of the time and that the predicated code
takes four clocks to execute. The average number of clocks for:
• Non-predicated code is: (10 cycles * 10%) + 2 cycles = 3 cycles
• Predicated code is: 4 cycles
Predicating this code would increase the average execution time from three to four clocks. This happens even though the flow-dependency heights of the two branch paths are equal (two clocks each).
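The Case 1 comparison can be sketched the same way. The sketch assumes, as a simplification of our own rather than a rule from the manual, that an if-converted block takes at least the larger of two values: its flow-dependency height and its resource availability height. Both clauses are two clocks deep and the combined block has eight memory operations on two M units, so this gives the four clocks stated above.

The sketch also solves for the misprediction rate at which the two versions cost the same. This shows where the decision flips as the rate changes. Function names are illustrative.

```python
PENALTY = 10  # misprediction penalty in cycles, as used in the Case 1 formula


def nonpredicated_cycles(path_height, p_mispredict, penalty=PENALTY):
    """Both branch paths have the same flow-dependency height, so the
    expected cost is that height plus the expected misprediction penalty."""
    return path_height + penalty * p_mispredict


def predicated_cycles(flow_height, resource_height):
    """Assumed lower bound: the if-converted block is limited by whichever
    is larger, its dependency height or its resource availability height."""
    return max(flow_height, resource_height)


def break_even_mispredict_rate(path_height, predicated, penalty=PENALTY):
    """Misprediction rate p at which path_height + penalty * p == predicated."""
    return (predicated - path_height) / penalty


branchy = nonpredicated_cycles(path_height=2, p_mispredict=0.10)
pred = predicated_cycles(flow_height=2, resource_height=4)
print(branchy, pred)                        # 3.0 4
print(break_even_mispredict_rate(2, pred))  # 0.2
```

Under this model, with the 10-cycle penalty, predicating this block would pay off only if the branch mispredicted more than about 20% of the time.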