User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
Instruction Timing
March 27, 2006
The 750GX’s instruction-cache throttling feature, managed through the Instruction Cache Throttling Control
(ICTC) register, can lower the processor’s overall junction temperature by slowing the instruction fetch rate.
See Chapter 10, Power and Thermal Management, on page 335 for more information.
Branch instructions are identified by the fetcher and forwarded directly to the BPU, bypassing the dispatch
queue. If the branch is unconditional or if the specified conditions are already known, the branch can be
resolved immediately. That is, the branch direction is known and instruction fetching can continue along the
correct path. Otherwise, the branch direction must be predicted. The 750GX offers several resources to aid in
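the quick resolution of branch instructions and to improve the accuracy of branch predictions. These include
the branch target instruction cache (BTIC), dynamic branch prediction, and static branch prediction.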
Branch instructions that do not update the LR or CTR are removed from the instruction stream by branch
folding, as described in Section 6.4.1.1, Branch Folding, on page 226. Branch instructions that update the LR
or CTR are treated as if they require dispatch (even though they are not issued to an execution unit in the
process). They are assigned a position in the completion queue to ensure that the CTR and LR are updated
in the correct program order.
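As an illustration only, the rule above can be written as a small classifier. The `Branch` type and both function names are hypothetical and are not part of any IBM tool or API. The sketch captures only the LR/CTR condition stated here. Any further folding conditions described in Section 6.4.1.1 are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """Hypothetical description of a branch instruction."""
    updates_lr: bool = False   # e.g. a branch with LK = 1, such as bl
    updates_ctr: bool = False  # e.g. a branch that decrements the CTR, such as bdnz


def needs_completion_entry(branch: Branch) -> bool:
    """True if the branch must hold a completion-queue position so that
    the LR or CTR is updated in program order."""
    return branch.updates_lr or branch.updates_ctr


def can_fold(branch: Branch) -> bool:
    """True if the branch can be removed from the instruction stream.
    Simplification: only the LR/CTR rule from the text is modeled."""
    return not needs_completion_entry(branch)
```

For example, `can_fold(Branch())` is true for a plain branch, while `can_fold(Branch(updates_lr=True))` is false because the LR update must be ordered through the completion queue.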
All other instructions are issued from IQ0 and IQ1. The dispatch rate depends upon the availability of
resources such as the execution units, Rename Registers, and completion queue entries, and upon the
serializing behavior of some instructions. Instructions are dispatched in program order; an instruction in IQ1
cannot be dispatched ahead of one in IQ0.
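The in-order dispatch rule can be sketched as a toy model. Everything named here (`Resources`, `Instr`, `dispatch`, and the unit names) is hypothetical, and the resource checks are a deliberate simplification: real dispatch also depends on reservation stations and on serializing instructions, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """Hypothetical snapshot of the resources available in one cycle."""
    free_units: set              # names of execution units that can accept work
    free_renames: int            # free Rename Registers
    free_completion_slots: int   # free completion queue entries


@dataclass(frozen=True)
class Instr:
    unit: str                    # execution unit the instruction needs
    needs_rename: bool = True    # whether it needs a Rename Register


def dispatch(iq, res):
    """Dispatch up to two instructions, iq[0] (IQ0) then iq[1] (IQ1).

    Returns the list of instructions dispatched this cycle and updates
    `res` in place. Dispatch is strictly in order, so a stalled IQ0
    also blocks IQ1.
    """
    dispatched = []
    for instr in iq[:2]:
        ok = (
            instr.unit in res.free_units
            and res.free_completion_slots > 0
            and (not instr.needs_rename or res.free_renames > 0)
        )
        if not ok:
            break  # program order: nothing younger may pass this instruction
        res.free_units.discard(instr.unit)
        res.free_completion_slots -= 1
        if instr.needs_rename:
            res.free_renames -= 1
        dispatched.append(instr)
    return dispatched
```

With only `"IU1"` and `"LSU"` free, `dispatch([Instr("FPU"), Instr("IU1")], res)` returns an empty list: the instruction in IQ1 could run, but it may not be dispatched ahead of the stalled instruction in IQ0.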
6.3.2 Instruction Fetch Timing
Instruction fetch latency depends on whether the fetch hits the BTIC, the L1 instruction cache, or the L2
cache. If no cache hit occurs, a memory transaction is required, in which case fetch latency is affected by bus
traffic, bus clock speed, and memory translation. These issues are discussed further in the following sections.
Branch target instruction cache
The 64-entry (4-way-associative) branch target instruction cache (BTIC) holds
branch target instructions so that, when a branch is encountered in a repeated loop,
the first two instructions in the target stream can usually be fetched into the
instruction queue on the next clock cycle. The BTIC can be disabled and invalidated
through bits in Hardware-Implementation-Dependent Register 0 (HID0). Coherency
of the BTIC table is maintained by resetting the table on an instruction-cache flash
invalidate, on execution of an Instruction Cache Block Invalidate (icbi) or Return
from Interrupt (rfi) instruction, or when an exception is taken.
Dynamic branch prediction
The 512-entry branch history table (BHT) is implemented with two bits per entry for
four degrees of prediction: strongly not-taken, not-taken, taken, and strongly taken.
The actual outcome of a branch instruction (taken or not-taken) can change the
strength of the next prediction. This dynamic branch prediction is not defined by the
PowerPC Architecture.
To reduce aliasing, only predicted branches update the BHT entries. Dynamic
branch prediction is enabled by setting HID0[BHT]; otherwise, static branch
prediction is used.
Static branch prediction
Static branch prediction is defined by the PowerPC Architecture and is encoded in
the branch instructions. See Static Branch Prediction on page 229.
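The two-bit dynamic prediction scheme described above can be illustrated with a minimal model. The manual states only that each of the 512 entries holds two bits encoding four degrees of prediction. The saturating-counter transitions, the initial state, and the address-to-entry mapping used below are assumptions made for illustration, and all names are hypothetical.

```python
ENTRIES = 512

# Two-bit states, ordered from weakest to strongest "taken" bias.
STRONG_NOT_TAKEN, NOT_TAKEN, TAKEN, STRONG_TAKEN = range(4)


class BranchHistoryTable:
    """Toy model of a 512-entry, two-bit-per-entry branch history table."""

    def __init__(self):
        # Assumption: every entry starts as (weakly) not-taken.
        self.counters = [NOT_TAKEN] * ENTRIES

    def _index(self, branch_addr):
        # Assumption: index by word address (instructions are 4 bytes).
        return (branch_addr >> 2) % ENTRIES

    def predict(self, branch_addr):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_addr)] >= TAKEN

    def update(self, branch_addr, taken):
        """Move the entry one step toward the actual outcome.

        Assumption: a saturating counter. Call this only for branches
        that were actually predicted, since only predicted branches
        update the BHT.
        """
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, STRONG_TAKEN)
        else:
            self.counters[i] = max(self.counters[i] - 1, STRONG_NOT_TAKEN)
```

Under these assumptions, a loop-closing branch that has been taken twice reaches the strongly taken state, so a single not-taken outcome at loop exit weakens the prediction without reversing it.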