Prefetch Unit
ARM DDI 0363E
Copyright © 2009 ARM Limited. All rights reserved.
5-3
ID013010
Non-Confidential, Unrestricted Access
5.2 Branch prediction
The PFU normally fetches instructions from sequential addresses. If a branch instruction is
fetched, the next instruction to be fetched can only be determined with certainty after the
instruction has completed execution at the end of the pipeline in the DPU. If the branch is taken,
the next instruction to be executed is not sequential. The sequential instructions that the PFU
has fetched while the branch instruction was executing must be flushed from the pipeline and
the correct instruction fetched. This has the effect of reducing the performance of the processor.
The PFU can detect branches in the Pd-stage of the pipeline, predict whether or not the branch
is taken, and determine or predict the target address for a taken branch. This enables the PFU to
start fetching instructions at the destination of a taken branch before the branch has completed
execution in the DPU. The branch instruction is still executed in the DPU to determine the
accuracy of the prediction. If the branch was mispredicted, the pipeline must be flushed and the
correct instruction fetched. In general, more branches are correctly predicted than mispredicted
so fewer pipeline flushes occur and the performance of the processor is enhanced.
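The predict-then-verify flow described above can be sketched in C. This is an illustrative software model only, not a description of the PFU hardware. The type branch_info and the functions fetch_addr and needs_flush are hypothetical names introduced for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one fetched branch (illustration only). */
typedef struct {
    uint32_t sequential_addr; /* address of the instruction after the branch */
    uint32_t target_addr;     /* destination if the branch is taken */
} branch_info;

/* Address that is fetched next for a taken or not-taken outcome. */
static uint32_t fetch_addr(const branch_info *b, bool taken)
{
    return taken ? b->target_addr : b->sequential_addr;
}

/*
 * True when the outcome resolved at the end of the pipeline leads to a
 * different fetch address than the prediction did, so the speculatively
 * fetched instructions must be flushed and the correct one fetched.
 */
static bool needs_flush(const branch_info *b, bool predicted_taken,
                        bool actual_taken)
{
    return fetch_addr(b, predicted_taken) != fetch_addr(b, actual_taken);
}
```

In this model a correct prediction costs nothing extra, and a flush is required only when the predicted and resolved fetch addresses differ.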
Two major classes of branch are addressed in the processor prediction scheme:
1. Direct branches, including B, BL, CZB, and BLX immediate, where the target address is a
   fixed offset, encoded in the instruction, from the program counter. If such an instruction
   has been fetched, and the program counter is known, predicting the destination of the
   branch only involves predicting whether the instruction passes or fails its condition code,
   that is, whether the branch is taken or not taken.
2. Indirect branches, such as load and Branch and eXchange (BX) instructions that write to
   the PC, that can be identified as a likely return from a procedure call. Two identifiable
   cases are:
   • loads to the PC from an address derived from R13
   • BX from R0-R14.
   In these cases, if the calling operation can also be identified, the likely return address can
   be stored in the return stack. Typical calling operations are BL and BLX instructions.
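The return stack mechanism for the second class can be sketched as a small circular stack in C. This is a minimal model under stated assumptions. The depth RS_DEPTH, the overwrite-oldest behavior on overflow, and the names return_stack, rs_push, and rs_pop are all hypothetical and are not taken from this manual.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_DEPTH 4u /* hypothetical depth, chosen for illustration only */

typedef struct {
    uint32_t entry[RS_DEPTH];
    unsigned top;   /* index of the next free slot, modulo RS_DEPTH */
    unsigned count; /* number of valid entries, at most RS_DEPTH */
} return_stack;

/* Calling operation (for example BL or BLX) identified: store the likely
 * return address. Assumption: when full, the oldest entry is overwritten. */
static void rs_push(return_stack *rs, uint32_t return_addr)
{
    rs->entry[rs->top] = return_addr;
    rs->top = (rs->top + 1u) % RS_DEPTH;
    if (rs->count < RS_DEPTH)
        rs->count++;
}

/* Likely return identified (for example BX, or a load to the PC from an
 * address derived from R13): pop the predicted target.
 * Returns false when no prediction is available. */
static bool rs_pop(return_stack *rs, uint32_t *predicted)
{
    if (rs->count == 0u)
        return false;
    rs->top = (rs->top + RS_DEPTH - 1u) % RS_DEPTH;
    *predicted = rs->entry[rs->top];
    rs->count--;
    return true;
}
```

Nested calls push in call order and pop in reverse, which matches the usual last-in, first-out pattern of procedure returns.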
Note
Unconditional instructions of either class of program flow are always executed, and do not
affect prediction history. Unconditional return stack operations always affect the return stack.
This section describes:
• Disabling program flow prediction
• Branch predictor on page 5-4
• Incorrect predictions and correction on page 5-4.
5.2.1 Disabling program flow prediction
You cannot disable program flow prediction using the Z bit, bit [11], of CP15 Register c1. The
Z bit is tied to 1. To disable the program flow prediction you must disable the return stack and
set the branch prediction policy to not-taken. For more information see c1, System Control
Register on page 4-35.
You can also control the return stack, the branch predictor, and the fetch rate using the Auxiliary
Control Register. For more information see Auxiliary Control Registers on page 4-38.
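The two steps needed to disable program flow prediction, disabling the return stack and selecting a not-taken policy, can be sketched as a read-modify-write computation on an Auxiliary Control Register value. The field positions and encodings below (RSDIS at bit [17], BP at bits [16:15], and b10 for not-taken) are assumptions made for this sketch and must be checked against the Auxiliary Control Registers description before use. The macro names and the function actlr_disable_prediction are hypothetical.

```c
#include <stdint.h>

/* ASSUMED field layout, verify against the register description. */
#define ACTLR_RSDIS        (1u << 17)              /* assumed: return stack disable */
#define ACTLR_BP_SHIFT     15u                     /* assumed: BP field at [16:15] */
#define ACTLR_BP_MASK      (3u << ACTLR_BP_SHIFT)
#define ACTLR_BP_NOT_TAKEN (2u << ACTLR_BP_SHIFT)  /* assumed: b10 = not-taken */

/*
 * Given the current register value, return the value to write back so that
 * the return stack is disabled and the branch prediction policy is
 * not-taken. All other bits are preserved.
 */
static uint32_t actlr_disable_prediction(uint32_t actlr)
{
    actlr |= ACTLR_RSDIS;
    actlr = (actlr & ~ACTLR_BP_MASK) | ACTLR_BP_NOT_TAKEN;
    return actlr;
}
```

The function only computes the new value. Reading and writing the register itself requires CP15 accesses in a privileged mode, which this sketch does not show.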