Prefetch Unit
ARM DDI 0363G
Copyright © 2006-2011 ARM Limited. All rights reserved.
ID073015
Non-Confidential
5.2 Branch prediction
The PFU normally fetches instructions from sequential addresses. If a branch instruction is
fetched, the next instruction to be fetched can only be determined with certainty after the
instruction has completed execution at the end of the pipeline in the DPU. If the branch is taken,
the next instruction to be executed is not sequential. The sequential instructions that the PFU
has fetched while the branch instruction was executing must be flushed from the pipeline and
the correct instruction fetched. This has the effect of reducing the performance of the processor.
The PFU can detect branches in the Pd-stage of the pipeline, predict whether or not the branch
is taken, and determine or predict the target address for a taken branch. This enables the PFU to
start fetching instructions at the destination of a taken branch before the branch has completed
execution in the DPU. The branch instruction is still executed in the DPU to determine the
accuracy of the prediction. If the branch was mispredicted, the pipeline must be flushed and the
correct instruction fetched. In general, more branches are correctly predicted than mispredicted,
so fewer pipeline flushes occur and the performance of the processor is enhanced.
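The effect of prediction on flush count can be illustrated with a small model. This is a sketch only: the function names and the example branch sequence are hypothetical, and it counts flushes rather than cycles because this section does not give a flush penalty.

```python
def flushes_without_prediction(outcomes):
    """Sequential fetch only: every taken branch forces a flush."""
    return sum(1 for taken in outcomes if taken)


def flushes_with_prediction(outcomes, predictions):
    """With prediction: only a mispredicted branch forces a flush."""
    return sum(1 for taken, predicted in zip(outcomes, predictions)
               if taken != predicted)


# Hypothetical loop-closing branch: taken nine times, then falls through.
outcomes = [True] * 9 + [False]
# Hypothetical predictor that guesses "taken" every time.
predictions = [True] * 10

print(flushes_without_prediction(outcomes))            # 9
print(flushes_with_prediction(outcomes, predictions))  # 1
```

In this example only the final loop exit is mispredicted, so the flush count drops from nine to one.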
Two major classes of branch are addressed in the processor prediction scheme:
1.  Direct branches, including B, BL, CZB, and BLX immediate, where the target address is a
    fixed offset, encoded in the instruction, from the program counter. If such an instruction
    is fetched, and the program counter is known, predicting the destination of the branch only
    involves predicting whether the instruction passes or fails its condition code, that is,
    whether the branch is taken or not taken.
2.  Indirect branches, such as load and Branch and eXchange (BX) instructions that write to
    the PC, that can be identified as a likely return from a procedure call. Two identifiable
    cases are:
    •   loads to the PC from an address derived from R13
    •   BX from R0-R14.
    In these cases, if the calling operation can also be identified, the likely return address can
    be stored in the return stack. Typical calling operations are BL and BLX instructions.
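The two classes above can be sketched in a few lines of Python. This is an illustrative model, not a description of the PFU hardware. The direct-branch helper uses the ARM-state B/BL encoding from the ARM architecture, which is not spelled out in this section. The return stack depth of 4 and the drop-oldest overflow policy are assumptions made for the example, and all names (arm_b_target, ReturnStack, on_call, on_return) are hypothetical.

```python
from collections import deque


def arm_b_target(address, instruction):
    """Target of an ARM-state B or BL at `address` (a direct branch).

    The offset is a signed 24-bit word count in bits [23:0], applied to
    the PC, which reads as the instruction address plus 8 in ARM state.
    """
    imm24 = instruction & 0x00FFFFFF
    if imm24 & 0x00800000:          # sign-extend the 24-bit field
        imm24 -= 0x01000000
    return (address + 8 + (imm24 << 2)) & 0xFFFFFFFF


class ReturnStack:
    """Toy return stack. Depth and overflow policy are assumptions."""

    def __init__(self, depth=4):
        # A bounded deque drops its oldest entry when a push overflows it.
        self._entries = deque(maxlen=depth)

    def on_call(self, return_address):
        """A calling operation (BL or BLX) was identified."""
        self._entries.append(return_address)

    def on_return(self):
        """A likely return was identified: predicted target, or None."""
        return self._entries.pop() if self._entries else None


# 0xEAFFFFFE encodes "B ." (a branch to itself) in ARM state.
print(hex(arm_b_target(0x8000, 0xEAFFFFFE)))   # 0x8000

stack = ReturnStack()
stack.on_call(0x8004)              # BL at 0x8000 returns to 0x8004
stack.on_call(0x9014)              # nested BL at 0x9010 returns to 0x9014
print(hex(stack.on_return()))      # 0x9014
print(hex(stack.on_return()))      # 0x8004
print(stack.on_return())           # None: stack empty, no prediction
```

For a direct branch the target is fully determined by the instruction and its address, so only the taken or not-taken decision needs predicting. For a likely return the target comes from the most recent unmatched call.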
Note
Unconditional instructions of either class of program flow are always executed, and do not
affect prediction history. Unconditional return stack operations always affect the return stack.
This section describes:
•   Branch predictor
•   Incorrect predictions and correction.
5.2.1 Branch predictor
Branch prediction in the processor is dynamic and is based around a global history prediction
scheme. In addition, there is extra logic to handle predictions that thrash and to predict the end
of long loops.
The global history scheme is an adaptive predictor that learns the behavior of branches during
execution, identifying them based on the historical pattern of behavior of the preceding
branches. For each pattern of branch behavior, the history table holds a 2-bit hint value. The
2-bit hint indicates if the next branch must be predicted taken or predicted not-taken based on
the behavior of previous branches. The history table contains 256 entries.
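A global history scheme of this kind can be sketched as follows. The 256-entry table of 2-bit hints comes from the text above. Everything else is a textbook assumption made for illustration, not the documented hardware behavior: the 8-bit history register used directly as the table index, the saturating-counter update, the initial hint value, and the rule that hints 2 and 3 mean predicted taken. The thrash-handling and long-loop logic is not modeled, and the class and function names are hypothetical.

```python
class GlobalHistoryPredictor:
    HISTORY_BITS = 8                    # assumption: 2**8 = 256 patterns
    TABLE_SIZE = 1 << HISTORY_BITS      # 256 entries, as stated in the text

    def __init__(self):
        self.history = 0                # recent outcomes, newest in bit 0
        # 2-bit hints in the range 0..3; the initial value is an assumption.
        self.table = [1] * self.TABLE_SIZE

    def predict(self):
        """Predict taken when the hint for the current pattern is 2 or 3."""
        return self.table[self.history] >= 2

    def update(self, taken):
        """Train the hint for the current pattern, then shift in the outcome."""
        hint = self.table[self.history]
        self.table[self.history] = min(hint + 1, 3) if taken else max(hint - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & (self.TABLE_SIZE - 1)


def count_mispredictions(predictor, outcomes):
    """Run the predictor over a sequence of outcomes and count its misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


predictor = GlobalHistoryPredictor()
alternating = [True, False] * 50        # taken, not taken, taken, ...

count_mispredictions(predictor, alternating)          # warm-up pass
print(count_mispredictions(predictor, alternating))   # 0 once trained
```

The alternating sequence shows why the hint is selected by history rather than kept per branch: a single 2-bit counter cannot track a branch that flips every time, whereas the two history patterns that precede a taken and a not-taken outcome each train their own hint.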