Program Flow Prediction
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
5-2
ID012310
Non-Confidential, Unrestricted Access
5.1 About program flow prediction
Program flow prediction in the processor is carried out by:
• The integer core, which implements static branch prediction and the Return Stack.
• The Prefetch Unit (PU), which implements dynamic branch prediction.
The integer core is responsible for handling branches the first time they are executed, that is, when
no historical information is available for dynamic prediction by the PU.
The integer core makes static predictions about the likely outcome of a branch early in its
pipeline and then resolves those predictions when the outcome of conditional execution is
known. Condition codes are evaluated at three points in the integer core pipeline, and branches
are resolved as soon as the flags are guaranteed not to be modified by a preceding instruction.
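The static prediction rule itself is not stated here. As a minimal sketch, assuming the common backward-taken, forward-not-taken (BTFN) heuristic for conditional ARM-state B/BL instructions, with unconditional (AL) branches always taken, the decision can be modelled in C as follows. The function names are hypothetical, and the model ignores the pipeline stages at which a prediction is made and later resolved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a static predictor for ARM-state B/BL.
 * Assumption: backward-taken, forward-not-taken (BTFN) for conditional
 * branches; unconditional (AL) branches are always taken.
 * cond == 0xF (BLX immediate) is not handled. */

/* Byte offset encoded in the signed 24-bit immediate of B/BL. */
static int32_t branch_offset(uint32_t insn)
{
    int32_t imm24 = (int32_t)(insn & 0x00FFFFFFu);
    if (imm24 & 0x00800000)          /* bit 23 is the sign bit */
        imm24 -= 0x01000000;         /* sign-extend to 32 bits */
    return imm24 * 4;                /* words to bytes */
}

/* Returns true if the branch is predicted taken. */
static bool static_predict_taken(uint32_t insn)
{
    uint32_t cond = insn >> 28;
    /* ARM-state target = branch address + 8 + offset */
    int32_t delta = branch_offset(insn) + 8;
    if (cond == 0xEu)                /* AL: always taken */
        return true;
    return delta <= 0;               /* target at or before the branch */
}
```

A loop-closing `BNE` (encoding 0x1AFFFFFC, target 8 bytes before the branch) is predicted taken, while a forward `BEQ` (0x0A000002) is predicted not taken.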
When a branch is resolved, the integer core passes information to the PU so that the PU can allocate a
Branch Target Address Cache (BTAC) entry or update an existing entry, as appropriate. The
integer core is also responsible for identifying likely procedure calls and returns so that it can
predict return addresses. It can handle procedure calls nested up to three deep.
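A minimal sketch of a three-entry return stack follows. It assumes that a predicted call pushes the address of the instruction after the call and that a predicted return pops it. The overflow policy (overwriting the oldest entry) and all type and function names are assumptions for illustration, not a description of the actual hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_DEPTH 3  /* nesting depth stated in the text */

/* Hypothetical return stack model, implemented as a small ring. */
typedef struct {
    uint32_t entry[RS_DEPTH];
    unsigned top;    /* index of the next free slot (mod RS_DEPTH) */
    unsigned count;  /* number of valid entries, 0..RS_DEPTH */
} return_stack;

static void rs_init(return_stack *rs)
{
    rs->top = 0;
    rs->count = 0;
}

/* On a predicted call (for example BL at call_addr), push the return address. */
static void rs_push_call(return_stack *rs, uint32_t call_addr)
{
    rs->entry[rs->top] = call_addr + 4;   /* ARM state: next instruction */
    rs->top = (rs->top + 1) % RS_DEPTH;
    if (rs->count < RS_DEPTH)
        rs->count++;                       /* when full, oldest is overwritten */
}

/* On a predicted return, pop the predicted target. Returns false if empty. */
static bool rs_pop_return(return_stack *rs, uint32_t *target)
{
    if (rs->count == 0)
        return false;                      /* no prediction available */
    rs->top = (rs->top + RS_DEPTH - 1) % RS_DEPTH;
    rs->count--;
    *target = rs->entry[rs->top];
    return true;
}
```

With four nested calls, only the three most recent return addresses are kept, so the outermost return cannot be predicted from the stack in this model.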
The integer core includes:
• a Static Branch Predictor (SBP)
• a Return Stack (RS)
• branch resolution logic
• a BTAC update interface to the PU
• a BTAC allocate interface to the PU.
The PU is responsible for fetching instructions from the memory system as required
by the integer core and coprocessors. The PU buffers up to seven instructions in its FIFO to:
• detect branch instructions before the integer core requires them
• dynamically predict those it considers likely to be taken
• provide branch folding of predicted branches where possible
• identify unconditional procedure return instructions.
This reduces the number of cycles that branch instructions take, so increasing processor performance.
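The FIFO behaviour can be sketched as a seven-entry ring buffer that the fetch side fills and the integer core drains, with a scan that finds buffered ARM-state B/BL instructions ahead of the core. This is a hypothetical software model: the type and function names are invented, only the B/BL encoding (bits[27:25] = 0b101) is recognized, and prediction and folding are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PU_FIFO_SIZE 7  /* buffer depth stated in the text */

typedef struct {
    uint32_t addr;   /* fetch address */
    uint32_t insn;   /* ARM-state instruction word */
} fetched;

/* Hypothetical model of the PU instruction FIFO. */
typedef struct {
    fetched slot[PU_FIFO_SIZE];
    size_t head;     /* index of the oldest entry */
    size_t count;    /* number of buffered instructions */
} pu_fifo;

static void fifo_init(pu_fifo *f)
{
    f->head = 0;
    f->count = 0;
}

/* Fetch side: enqueue an instruction; fails when the FIFO is full. */
static bool fifo_push(pu_fifo *f, uint32_t addr, uint32_t insn)
{
    if (f->count == PU_FIFO_SIZE)
        return false;
    f->slot[(f->head + f->count) % PU_FIFO_SIZE] = (fetched){addr, insn};
    f->count++;
    return true;
}

/* Integer core side: dequeue the oldest instruction. */
static bool fifo_pop(pu_fifo *f, fetched *out)
{
    if (f->count == 0)
        return false;
    *out = f->slot[f->head];
    f->head = (f->head + 1) % PU_FIFO_SIZE;
    f->count--;
    return true;
}

/* True for ARM-state B/BL: bits[27:25] == 0b101 and cond != 0xF. */
static bool is_arm_branch(uint32_t insn)
{
    return ((insn >> 25) & 0x7u) == 0x5u && (insn >> 28) != 0xFu;
}

/* Look ahead of the core: position (from head) of first buffered branch, or -1. */
static int fifo_find_branch(const pu_fifo *f)
{
    for (size_t i = 0; i < f->count; i++)
        if (is_arm_branch(f->slot[(f->head + i) % PU_FIFO_SIZE].insn))
            return (int)i;
    return -1;
}
```

Because the scan runs over instructions the core has not yet consumed, a branch found this way can be predicted before the integer core reaches it.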
The PU includes:
• a BTAC
• branch update and allocate logic
• a Dynamic Branch Predictor (DBP) and its associated update mechanism
• branch folding logic.
The PU is responsible for providing the integer core with instructions and for requesting cache
accesses. The pattern of cache accesses is based on the predicted instruction stream, as
determined by the dynamic branch prediction mechanism or by the integer core flush mechanism.
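Neither the BTAC geometry nor the history scheme of the DBP is given here. The sketch below assumes, purely for illustration, a 128-entry direct-mapped BTAC with a 2-bit saturating counter per entry, a widely used dynamic prediction scheme. It shows the division of labour described above: the PU looks up the BTAC to predict, and branch resolution in the integer core either updates a matching entry or allocates a new one for a taken branch. All names and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BTAC_ENTRIES 128u  /* hypothetical size, not from the text */

typedef struct {
    bool     valid;
    uint32_t tag;      /* full branch address */
    uint32_t target;   /* most recent taken target */
    uint8_t  counter;  /* 2-bit saturating: 0,1 = not taken; 2,3 = taken */
} btac_entry;

typedef struct {
    btac_entry e[BTAC_ENTRIES];
} btac;

static void btac_init(btac *b)
{
    memset(b, 0, sizeof *b);   /* all entries invalid */
}

/* Direct-mapped index from a word-aligned ARM-state address. */
static btac_entry *btac_slot(btac *b, uint32_t addr)
{
    return &b->e[(addr >> 2) % BTAC_ENTRIES];
}

/* PU lookup: returns true and sets *target if the branch is predicted taken. */
static bool btac_predict(btac *b, uint32_t addr, uint32_t *target)
{
    btac_entry *s = btac_slot(b, addr);
    if (!s->valid || s->tag != addr || s->counter < 2)
        return false;
    *target = s->target;
    return true;
}

/* Called when the integer core resolves a branch: update or allocate. */
static void btac_resolve(btac *b, uint32_t addr, bool taken, uint32_t target)
{
    btac_entry *s = btac_slot(b, addr);
    if (s->valid && s->tag == addr) {              /* update existing entry */
        if (taken && s->counter < 3)
            s->counter++;
        if (!taken && s->counter > 0)
            s->counter--;
        if (taken)
            s->target = target;
    } else if (taken) {                            /* allocate new entry */
        *s = (btac_entry){true, addr, target, 2};  /* start weakly taken */
    }
}
```

In this model a branch with no BTAC entry gets no dynamic prediction, which corresponds to the first-execution case that the text assigns to static prediction in the integer core.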
The BTAC can:
• be globally flushed by a CP15 instruction
• have individual entries flushed by a CP15 instruction
• be enabled or disabled by a CP15 instruction.
For details of CP15 instructions, see c7, Cache operations on page 3-69 and Flush operations on
page 3-79.
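The effect of the three CP15 controls on the BTAC can be modelled as below. This is a software sketch only: the actual CP15 encodings are in the sections referenced above and are not reproduced here, the BTAC size and all function names are hypothetical, and whether entries survive while the BTAC is disabled is an assumption (this model leaves them in place).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BTAC_ENTRIES 128u  /* hypothetical size, not from the text */

typedef struct {
    bool     valid;
    uint32_t tag;      /* branch address */
    uint32_t target;   /* predicted target */
} btac_entry;

typedef struct {
    bool       enabled;            /* models the CP15 enable control */
    btac_entry e[BTAC_ENTRIES];
} btac;

static void btac_init(btac *b)
{
    memset(b, 0, sizeof *b);       /* disabled, all entries invalid */
}

static btac_entry *btac_slot(btac *b, uint32_t addr)
{
    return &b->e[(addr >> 2) % BTAC_ENTRIES];
}

/* Global flush: invalidate every entry. */
static void btac_flush_all(btac *b)
{
    for (unsigned i = 0; i < BTAC_ENTRIES; i++)
        b->e[i].valid = false;
}

/* Entry flush: invalidate only the entry for addr, if present. */
static void btac_flush_entry(btac *b, uint32_t addr)
{
    btac_entry *s = btac_slot(b, addr);
    if (s->valid && s->tag == addr)
        s->valid = false;
}

/* Enable or disable; entry contents are left untouched (an assumption). */
static void btac_set_enabled(btac *b, bool on)
{
    b->enabled = on;
}

static void btac_allocate(btac *b, uint32_t addr, uint32_t target)
{
    *btac_slot(b, addr) = (btac_entry){true, addr, target};
}

/* Lookup: a disabled BTAC makes no dynamic predictions. */
static bool btac_lookup(btac *b, uint32_t addr, uint32_t *target)
{
    btac_entry *s = btac_slot(b, addr);
    if (!b->enabled || !s->valid || s->tag != addr)
        return false;
    *target = s->target;
    return true;
}
```

A per-entry flush suits cases where only one branch target is known to be stale, while a global flush discards all history at once.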
The BTAC is globally flushed for:
• Main TLB FCSE PID changes