Cycle Timings and Interlock Behavior
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
16-3
ID012310
Non-Confidential, Unrestricted Access
predicted. A conditional return pops an entry from the return stack but is not predicted. If the return stack is empty, a return is not predicted. Items are placed on the return stack by the following instructions:
• BL #<immed>
• BLX #<immed>
• BLX Rx
Items are popped from the return stack by the following types of instruction:
• BX lr
• MOV pc, lr
• LDR pc, [sp], #cns
• LDMIA sp!, {….,pc}
A correctly predicted return stack pop takes four cycles.
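The prediction rules above can be modelled as a small stack. The sketch below is a toy model, not ARM's implementation: the class and method names are hypothetical, the stack depth is a free parameter because this section does not state it, and discarding the oldest entry when the stack is full is an assumption.

```python
from collections import deque
from typing import Optional


class ReturnStack:
    """Toy model of the return stack prediction rules (hypothetical names).

    Assumptions not stated in this section: `depth` is a free parameter, and
    pushing onto a full stack silently discards the oldest entry.
    """

    def __init__(self, depth: int) -> None:
        # deque with maxlen drops the oldest entry when a push overflows it.
        self._entries: deque = deque(maxlen=depth)

    def push(self, return_address: int) -> None:
        # BL #<immed>, BLX #<immed> and BLX Rx place a return address here.
        self._entries.append(return_address)

    def pop(self, conditional: bool = False) -> Optional[int]:
        """Model BX lr, MOV pc, lr, LDR pc, [sp], #cns or LDMIA sp!, {...,pc}.

        Returns the predicted return address, or None if not predicted.
        """
        if not self._entries:
            # Empty return stack: the return is not predicted.
            return None
        target = self._entries.pop()
        # A conditional return still pops an entry but is not predicted.
        return None if conditional else target


# A correctly predicted return stack pop takes four cycles.
PREDICTED_POP_CYCLES = 4
```

For example, after a BL pushes one address, a following BX lr pops it and uses it as the prediction, whereas a conditional return consumes the same entry without predicting anything.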
16.1.2 Instruction execution overview
The instruction execution pipeline is constructed from three parallel four-stage pipelines, listed in Table 16-1. For a complete description of these pipeline stages, see Pipeline stages on page 1-26.
The ALU and multiply pipelines operate in lock-step, so all instructions in these pipelines retire in order. The load/store pipeline is decoupled, enabling subsequent instructions in the ALU and multiply pipelines to complete while earlier loads are still outstanding.
Extensive forwarding to the Sh, MAC1, ADD, ALU, MAC2, and DC1 stages enables many dependent instruction sequences to run without pipeline stalls. General forwarding occurs from the ALU, Sat, WBex, and WBls pipeline stages. In addition, the multiplier contains an internal multiply-accumulate forwarding path. Most instructions do not require a register until the ALU stage. All result latencies are therefore given as the number of cycles before a following instruction that requires the register in the ALU stage can use it.
The following sequence takes four cycles:
LDR R1, [R2]            ; Result latency three
ADD R3, R3, R1          ; Register R1 required by ALU
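The four-cycle figure follows from the result latency if each instruction otherwise takes one issue cycle: the ADD can issue three cycles after the LDR and then spends its own cycle. The helper below is a hypothetical sketch of that arithmetic under that assumption; it ignores every other source of stall.

```python
def dependent_pair_cycles(result_latency: int) -> int:
    """Cycles for a producer followed by a consumer needing its result in ALU.

    Simplified sketch: assumes single-cycle issue for both instructions and
    no other stalls. The consumer issues `result_latency` cycles after the
    producer, so the pair spans that gap plus the consumer's own cycle.
    """
    if result_latency < 1:
        raise ValueError("result latency must be at least one cycle")
    return result_latency + 1


# LDR R1, [R2] (result latency three) followed by ADD R3, R3, R1.
print(dependent_pair_cycles(3))  # 4
```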
If a subsequent instruction requires the register at the start of the Sh, MAC1, or ADD stage, an extra cycle must be added to the result latency of the instruction producing that register. Instructions that require a register at the start of these stages are specified by describing that register as an Early Reg. The following sequence, which requires an Early Reg, takes five cycles:
LDR R1, [R2]            ; Result latency three plus one
ADD R3, R3, R1, LSL#6   ; Plus one because register R1 is required by Sh
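The Early Reg rule adds one cycle to the producer's result latency. The following sketch applies that rule on top of simplified cycle arithmetic (single-cycle issue, no other stalls); the function names are hypothetical, and only the set of Early Reg stages comes from the text.

```python
# Stages this section names as requiring an Early Reg.
EARLY_REG_STAGES = {"Sh", "MAC1", "ADD"}


def effective_latency(result_latency: int, required_stage: str) -> int:
    """Result latency seen by a consumer needing the register at `required_stage`.

    Adds one cycle when the register is needed at the start of Sh, MAC1 or ADD.
    """
    return result_latency + (1 if required_stage in EARLY_REG_STAGES else 0)


def dependent_pair_cycles(result_latency: int, required_stage: str = "ALU") -> int:
    """Simplified model assuming single-cycle issue and no other stalls."""
    return effective_latency(result_latency, required_stage) + 1


# ADD R3, R3, R1, LSL#6 needs R1 in the Sh stage: (3 + 1) + 1 cycles.
print(dependent_pair_cycles(3, "Sh"))   # 5
print(dependent_pair_cycles(3, "ALU"))  # 4
```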
Table 16-1 Pipeline stages

Pipeline      Stages
ALU           Sh      ALU     Sat     WBex
Multiply      MAC1    MAC2    MAC3
Load/Store    ADD     DC1     DC2     WBls
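Table 16-1 can also be written as data, which makes it easy to check a pattern that is implied by this section rather than stated outright: the Early Reg stages (Sh, MAC1, ADD) are the first stage of each pipeline, and the forwarding targets are the first two stages of each pipeline. The names below are hypothetical; the stage lists are copied from the table, including the three entries it gives for the multiply pipeline.

```python
# Table 16-1 as data; each list is in pipeline order.
PIPELINE_STAGES = {
    "ALU": ["Sh", "ALU", "Sat", "WBex"],
    "Multiply": ["MAC1", "MAC2", "MAC3"],
    "Load/Store": ["ADD", "DC1", "DC2", "WBls"],
}

# Stage sets named in the text of this section.
FORWARDING_TARGETS = {"Sh", "MAC1", "ADD", "ALU", "MAC2", "DC1"}
FORWARDING_SOURCES = {"ALU", "Sat", "WBex", "WBls"}
EARLY_REG_STAGES = {"Sh", "MAC1", "ADD"}


def stages_at(index: int) -> set:
    """Return the stages at position `index` across all pipelines."""
    return {s[index] for s in PIPELINE_STAGES.values() if index < len(s)}


def pipeline_of(stage: str) -> str:
    """Return the name of the pipeline that contains `stage`."""
    for name, stages in PIPELINE_STAGES.items():
        if stage in stages:
            return name
    raise KeyError(stage)


print(sorted(stages_at(0)))  # ['ADD', 'MAC1', 'Sh']
```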