Cycle Timings and Interlock Behavior
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
16-22
ID012310
Non-Confidential, Unrestricted Access
16.12.2 Load Multiples, where the PC is in the register list
If an LDM loads the PC, the PC access is performed first to accelerate the branch, followed
by the rest of the register loads. The cycle timings and all register load latencies for LDMs with
the PC in the list are one greater than the cycle timings for the same LDM without the PC in the list.
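The one-cycle offset can be sketched as a small Python model. It applies the increment to the memory cycle count and to each register load latency. This is one reading that is consistent with the "1+n" and "4,…" entries of Table 16-19. The Cycles column of that table also folds in branch and return-stack effects, so this sketch does not model it. The function name and input values are hypothetical.

```python
def add_pc_to_ldm(memory_cycles_without_pc, latencies_without_pc):
    """Hypothetical model: timings of an LDM once the PC is added to its register list.

    The PC is loaded first, so the memory access count and every register load
    latency move out by one cycle relative to the same LDM without the PC.
    """
    memory_cycles = 1 + memory_cycles_without_pc           # the "1+n" form
    latencies = [lat + 1 for lat in latencies_without_pc]  # each register load is one cycle later
    return memory_cycles, latencies


# Illustrative (made-up) values for an LDM without the PC in the list.
mem, lats = add_pc_to_ldm(2, [3, 3, 4])
```

With those made-up inputs, the model gives 3 memory cycles and latencies of 4, 4 and 5.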
The processor includes a three-entry return stack that can predict procedure returns. Any LDM
to the PC that uses the stack pointer, R13, as the base register and does not restore the SPSR to
the CPSR is predicted as a procedure return.
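The prediction rule can be written as a predicate, alongside a toy three-entry return stack. The predicate follows the text directly. The stack behavior is an assumption the text does not state: that return addresses are pushed on calls, popped on predicted returns, and that the oldest entry is discarded on overflow. The names `predicted_as_return` and `ReturnStack` are hypothetical.

```python
from collections import deque


def predicted_as_return(base_reg, reg_list, restores_spsr):
    """True if an LDM is predicted as a procedure return: it loads the PC,
    uses R13 (SP) as its base register, and does not restore the SPSR to the
    CPSR (as the ^ form of LDM with the PC in the list does)."""
    loads_pc = "PC" in reg_list
    base_is_sp = base_reg in ("R13", "SP")
    return loads_pc and base_is_sp and not restores_spsr


class ReturnStack:
    """Toy three-entry return stack. ASSUMED behavior: push on call, pop on a
    predicted return, and drop the oldest entry when a fourth is pushed."""

    def __init__(self, depth=3):
        # deque with maxlen silently discards the leftmost (oldest) item when full
        self.entries = deque(maxlen=depth)

    def push(self, return_address):
        self.entries.append(return_address)

    def predict(self):
        # An empty stack yields no prediction
        return self.entries.pop() if self.entries else None
```

For example, `LDMIA sp!,{r4,pc}` satisfies the predicate. After four pushes, the toy stack returns only the three most recent addresses and then reports that it is empty.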
When the condition code fails, use the cycle counts for the non-PC destination variants. These
all issue in a single cycle, so an LDM to the PC that fails its condition code takes one cycle.
In all cases, the base register, Rx, is an Early Reg and requires an extra cycle of result latency
to provide its value.
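The Early Reg penalty can be modeled as one extra cycle added to the producer's result latency when the consumer reads the register as the LDM base. The sketch below assumes that a consumer can issue in the cycle that equals the producer's issue cycle plus its result latency. The function name and the latency value of 1 are hypothetical.

```python
def earliest_issue_cycle(producer_issue_cycle, producer_result_latency, early_reg):
    """Earliest cycle a consumer can issue when it reads a register written by a producer.

    A normal source register waits the producer's result latency. An Early Reg,
    such as the LDM base register Rx, needs one additional cycle.
    """
    extra = 1 if early_reg else 0
    return producer_issue_cycle + producer_result_latency + extra


# Hypothetical producer that issues in cycle 1 with a result latency of 1.
normal = earliest_issue_cycle(1, 1, early_reg=False)  # ordinary consumer of the result
as_base = earliest_issue_cycle(1, 1, early_reg=True)  # LDM using the result as Rx
```

In this model, the LDM issues one cycle later than an ordinary consumer of the same result.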
Table 16-19 lists the cycle timing behavior of Load Multiples, where the PC is in the register list.
16.12.3 Example Interlocks
The following sequence, which contains an LDM instruction, takes five cycles because R3 has a
result latency of four cycles:
LDMIA R0, {R1-R7}
ADD R10, R10, R3
The following sequence, which contains an STM instruction, takes five cycles to execute because
R6 has a register lock latency of four cycles:
STMIA R0, {R1-R7}
ADD R6, R10, R11
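Both five-cycle counts follow from a simple two-instruction model. The first instruction issues in cycle 1. The second instruction cannot issue until cycle 1 plus the dependency latency, which is a result latency for R3 in the LDM case and a register lock latency for R6 in the STM case. The model assumes that both the LDM and the STM here are single-cycle issue and that the ADD is single-cycle issue. The text states single-cycle issue only for LDMs without the PC. The function name is hypothetical.

```python
def two_instruction_cycles(first_issue_cycles, dependency_latency):
    """Cycles for a two-instruction sequence whose second instruction is single-cycle issue.

    The first instruction issues in cycle 1. The second can issue only after the first
    finishes issuing, and not before cycle 1 + dependency_latency. The sequence ends in
    the cycle in which the second instruction issues.
    """
    no_stall_issue = 1 + first_issue_cycles  # next cycle after the first finishes issuing
    dependent_issue = 1 + dependency_latency  # earliest cycle the dependency allows
    return max(no_stall_issue, dependent_issue)


# LDMIA R0, {R1-R7} then ADD R10, R10, R3: R3 has a result latency of 4.
ldm_case = two_instruction_cycles(first_issue_cycles=1, dependency_latency=4)
# STMIA R0, {R1-R7} then ADD R6, R10, R11: R6 has a register lock latency of 4.
stm_case = two_instruction_cycles(first_issue_cycles=1, dependency_latency=4)
```

Without a dependency (latency 0), the same model gives two cycles for the pair. That shows the three extra cycles in each example are interlock stalls.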
Table 16-19 Cycle timing behavior of Load Multiples, where the PC is in the register list

Example instruction          Cycles  Memory cycles  Result latency  Comments
LDMIA sp!,{...,pc}           4       1+n (a)        4,…             Correctly return stack predicted
LDMIA sp!,{...,pc}           9       1+n (a)        4,…             Return stack mispredicted
LDMIA <cond> sp!,{...,pc}    9       1+n (a)        4,…             Conditional return, or empty return stack
LDMIA rx,{...,pc}            8       1+n (a)        4,…             Not return stack predicted
a. Where n is the number of memory cycles this instruction would take if the PC were not in the register list.
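The table rows, together with the condition-code-failure rule from the text, can be encoded as a lookup function. This sketch assumes that only the sp-based forms shown in Table 16-19 use the return stack. It does not cover forms the table omits, such as an SPSR-restoring LDM. The function names and the `stack_prediction` values are hypothetical.

```python
def ldm_pc_cycles(base_is_sp, conditional=False, stack_prediction=None, condition_passed=True):
    """Cycles for an LDM with the PC in its register list, following Table 16-19.

    stack_prediction: "correct", "wrong", or None when the return stack is empty.
    """
    if not condition_passed:
        return 1  # failing condition: use the single-cycle non-PC variant timing
    if not base_is_sp:
        return 8  # LDMIA rx,{...,pc}: not return stack predicted
    if conditional or stack_prediction is None:
        return 9  # conditional return, or empty return stack
    if stack_prediction == "correct":
        return 4  # correctly return stack predicted
    return 9      # return stack mispredicted


def ldm_pc_memory_cycles(n):
    """Memory cycles are 1 + n, where n is the count without the PC in the list."""
    return 1 + n
```

A correctly predicted `LDMIA sp!,{...,pc}` costs 4 cycles in this lookup, against 8 for the same load through a non-SP base. That gap is the saving the return stack provides.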