4-14
Intel® PXA27x Processor Family
Optimization Guide
Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
4.3.1.5 Scheduling Load and Store Multiple (LDM/STM)
LDM and STM instructions have an issue latency of 2 to 20 cycles, depending on the number of
registers being loaded or stored. Assuming a data cache hit, the issue latency is typically two
cycles plus one additional cycle for each register loaded or stored. The instruction following an
LDM stalls whether or not it depends on the results of the load. An LDRD or STRD
instruction does not suffer from this drawback (except when followed by a memory operation) and
should be used where possible. Consider the task of adding two 64-bit integer values whose
addresses are aligned on an 8-byte boundary. This can be done using the following
LDM instructions.
; r0 contains the address of the first 64-bit value
; r1 contains the address of the second 64-bit value
ldm r0, {r2, r3}
ldm r1, {r4, r5}
adds r0, r2, r4
adc r1, r3, r5
Assuming all accesses hit the cache, this example code takes 11 cycles to complete. Rewriting the
code to use the LDRD instruction, as shown in the following example, takes only seven cycles.
Performance improves further if other instructions are scheduled after each LDRD
instruction to reduce the stalls caused by the result latency of the LDRD instructions and the
one-cycle stall of any memory operation.
; r0 contains the address of the first 64-bit value
; r1 contains the address of the second 64-bit value
ldrd r2, [r0]
ldrd r4, [r1]
adds r0, r2, r4
adc r1, r3, r5
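The ADDS/ADC pair used in both versions adds the low words first and then folds the carry into
the sum of the high words. A minimal C sketch of that arithmetic follows. The u64_pair type and
the add64_pair function are hypothetical names introduced here for illustration, and the sketch
assumes the low word sits at the lower address, so that r2 and r4 hold the low halves.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit value held as two 32-bit words,
   mirroring a register pair such as r2 (low) and r3 (high). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Models ADDS (low words, produces a carry) followed by
   ADC (high words plus that carry). */
static u64_pair add64_pair(u64_pair a, u64_pair b)
{
    u64_pair sum;
    uint32_t carry;

    sum.lo = a.lo + b.lo;          /* ADDS: wraps modulo 2^32 */
    carry = (sum.lo < a.lo);       /* 1 if the low-word add overflowed */
    sum.hi = a.hi + b.hi + carry;  /* ADC: high words plus carry in */
    return sum;
}
```

For example, adding 0x00000001FFFFFFFF and 0x0000000000000001 with this function gives
lo = 0x00000000 and hi = 0x00000002.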
Similarly, the code sequence in the following example takes five cycles to complete.
stm r0, {r2, r3}
add r1, r1, #1
The alternative version shown below takes only three cycles to complete.
strd r2, [r0]
add r1, r1, #1
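The five-cycle figure for the STM sequence is consistent with the rule of thumb given at the
start of this section (two cycles plus one per register). As a rough sketch only, that rule can be
written as a small C helper. The function names are hypothetical, and the one-cycle issue cost for
the ADD is an assumption made here. This is not a pipeline model: it ignores result latencies,
cache misses, and the stall after an LDM.

```c
#include <assert.h>

/* Hypothetical helper: typical issue latency, in cycles, of an LDM or STM
   that transfers num_regs registers, assuming every access hits the data
   cache. Encodes only the rule "two cycles plus one per register". */
static int ldm_stm_issue_cycles(int num_regs)
{
    assert(num_regs >= 1 && num_regs <= 16);
    return 2 + num_regs;
}

/* Assumption: a simple data-processing instruction such as ADD
   issues in one cycle. */
enum { ALU_ISSUE_CYCLES = 1 };

/* Estimated cycles for "stm r0, {r2, r3}" followed by an independent ADD. */
static int stm_pair_then_add_cycles(void)
{
    return ldm_stm_issue_cycles(2) + ALU_ISSUE_CYCLES;
}
```

Under these assumptions stm_pair_then_add_cycles() returns 5, which agrees with the five cycles
quoted for the STM sequence.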