4-14
Intel® PXA27x Processor Family
Optimization Guide
Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
4.3.1.5 Scheduling Load and Store Multiple (LDM/STM)
LDM and STM instructions have an issue latency of 2 to 20 cycles, depending on the number of
registers being loaded or stored. The issue latency is typically two cycles plus one additional cycle
for each register loaded or stored, assuming a data cache hit. The instruction following an LDM
stalls whether or not it depends on the results of the load. An LDRD or STRD instruction does not
suffer from this drawback (except when followed by a memory operation) and should be used where
possible.
Consider the task of adding two 64-bit integer values. Assume that the addresses of these values
are aligned on an 8-byte boundary. The addition can be written using the following LDM
instructions.
; r0 contains the address of the first 64-bit value
; r1 contains the address of the second 64-bit value
ldm r0, {r2, r3}
ldm r1, {r4, r5}
adds r0, r2, r4 ; add the first words and set the carry flag
adc r1, r3, r5 ; add the second words plus the carry
Assuming all accesses hit the cache, this example code takes 11 cycles to complete. Rewriting the
code with the LDRD instruction, as shown in the following example, takes only seven cycles.
Performance improves further if other instructions are scheduled after the LDRD instructions to
cover the result latencies of the LDRD instructions and the one-cycle stall incurred when a memory
operation follows.
; r0 contains the address of the first 64-bit value
; r1 contains the address of the second 64-bit value
ldrd r2, [r0]
ldrd r4, [r1]
adds r0, r2, r4 ; add the first words and set the carry flag
adc r1, r3, r5 ; add the second words plus the carry
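The rule of thumb given earlier (typically two cycles plus one cycle per register, assuming a data cache hit) can be checked with a short calculation. The Python sketch below is illustrative only: the function name ldm_stm_issue_latency is hypothetical, and it models issue latency alone, not the result latencies or stalls that make up the full 11-cycle and seven-cycle totals.

```python
def ldm_stm_issue_latency(num_registers: int) -> int:
    """Typical LDM/STM issue latency in cycles, assuming a data cache hit.

    Models only the rule of thumb from the text: two cycles plus one
    cycle per register transferred. The function name is hypothetical.
    """
    if not 1 <= num_registers <= 16:
        raise ValueError("an ARM register list holds 1 to 16 registers")
    return 2 + num_registers


# Each LDM in the 64-bit add example transfers two registers.
per_ldm = ldm_stm_issue_latency(2)
both_ldms = 2 * per_ldm
print(per_ldm, both_ldms)
```

Under this rule, the two LDM instructions alone occupy eight issue cycles, which suggests that most of the 11-cycle total comes from the loads rather than from the two arithmetic instructions.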
Similarly, the code sequence in the following example takes five cycles to complete.
stm r0, {r2, r3}
add r1, r1, #1
The alternative version shown below, which uses STRD, takes only three cycles to complete.
strd r2, [r0]
add r1, r1, #1
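The cycle counts quoted in this section can be compared side by side. The Python sketch below only restates figures from the text (11 versus seven cycles for the 64-bit add, five versus three cycles for the store sequence); the names QUOTED_CYCLES, cycles_saved, and speedup are hypothetical and are not part of any Intel tool.

```python
# Cycle counts quoted in the text, all assuming data cache hits.
# Each entry maps a sequence to (cycles before, cycles after).
QUOTED_CYCLES = {
    "64-bit add (LDM replaced by LDRD)": (11, 7),
    "register pair store (STM replaced by STRD)": (5, 3),
}


def cycles_saved(before: int, after: int) -> int:
    """Cycles removed by switching to the LDRD/STRD form."""
    return before - after


def speedup(before: int, after: int) -> float:
    """Ratio of the original cycle count to the rewritten one."""
    return before / after


for name, (before, after) in QUOTED_CYCLES.items():
    saved = cycles_saved(before, after)
    ratio = speedup(before, after)
    print(f"{name}: {saved} cycles saved, {ratio:.2f}x")
```

In both quoted cases the rewritten sequence removes more than a third of the cycles, which is why LDRD and STRD are preferred for two-register transfers.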