Intel® PXA27x Processor Family
Optimization Guide
Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
The number of write buffers limits the number of successive writes that can be issued before the
processor stalls. No more than eight uncoalesced store instructions can be issued in succession. If
the data cache uses the write-allocate, write-back policy, a load can also cause stores to external
memory when it evicts a dirty (modified) cache line. This can further limit the number of
sequential stores that issue without stalling.
4.3.1.4 Scheduling Load Double and Store Double (LDRD/STRD)
The Intel XScale® Microarchitecture introduces two new double word instructions: LDRD and
STRD. LDRD loads 64 bits of data from an effective address into two consecutive registers. STRD
stores 64 bits from two consecutive registers to an effective address. There are two important
restrictions on how these instructions are used:
• The effective address must be aligned on an 8-byte boundary
• The specified register must be even-numbered (for example, r0 or r2)
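The two restrictions above can be checked mechanically, for example in a code generator or a debug assertion. The following is a minimal C sketch, not part of the original guide; the helper names `ldrd_address_ok`, `ldrd_register_ok`, and `ldrd_operands_ok` are hypothetical, and the sketch covers only the two restrictions listed here, not any other architectural limits on register choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr meets the 8-byte alignment
   restriction on the LDRD/STRD effective address. */
static bool ldrd_address_ok(uintptr_t addr)
{
    return (addr & 0x7u) == 0;   /* low three address bits must be zero */
}

/* Hypothetical helper: true if reg names an even-numbered register.
   Register numbers are assumed to be in the range r0..r15. */
static bool ldrd_register_ok(unsigned reg)
{
    return reg < 16u && (reg % 2u) == 0;
}

/* Hypothetical helper: both restrictions from the list above. */
static bool ldrd_operands_ok(unsigned reg, uintptr_t addr)
{
    return ldrd_register_ok(reg) && ldrd_address_ok(addr);
}
```

For instance, `ldrd_operands_ok(0, 0x2008)` is true, while `ldrd_operands_ok(3, 0x2008)` (odd register) and `ldrd_operands_ok(0, 0x2004)` (address only 4-byte aligned) are both false.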
Using LDRD/STRD instead of LDM/STM for the same transfer is more efficient because
LDRD/STRD issues in only one or two clock cycles, whereas LDM/STM issues in four clock cycles.
Avoid LDRDs targeting R12 because this incurs an extra cycle of issue latency.
The LDRD instruction has a result latency of three or four cycles depending on the destination
register being accessed (assuming the data being loaded is in the data cache).
add r6, r7, r8
sub r5, r6, r9
; The following ldrd instruction would load values
; into registers r0 and r1
ldrd r0, [r3]
orr r8, r1, #0xf
mul r7, r0, r7
In the code example above, the ORR instruction stalls for three cycles because of the four-cycle
result latency for the second destination register (r1) of an LDRD instruction. The preceding code
can be rearranged to help remove the pipeline stalls:
; The following ldrd instruction would load values
; into registers r0 and r1
ldrd r0, [r3]
add r6, r7, r8
sub r5, r6, r9
mul r7, r0, r7
orr r8, r1, #0xf
Any memory operation (LDR, LDRD, STR, and others) that follows an LDRD instruction stalls for
one cycle.
; The str instruction below will stall for 1 cycle
ldrd r0, [r3]
str r4, [r5]
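The stall counts quoted in this section can be reproduced with a small scoreboard calculation. The C sketch below is an illustrative model only, not the actual pipeline. It assumes one instruction issues per cycle, a one-cycle result latency for data-processing instructions (an assumption not stated in this section), a three-cycle result latency for the first LDRD destination register and four cycles for the second, and one extra cycle for a memory operation directly after an LDRD. The multiply is modeled like any other data-processing instruction because nothing later reads its result. All type, macro, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_REGS 16
#define NO_REG   (-1)

/* One instruction in the simplified model (all fields are illustrative). */
typedef struct {
    int  src[2];      /* source registers, NO_REG if unused            */
    int  dst[2];      /* destination registers, NO_REG if unused       */
    int  latency[2];  /* result latency in cycles for each destination */
    bool is_ldrd;     /* true for LDRD                                 */
    bool is_mem;      /* true for any memory operation                 */
} Insn;

/* Data-processing op: assumed 1-cycle result latency. */
#define INSN_OP(d, a, b)   { {(a), (b)}, {(d), NO_REG}, {1, 0}, false, false }
/* LDRD: 3 cycles for the first register, 4 for the second. */
#define INSN_LDRD(d, base) { {(base), NO_REG}, {(d), (d) + 1}, {3, 4}, true, true }
/* STR: reads a data register and a base register, writes no register. */
#define INSN_STR(s, base)  { {(s), (base)}, {NO_REG, NO_REG}, {0, 0}, false, true }

/* Count stall cycles for a straight-line sequence. */
static int count_stalls(const Insn *seq, size_t n)
{
    int  ready[NUM_REGS] = {0};  /* cycle at which each register is available */
    int  next_issue = 0;         /* earliest cycle the next instruction can issue */
    int  stalls = 0;
    bool prev_ldrd = false;

    for (size_t i = 0; i < n; i++) {
        int issue = next_issue;
        /* A memory operation directly after an LDRD waits one cycle. */
        if (prev_ldrd && seq[i].is_mem)
            issue += 1;
        /* Wait until every source operand is available. */
        for (int s = 0; s < 2; s++) {
            int r = seq[i].src[s];
            if (r != NO_REG) {
                assert(r >= 0 && r < NUM_REGS);
                if (ready[r] > issue)
                    issue = ready[r];
            }
        }
        stalls += issue - next_issue;
        /* Record when each destination becomes available. */
        for (int d = 0; d < 2; d++) {
            int r = seq[i].dst[d];
            if (r != NO_REG) {
                assert(r >= 0 && r < NUM_REGS);
                ready[r] = issue + seq[i].latency[d];
            }
        }
        next_issue = issue + 1;
        prev_ldrd = seq[i].is_ldrd;
    }
    return stalls;
}

/* add, sub, ldrd, orr, mul: the first listing in this section. */
static const Insn original_order[] = {
    INSN_OP(6, 7, 8), INSN_OP(5, 6, 9), INSN_LDRD(0, 3),
    INSN_OP(8, 1, NO_REG), INSN_OP(7, 0, 7)
};

/* ldrd, add, sub, mul, orr: the rearranged listing. */
static const Insn rearranged_order[] = {
    INSN_LDRD(0, 3), INSN_OP(6, 7, 8), INSN_OP(5, 6, 9),
    INSN_OP(7, 0, 7), INSN_OP(8, 1, NO_REG)
};

/* ldrd followed directly by str. */
static const Insn ldrd_then_str[] = {
    INSN_LDRD(0, 3), INSN_STR(4, 5)
};
```

Under these assumptions, `count_stalls(original_order, 5)` returns 3, `count_stalls(rearranged_order, 5)` returns 0, and `count_stalls(ldrd_then_str, 2)` returns 1, matching the figures given in the text. The model ignores issue latency, resource conflicts, and cache misses, so it is useful only for checking this kind of simple dependency arithmetic.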