Intel® PXA27x Processor Family
Optimization Guide
High Level Language Optimization
Consider this code sample:
add r1, r1, #1
; Sequence of instructions using r2, but leave r3 unchanged.
ldr r2, [r3]
add r3, r3, #4
mov r4, r3
sub r2, r2, #1
The sub instruction above stalls if the data being loaded misses the cache, because it needs the
value that the ldr instruction places in r2. These stalls can be avoided by issuing a PLD
instruction ahead of the load:
pld [r3]
add r1, r1, #1
; Sequence of instructions using r2, but leave r3 unchanged.
ldr r2, [r3]
add r3, r3, #4
mov r4, r3
sub r2, r2, #1
For most cases, optimizing for the external memory latency also satisfies the requirements for the
internal memory latency.
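The PLD placement shown above can also be requested from C. The following is a minimal sketch,
assuming a GCC-compatible compiler that provides the __builtin_prefetch builtin and lowers it to
PLD on targets that have the instruction. The function name load_and_decrement and its parameters
are hypothetical and only mirror the registers in the assembly sample.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the assembly sample:
 *   p       plays the role of r3 (address of the data to load)
 *   counter plays the role of r1 (independent work)
 * Assumption: the compiler supports __builtin_prefetch (GCC and Clang do). */
static int load_and_decrement(const int *p, int *counter)
{
    assert(p != NULL && counter != NULL);

    /* Request the cache line early, like "pld [r3]". */
    __builtin_prefetch(p);

    /* Independent work that can overlap with the line fill,
     * like "add r1, r1, #1" and the instructions after it. */
    *counter += 1;

    /* The load and its dependent subtract, like
     * "ldr r2, [r3]" followed by "sub r2, r2, #1". */
    return *p - 1;
}
```

The prefetch only hides latency when enough independent work sits between it and the load; with
a single add in between, as here, the benefit would likely be small on real hardware.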
5.1.1.1.2 Preload Loop Scheduling
When adding preload instructions to a loop which operates on arrays, preload ahead one, two, or
more iterations. The data for future iterations is located in memory a fixed offset from the data for
the current iteration. This makes it easy to predict where to fetch the data. The number of iterations
to preload ahead is referred to as the preload scheduling distance (PSD). For the Intel XScale®
Microarchitecture this can be calculated as:

$$PSD = \operatorname{floor}\left(\frac{N_{linexfer} \times N_{pref} + N_{hwlinexfer} \times N_{evict}}{CPI \times N_{inst}}\right)$$

Where:
$N_{linexfer}$: The number of core clocks required to transfer one complete cache line.
$N_{pref}$: The number of cache lines to be pre-loaded for both reading and writing.
$N_{evict}$: The number of cache half line evictions caused by the loop.
$N_{inst}$: The number of instructions executed in one iteration of the loop.
$N_{hwlinexfer}$: The number of core clocks required to write half a cache line, as if only one of
the cache line dirty bits were set when a line eviction occurred.
$CPI$: The average number of core clocks per instruction (of the instructions within the loop).
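As an illustration of how these terms combine, the sketch below evaluates the preload scheduling
distance and then uses it to preload ahead in an array loop. It is a sketch under stated
assumptions, not code from the guide: the names psd_estimate and sum_with_preload are
hypothetical, the inputs in the usage note are made-up numbers, and __builtin_prefetch is assumed
to be available (GCC-compatible compilers).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: PSD = floor((N_linexfer * N_pref +
 * N_hwlinexfer * N_evict) / (CPI * N_inst)).
 * All inputs are estimates supplied by the caller. */
static size_t psd_estimate(unsigned n_linexfer, unsigned n_pref,
                           unsigned n_hwlinexfer, unsigned n_evict,
                           double cpi, unsigned n_inst)
{
    assert(cpi > 0.0 && n_inst > 0);

    /* Core clocks spent moving cache lines per iteration. */
    double memory_clocks = (double)n_linexfer * n_pref
                         + (double)n_hwlinexfer * n_evict;
    /* Core clocks spent executing one iteration. */
    double iteration_clocks = cpi * n_inst;

    /* The quotient is non-negative, so truncation equals floor. */
    return (size_t)(memory_clocks / iteration_clocks);
}

/* Hypothetical loop that preloads the element needed psd iterations
 * from now. For simplicity it issues one preload per element; a tuned
 * loop would typically issue one per cache line instead. */
static long sum_with_preload(const int *data, size_t n, size_t psd)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + psd < n)  /* stay inside the array */
            __builtin_prefetch(&data[i + psd]);
        sum += data[i];
    }
    return sum;
}
```

For example, with the made-up inputs psd_estimate(40, 2, 20, 1, 1.5, 20), the numerator is 100
clocks and the denominator is 30 clocks, so the helper returns 3 and the loop would preload three
iterations ahead. Treat the result as a starting point to be refined by measurement.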
The PSD calculated in the above equation,
$\operatorname{floor}((N_{linexfer} \times N_{pref} + N_{hwlinexfer} \times N_{evict}) / (CPI \times N_{inst}))$,
is a good initial estimate, but may not be the optimum scheduling distance. Estimating $N_{evict}$
is difficult from static code. However, if the operational data uses the mini-data cache and if
the loop operations overflow the mini-data cache, then a first order