5-2
Intel® PXA27x Processor Family
Optimization Guide
High Level Language Optimization
Consider this code sample:
add r1, r1, #1
; Sequence of instructions that use r2 but leave r3 unchanged.
ldr r2, [r3]
add r3, r3, #4
mov r4, r3
sub r2, r2, #1
The sub instruction above stalls if the load misses the cache, because it needs r2 before the
data has arrived. The stall can be avoided by issuing a PLD instruction ahead of the load:
pld [r3]
add r1, r1, #1
; Sequence of instructions that use r2 but leave r3 unchanged.
ldr r2, [r3]
add r3, r3, #4
mov r4, r3
sub r2, r2, #1
For most cases, optimizing for the external memory latency also satisfies the requirements for the
internal memory latency.
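The same idea can be expressed from C. The following is a minimal sketch, not code from this guide: `load_after_preload` is a hypothetical helper name, and `__builtin_prefetch` is a GCC/Clang extension that is assumed to be available. On ARM targets that support it, the compiler typically emits a PLD for this builtin; on other targets it may emit nothing.

```c
#include <stddef.h>

/* Hypothetical helper (not from the guide) mirroring the assembly sample.
 * Assumption: the compiler provides __builtin_prefetch (GCC/Clang). */
static int load_after_preload(const int *p, int *counter)
{
    __builtin_prefetch(p);  /* request the line early, like: pld [r3]      */
    *counter += 1;          /* independent work, like: add r1, r1, #1      */
    /* ... more work that does not depend on *p would be scheduled here ... */
    return *p - 1;          /* the load and its first use: ldr, then sub   */
}
```

The preload is only a hint, so the result is the same whether or not the line arrives in time; only the stall on the first use of the loaded value changes.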
5.1.1.1.2 Preload Loop Scheduling
When adding preload instructions to a loop which operates on arrays, preload ahead one, two, or
more iterations. The data for future iterations is located in memory a fixed offset from the data for
the current iteration. This makes it easy to predict where to fetch the data. The number of iterations
to preload ahead is referred to as the preload scheduling distance (PSD). For the Intel XScale®
Microarchitecture this can be calculated as:
$$\mathrm{PSD} = \operatorname{floor}\left(\frac{N_{linexfer} \times N_{pref} + N_{hwlinexfer} \times N_{evict}}{CPI \times N_{inst}}\right)$$

Where:
$N_{linexfer}$: The number of core clocks required to transfer one complete cache line.
$N_{pref}$: The number of cache lines to be pre-loaded for both reading and writing.
$N_{evict}$: The number of cache half line evictions caused by the loop.
$N_{inst}$: The number of instructions executed in one iteration of the loop.
$N_{hwlinexfer}$: The number of core clocks required to write half a cache line (as if) only one of the
cache line dirty bits were set when a line eviction occurred.
$CPI$: The average number of core clocks per instruction (of the instructions within the loop).

PSD calculated in the above equation is a good initial estimation, but may not be the optimum
scheduling distance. Estimating $N_{evict}$ is difficult from static code. However, if the operational data
uses the mini-data cache and if the loop operations overflow the mini-data cache, then a first order
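As an illustration of the PSD equation and of preloading a fixed number of iterations ahead, here is a minimal C sketch. It is not code from this guide: `psd_params`, `psd_estimate`, `sum_with_preload`, and `INTS_PER_ITER` are hypothetical names, the numeric inputs in the usage note are made-up values rather than measured PXA27x timings, and `__builtin_prefetch` is a GCC/Clang extension assumed to be available. The sketch also assumes that one loop iteration consumes one 32-byte cache line of 4-byte integers.

```c
#include <stddef.h>

/* Hypothetical parameter bundle; fields follow the symbols in the PSD equation. */
struct psd_params {
    unsigned n_linexfer;    /* core clocks to transfer one complete cache line */
    unsigned n_pref;        /* cache lines preloaded per iteration             */
    unsigned n_hwlinexfer;  /* core clocks to write half a cache line          */
    unsigned n_evict;       /* half-line evictions caused per iteration        */
    unsigned n_inst;        /* instructions executed per iteration             */
    double   cpi;           /* average core clocks per instruction             */
};

/* PSD = floor((N_linexfer*N_pref + N_hwlinexfer*N_evict) / (CPI*N_inst)) */
static unsigned psd_estimate(const struct psd_params *p)
{
    double transfer = (double)p->n_linexfer * p->n_pref
                    + (double)p->n_hwlinexfer * p->n_evict;
    double per_iter = p->cpi * p->n_inst;
    if (per_iter <= 0.0)
        return 0;                          /* guard against division by zero */
    return (unsigned)(transfer / per_iter); /* truncation is floor for values >= 0 */
}

/* Assumption: one iteration processes one 32-byte line of 4-byte ints. */
#define INTS_PER_ITER 8u

/* Sum an array while preloading the data needed psd iterations ahead. */
static long sum_with_preload(const int *a, size_t n, unsigned psd)
{
    long sum = 0;
    size_t ahead = (size_t)psd * INTS_PER_ITER;
    size_t i, j, end;

    for (i = 0; i < n; i += INTS_PER_ITER) {
        if (i + ahead < n)                      /* stay inside the array */
            __builtin_prefetch(&a[i + ahead]);  /* data for a future iteration */
        end = (i + INTS_PER_ITER < n) ? i + INTS_PER_ITER : n;
        for (j = i; j < end; j++)
            sum += a[j];
    }
    return sum;
}
```

With the illustrative inputs N_linexfer = 50, N_pref = 1, N_hwlinexfer = 20, N_evict = 1, N_inst = 10, and CPI = 1.5, `psd_estimate` gives floor(70 / 15) = 4, so the loop would preload four iterations ahead. The resulting distance is a starting point to be tuned by measurement, not a final value.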