Intel® PXA27x Processor Family
Optimization Guide
5
High Level Language Optimization
5.1
C and C++ Level Optimization
For embedded systems, software programming techniques greatly affect system performance. Many
techniques can be applied during C/C++ code development to attain performance at the application
level. This chapter covers a set of programming optimization techniques that are relevant to
deeply embedded systems such as the Intel® PXA27x Processor Family (PXA27x processor).
5.1.1
Efficient Usage of Preloading
The Intel XScale® Microarchitecture preload instruction is a true preload instruction because the
load destination is the data or mini-data cache and not a register. Compilers for processors which
have data caches, but do not support preload, sometimes use a load instruction to preload the data
cache. This technique has the disadvantages of using a register to load data and requiring additional
registers for subsequent preloads and thus increasing register pressure. By contrast, the Intel
XScale® Microarchitecture preload can be used to reduce register pressure instead of increasing it.
The Intel XScale® Microarchitecture preload is a hint instruction and does not guarantee that the
data is loaded. Whenever the load would cause a fault or a table walk, the processor ignores
the preload instruction, along with the fault or table walk, and continues processing the next
instruction. This is particularly advantageous when a linked list or recursive data structure is
terminated by a NULL pointer: preloading the NULL pointer does not cause a fault.
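The linked-list case can be sketched in C as follows. This is a minimal illustration, not code from this guide: the node type and function name are hypothetical, and it assumes a GCC-compatible compiler, where the __builtin_prefetch builtin is typically lowered to PLD on ARM targets that support it.

```c
#include <stddef.h>

/* Hypothetical node type, used only for this illustration. */
struct node {
    struct node *next;
    int value;
};

/* Sum a NULL-terminated list, preloading the next node while the
 * current one is being processed. __builtin_prefetch is a GCC/Clang
 * builtin (an assumption about the toolchain, not part of this guide). */
static int sum_list(const struct node *p)
{
    int sum = 0;

    while (p != NULL) {
        /* On the last node p->next is NULL. The preload is only a
         * hint, so no extra end-of-list check is needed before it. */
        __builtin_prefetch(p->next);
        sum += p->value;
        p = p->next;
    }
    return sum;
}
```

Because the preload never faults, the loop body stays free of a second NULL test that a load-based preload would require.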
The compiler can insert preload instructions (PLD) during compilation. However, the programmer
can also insert preload operations in the code explicitly. A function can be defined in the
high-level language that results in a PLD instruction being inserted in-line. This
function can then be called at other suitable places in the code to insert PLD instructions.
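One possible shape for such a function is sketched below. The names preload and lookup are hypothetical, and the sketch assumes a GCC-compatible compiler generating ARM-state code for the inline-assembly path. On other targets it falls back to the compiler's prefetch builtin, or to nothing.

```c
#include <stddef.h>

/* Hypothetical helper: requests a preload of the cache line
 * containing addr. Not part of any Intel or compiler API. */
static inline void preload(const void *addr)
{
#if defined(__GNUC__) && defined(__arm__)
    /* Emit PLD directly using GCC-style inline assembly. */
    __asm__ volatile ("pld [%0]" : : "r" (addr));
#elif defined(__GNUC__)
    /* Other targets: let the compiler choose a prefetch instruction. */
    __builtin_prefetch(addr);
#else
    (void)addr; /* No preload support assumed; do nothing. */
#endif
}

/* Example call site: request the table entry before it is needed. */
static int lookup(const int *table, size_t index)
{
    preload(&table[index]);
    /* Independent work would normally be scheduled here so that the
     * preload has time to complete before the load below. */
    return table[index];
}
```

Keeping the instruction behind one small inline function means the call sites stay portable, and the preload can be disabled or retargeted in a single place.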
5.1.1.1
Preload Considerations
The issues to consider when using preloading are explained below.
5.1.1.1.1
Preload Distances In the Intel XScale® Microarchitecture
Scheduling the preload instruction requires understanding the system latency times and system
resources which determine when to use the preload instruction.
The optimum advantage of using preload is obtained if the preload issue-to-use distance is equal to
the memory latency. The memory latency shown in
Section 3.2.1, “Optimal Setting for Memory
should be used to determine the proper insertion point for preloads.
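As a rough illustration of matching the issue-to-use distance to the memory latency, the sketch below converts a latency into a number of loop iterations and issues the preload that many iterations ahead. Expressing the distance in iterations, the function names, and any cycle counts passed in are assumptions made for this example. Real values have to come from the latency figures for the target system. The sketch also assumes a GCC-compatible compiler for __builtin_prefetch.

```c
#include <stddef.h>

/* Iterations of lookahead needed so that the preload is issued at
 * least latency_cycles before the data is used (rounded up). */
static unsigned preload_distance_iters(unsigned latency_cycles,
                                       unsigned cycles_per_iter)
{
    if (cycles_per_iter == 0)
        return 0;
    return (latency_cycles + cycles_per_iter - 1u) / cycles_per_iter;
}

/* Sum an array while preloading the element dist iterations ahead.
 * For simplicity a preload is issued on every iteration; in practice
 * one preload per cache line is enough. */
static long sum_with_preload(const int *data, size_t n, size_t dist)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + dist < n) /* stay inside the array */
            __builtin_prefetch(&data[i + dist]);
        sum += data[i];
    }
    return sum;
}
```

A distance that is too short leaves part of the latency exposed, while one that is too long risks the line being evicted before it is used.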
Depending on whether the target is in internal or external memory, the preload
distance may need to vary. Also, for external memory, if the target address is not
aligned to a cache line, the memory latency can increase due to the critical word first (CWF) mode
of the memory accesses. CWF mode returns the requested data starting with the requested word
instead of starting with the word at the aligned address. When using preloads, align the target
address to a cache-line boundary to avoid the extra memory bus usage.
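The alignment step can be sketched as below. The 32-byte line size is an assumption for this example and should be checked against the cache description for the target. The helper names are hypothetical, and __builtin_prefetch again assumes a GCC-compatible compiler.

```c
#include <stdint.h>

/* Assumed cache-line size in bytes; must be a power of two. */
#define CACHE_LINE_BYTES 32u

/* Round an address down to the start of its cache line. */
static inline uintptr_t cache_line_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_BYTES - 1u);
}

/* Preload using the line-aligned address rather than the address of
 * the word that happens to be needed first. */
static inline void preload_line(const void *p)
{
    __builtin_prefetch((const void *)cache_line_base((uintptr_t)p));
}
```

Placing frequently preloaded buffers on a cache-line boundary in the first place makes the rounding a no-op for their first element.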
Intel PXA27x Processor Family Optimization Guide, Order Number 280004 001, April 2004