Intel® PXA27x Processor Family
Optimization Guide
System Level Optimization
3.4.3 Buffer for Context Switch
During a context switch, the state of the process has to be saved. For the PXA27x processor, the
PCB (process control block) can be large because of the additional registers for Intel® Wireless
MMX™ Technology. The internal memory can be used to hold the PCB in order to reduce context
switch latency.
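As a rough illustration of the idea above, the following C sketch keeps one PCB slot per task in a buffer that stands in for internal memory. The `pcb_t` layout, the register counts, `MAX_TASKS`, and the function names are all assumptions made for the example; they are not taken from this guide, and a real OS would save registers in assembly and place the buffer in on-chip SRAM through its linker script or memory map.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical PCB layout. The register counts are assumptions for
 * illustration only. */
typedef struct {
    uint32_t core[17];   /* r0-r15 plus CPSR */
    uint64_t wr[16];     /* Wireless MMX data registers */
    uint32_t wc[8];      /* Wireless MMX control registers */
} pcb_t;

#define MAX_TASKS 8u     /* assumed task limit */

/* Stand-in for a region of internal memory. On a target, this array
 * would be located in on-chip SRAM rather than in external SDRAM. */
static pcb_t internal_pcb_buffer[MAX_TASKS];

/* Copy the outgoing task's state into its slot in the fast buffer. */
static void pcb_save(unsigned task, const pcb_t *live)
{
    assert(task < MAX_TASKS);
    memcpy(&internal_pcb_buffer[task], live, sizeof *live);
}

/* Copy the incoming task's state back out of the fast buffer. */
static void pcb_restore(unsigned task, pcb_t *live)
{
    assert(task < MAX_TASKS);
    memcpy(live, &internal_pcb_buffer[task], sizeof *live);
}
```

Because every save and restore touches the whole structure, keeping the slots in low-latency memory shortens each switch regardless of what the data cache currently holds.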
3.4.4 Scratch RAM
For many applications (such as graphics), the working set is often larger than the data cache, and
the random access pattern of the application makes effective preloading difficult. Part of the
internal RAM can therefore be used to store these critical data structures. The OS can offer
management of such critical data spaces through malloc() or virtual_alloc().
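One simple way an OS or application might manage such a scratch region is a bump allocator, sketched below in C. The names `scratch_alloc` and `scratch_reset`, the 4 KB pool size, and the 8-byte alignment are assumptions for illustration; the static array only stands in for a slice of internal RAM and is not how the guide or any particular OS implements this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_SIZE 4096u   /* assumed size of the region set aside */

/* Stand-in for a slice of internal RAM, aligned for 64-bit data. */
static _Alignas(8) uint8_t scratch_pool[SCRATCH_SIZE];
static size_t scratch_used;

/* Hypothetical bump allocator: returns 8-byte-aligned storage from the
 * pool, or NULL when the request does not fit. */
static void *scratch_alloc(size_t bytes)
{
    size_t rounded = (bytes + 7u) & ~(size_t)7u;
    /* The first test catches wrap-around in the rounding above. */
    if (rounded < bytes || rounded > SCRATCH_SIZE - scratch_used)
        return NULL;
    void *p = &scratch_pool[scratch_used];
    scratch_used += rounded;
    return p;
}

/* Release everything at once, for example between frames. */
static void scratch_reset(void)
{
    scratch_used = 0;
}
```

A bump allocator has no per-block free, which keeps it small and predictable; a workload that needs individual frees would want a different scheme.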
3.4.5 OS Acceleration
Much OS- and system-related code runs periodically (for example, device drivers and OS daemon
processes). Code for these routines can be stored in the internal memory, which reduces the
instruction cache miss penalties for the periodic routines.
3.4.6 Increasing Preloads for Memory Performance
Apart from increasing cache efficiency, hiding the memory latency is extremely important. A
proper preload scheme can be used to hide the memory latency for data accesses.
The Intel XScale® Microarchitecture has a preload instruction (PLD). The purpose of this
instruction is to preload data into the data and mini-data caches. Data preloading hides memory
transfer latency while the processor continues to execute instructions. The preload matters to both
compiler-generated and hand-written assembly code because judicious use of the preload
instruction can greatly improve the throughput of Intel XScale® Microarchitecture-based
processors. Data preload can be applied not only to loops but also to any data references within a
block of code. Preload also applies to data writes when the memory type is configured as write
allocate.
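The loop case described above can be sketched in C with the GCC/Clang builtin `__builtin_prefetch`, which those compilers can lower to a preload instruction on targets that have one. The function name, the 32-byte line size, and the one-line-ahead distance are assumptions for illustration; a suitable preload distance depends on memory latency and loop cost and should be found by measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_WORDS 8u   /* assumes a 32-byte cache line of 32-bit words */

/* Sum an array while requesting the next cache line ahead of use. */
static uint32_t sum_with_preload(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Every LINE_WORDS elements, request the data one line ahead,
         * so the load overlaps with the additions on the current line. */
        if ((i % LINE_WORDS) == 0 && i + LINE_WORDS < n)
            __builtin_prefetch(&data[i + LINE_WORDS]);
        sum += data[i];
    }
    return sum;
}
```

Issuing one preload per line rather than per element avoids spending instruction slots on requests for data that is already on its way.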
Note: The Intel XScale® Microarchitecture PLD instruction encoding translates to a never-execute
instruction in the ARM* V4 architecture. This allows compatibility between code using PLD on an
Intel XScale® Microarchitecture processor and older devices. Code that has to run on both
architectures can include the PLD instruction, gaining performance on the Intel XScale®
Microarchitecture while maintaining compatibility with ARM* V4 (for example, StrongARM). The
efficient preloading of data and possible use cases are discussed in detail in
“Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization”,
Section 5, “High Level Language Optimization”, and Section 6, “Power Optimization”.
3.5 Optimization of System Components
In the PXA27x processor, the LCD, DMA controller, Intel® Quick Capture Interface and Intel
XScale® core share the same resources such as system bus, memory controller, etc. Thus, there
may be potential resource conflicts and the sharing of resources may impact the performance of the
end application. For example, a larger LCD display consumes more memory and system bus
Intel PXA27x Processor Family Optimization Guide, Order Number 280004 001, April 2004