Intel® PXA27x Processor Family

Optimization Guide

System Level Optimization

3.4.3 Buffer for Context Switch

During a context switch, the state of the process has to be saved. For the PXA27x processor, the PCB (process control block) can be large because of the additional registers for Intel® Wireless MMX™ Technology. To reduce context-switch latency, the internal memory can be used to buffer the PCB.
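To make the size argument concrete, the following is a minimal sketch of a PCB that carries the sixteen 64-bit Wireless MMX data registers next to the core registers. The layout, the names `pcb_t`, `pcb_save`, and `pcb_restore`, and the `internal_pcb_slot` buffer are hypothetical and not taken from any particular OS. The static buffer only stands in for a region that a real system would place in internal memory, and control registers and banked registers are left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical PCB layout, for illustration only. */
enum { CORE_REGS = 16, WMMX_DATA_REGS = 16 };

typedef struct {
    uint64_t wmmx[WMMX_DATA_REGS]; /* wR0-wR15, 64 bits each */
    uint32_t core[CORE_REGS];      /* r0-r15 */
    uint32_t cpsr;                 /* current program status register */
    uint32_t pad;                  /* explicit padding, keeps size a multiple of 8 */
} pcb_t;

/* Stand-in for a slot in internal memory. On real hardware the OS or the
 * linker script would place this buffer at an internal SRAM address. */
static pcb_t internal_pcb_slot;

/* Bytes needed for the core state alone (r0-r15 plus CPSR). */
size_t pcb_core_bytes(void) { return (CORE_REGS + 1) * sizeof(uint32_t); }

/* Extra bytes added by the Wireless MMX data registers. */
size_t pcb_wmmx_bytes(void) { return WMMX_DATA_REGS * sizeof(uint64_t); }

/* Copy the live state into the fast buffer on switch-out. */
void pcb_save(const pcb_t *live)
{
    memcpy(&internal_pcb_slot, live, sizeof *live);
}

/* Copy the buffered state back on switch-in. */
void pcb_restore(pcb_t *live)
{
    memcpy(live, &internal_pcb_slot, sizeof *live);
}
```

Under these assumptions, the Wireless MMX data registers account for roughly two thirds of the structure, which is why the memory that holds it affects switch latency.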

3.4.4 Scratch RAM

For many applications (such as graphics), the working set may often be larger than the data cache, and because of the random-access nature of the application, effective preloading may be difficult. Part of the internal RAM can therefore be used to store these critical data structures. The OS can offer management of such critical data spaces through malloc() or virtual_alloc().
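One simple way an OS or runtime might manage such a scratch region is a bump allocator. The sketch below is illustrative only: `scratch_alloc`, `scratch_reset`, and `SCRATCH_SIZE` are hypothetical names, and the static pool stands in for a range of internal RAM that a real system would map at a fixed address.

```c
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_SIZE 4096u /* hypothetical size of the scratch region */

/* Stand-in for the internal RAM region; the union forces 8-byte alignment. */
static union {
    uint8_t  bytes[SCRATCH_SIZE];
    uint64_t align;
} scratch_pool;

static size_t scratch_used; /* bytes handed out so far */

/* Return an 8-byte-aligned block from the scratch region, or NULL when
 * the request does not fit. Blocks are never freed individually. */
void *scratch_alloc(size_t size)
{
    size_t rounded = (size + 7u) & ~(size_t)7u; /* round up to 8 bytes */

    /* The first test catches wrap-around in the rounding above. */
    if (rounded < size || rounded > SCRATCH_SIZE - scratch_used)
        return NULL;

    void *block = &scratch_pool.bytes[scratch_used];
    scratch_used += rounded;
    return block;
}

/* Release every block at once, for example between frames. */
void scratch_reset(void)
{
    scratch_used = 0;
}
```

A bump allocator suits data whose lifetime ends together, such as per-frame graphics structures. A general-purpose interface would also need per-block free and fragmentation handling.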

3.4.5 OS Acceleration

Much OS- and system-related code runs periodically (for example, device drivers and OS daemon processes). Code for these routines can be stored in the internal memory, which reduces the instruction cache miss penalties for the periodic routines.

3.4.6 Increasing Preloads for Memory Performance

Apart from increasing cache efficiency, hiding memory latency is extremely important. A proper preload scheme can be used to hide the memory latency of data accesses.

The Intel XScale® Microarchitecture has a preload instruction (PLD). The purpose of this instruction is to preload data into the data and mini-data caches. Data preloading hides memory transfer latency while the processor continues to execute instructions. Preload matters for both compiler-generated and assembly code because judicious use of the preload instruction can greatly improve the throughput of Intel XScale® Microarchitecture-based processors. Data preload can be applied not only to loops but also to any data references within a block of code. Preload also applies to data writes when the memory type is enabled as write-allocate.
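As a rough illustration of preloading inside a loop, the sketch below uses the GCC/Clang builtin `__builtin_prefetch`, which those compilers can lower to a preload instruction on targets that have one and otherwise treat as a no-op. The function name `sum_with_preload`, the assumed 32-byte cache line, and the one-line preload distance are illustrative choices, not recommendations from this guide. A suitable distance depends on memory latency and on the work done per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines, i.e. eight 32-bit words per line. */
#define WORDS_PER_LINE 8u

/* Sum an array while requesting the next cache line ahead of its use. */
uint32_t sum_with_preload(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Issue one preload per WORDS_PER_LINE elements, and only for
         * addresses that are still inside the array. */
        if ((i % WORDS_PER_LINE) == 0 && i + WORDS_PER_LINE < n)
            __builtin_prefetch(&data[i + WORDS_PER_LINE]);

        sum += data[i];
    }
    return sum;
}
```

Issuing the preload once per line rather than once per element avoids spending issue slots on redundant requests for data that is already on its way.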

Note: The Intel XScale® Microarchitecture PLD instruction encoding translates to a never-execute instruction in the ARM* V4 architecture. This allows compatibility between code using PLD on an Intel XScale® Microarchitecture processor and older devices. Code that has to run on both architectures can include the PLD instruction, gaining performance on the Intel XScale® Microarchitecture while maintaining compatibility with ARM* V4 (for example, StrongARM). A detailed discussion of efficient data preloading and possible use cases is given in Section 4, “Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization”, Section 5, “High Level Language Optimization”, and Section 6, “Power Optimization”.

3.5 Optimization of System Components

In the PXA27x processor, the LCD, DMA controller, Intel® Quick Capture Interface, and Intel XScale® core share resources such as the system bus and memory controller. Thus, there may be resource conflicts, and the sharing of resources may impact the performance of the end application. For example, a larger LCD display consumes more memory and system bus