Intel® PXA27x Processor Family
Optimization Guide
3-5
System Level Optimization
3.3.1.1 Round Robin Replacement Cache Policy
Both the data and the instruction caches use a round robin replacement policy to evict a cache line.
The simple consequence of this is that every line will eventually be evicted, assuming a non-trivial
program. The less obvious consequence is that it is difficult to predict when evictions will take
place and which cache lines they will affect. This information must be gained through
experimentation using performance profiling.
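The first consequence can be illustrated with a small C model of a single cache set. This is a sketch under stated assumptions, not a description of the PXA27x hardware: it assumes 32 ways per set and one round-robin victim pointer per set that advances on every line fill, regardless of how often a line is used. The names `rr_set`, `rr_access`, and `hot_line_misses` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS 32u  /* assumed ways per set */

/* One cache set with a round-robin victim pointer (model only). */
typedef struct {
    uint32_t tag[WAYS];
    bool valid[WAYS];
    unsigned next_victim;  /* way to fill on the next miss */
} rr_set;

static void rr_init(rr_set *s) {
    for (unsigned i = 0; i < WAYS; i++)
        s->valid[i] = false;
    s->next_victim = 0;
}

/* Returns true on a hit. On a miss, fills the way under the round-robin
 * pointer and advances it, regardless of how often that way was used. */
static bool rr_access(rr_set *s, uint32_t tag) {
    for (unsigned i = 0; i < WAYS; i++)
        if (s->valid[i] && s->tag[i] == tag)
            return true;
    unsigned v = s->next_victim;
    s->tag[v] = tag;
    s->valid[v] = true;
    s->next_victim = (v + 1) % WAYS;
    return false;
}

/* Touch a "hot" line, then one new "cold" line, `iterations` times, and
 * count how often the hot line misses. */
static unsigned hot_line_misses(unsigned iterations) {
    const uint32_t hot = 0xFFFFFFFFu;
    assert(iterations < hot);  /* keep cold tags distinct from the hot tag */
    rr_set s;
    rr_init(&s);
    unsigned misses = 0;
    for (unsigned i = 0; i < iterations; i++) {
        if (!rr_access(&s, hot))
            misses++;
        rr_access(&s, i);  /* every cold tag is new, so this always misses */
    }
    return misses;
}
```

In this model, the hot line is touched between every cold access yet still misses once every 32 iterations (`hot_line_misses(320)` returns 10), because the pointer eventually wraps around to its way. An LRU policy would keep the same line resident after its first miss.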
3.3.1.2 Code Placement to Reduce Cache Misses
Code placement can greatly affect cache misses. One way to view the cache is as 32 sets of
32 bytes, which together span an address range of 1024 bytes. At run time, code maps onto these
32 blocks modulo 1024: address bits [9:5] select the set, so addresses that differ by a multiple of
1024 bytes map to the same set. A set that is overused, with more frequently executed lines
mapping to it than it has ways, will thrash. Ideally, the software tools would distribute the code so
that execution time is spread evenly across this space.
A compiler cannot do this automatically. Most of the information needed to decide how best to
distribute the code comes from profiling, followed by compiler-based two-pass optimization.
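The mapping described above can be checked offline with a short C sketch that sums profile weight per cache set. It assumes the 32-set, 32-byte-line view given in the text; `code_region`, its `weight` field (for example, execution samples from a profiler), `cache_set`, and `set_pressure` are hypothetical names, and the function start addresses and sizes would come from the link map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u  /* bytes per cache line */
#define NUM_SETS   32u  /* 32 sets x 32 bytes = 1024-byte window */

/* Set that an address maps to: (addr / 32) mod 32, i.e. bits [9:5]. */
static unsigned cache_set(uint32_t addr) {
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* Hypothetical description of one function taken from a link map. */
typedef struct {
    uint32_t start;   /* start address in bytes */
    uint32_t size;    /* length in bytes, must be > 0 */
    uint32_t weight;  /* hypothetical profile weight */
} code_region;

/* Accumulate each region's weight into every set its cache lines map to.
 * A set whose total is far above the others is a thrashing candidate. */
static void set_pressure(const code_region *regions, size_t n,
                         unsigned long pressure[NUM_SETS]) {
    for (unsigned s = 0; s < NUM_SETS; s++)
        pressure[s] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(regions[i].size > 0);
        uint32_t first = regions[i].start / LINE_BYTES;
        uint32_t last = (regions[i].start + regions[i].size - 1) / LINE_BYTES;
        for (uint32_t line = first; line <= last; line++)
            pressure[cache_set(line * LINE_BYTES)] += regions[i].weight;
    }
}
```

For example, three 64-byte functions placed at 0x0000, 0x0400, and 0x0800 all land in sets 0 and 1, while the same functions placed back to back from 0x0000 spread across sets 0 to 5.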
3.3.1.3 Locking Code into the Instruction Cache
One important instruction cache feature is the ability to lock code into the instruction cache. Once
locked, the code is always available for fast execution. Another reason for locking critical code
into the cache is that, under the round robin replacement policy, unlocked code is eventually
evicted, even if it is a frequently executed function. Key code components to consider locking are:
• Interrupt handlers
• OS Timer clock handlers
• OS critical code
• Time critical application code
The disadvantage of locking code into the cache is that it reduces the cache capacity available to
the rest of the program. How much code to lock is application-dependent and must be tuned
experimentally.
Code to be locked into the instruction cache should be aligned on a 1024-byte boundary and placed
sequentially, as tightly as possible, so as not to waste memory space. Placing the code sequentially
also ensures that the locked ways are distributed evenly across all cache sets. Although functions at
arbitrary addresses can be chosen for cache locking, this approach risks locking several ways in one
set and few or none in another. This uneven distribution can lead to excessive thrashing of the
instruction cache.
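The effect of placement on locking can be modeled with the following C sketch. It assumes a 32-set, 32-way cache with 32-byte lines, and that locking a line consumes one way in the set its address maps to; `lock_region`, `locked_ways`, and `worst_free_ways` are hypothetical names for illustration, not part of any Intel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u  /* bytes per cache line */
#define NUM_SETS   32u  /* sets in the instruction cache */
#define NUM_WAYS   32u  /* assumed ways per set */

/* Hypothetical code region selected for locking. */
typedef struct {
    uint32_t start;  /* start address in bytes */
    uint32_t size;   /* length in bytes, must be > 0 */
} lock_region;

/* Model: locking a cache line consumes one way in the set that the line's
 * address maps to. Count locked ways per set for all regions. */
static void locked_ways(const lock_region *r, size_t n, unsigned ways[NUM_SETS]) {
    for (unsigned s = 0; s < NUM_SETS; s++)
        ways[s] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(r[i].size > 0);
        uint32_t first = r[i].start / LINE_BYTES;
        uint32_t last = (r[i].start + r[i].size - 1) / LINE_BYTES;
        for (uint32_t line = first; line <= last; line++)
            ways[line % NUM_SETS]++;
    }
}

/* Ways left for the rest of the program in the most heavily locked set. */
static unsigned worst_free_ways(const unsigned ways[NUM_SETS]) {
    unsigned max = 0;
    for (unsigned s = 0; s < NUM_SETS; s++)
        if (ways[s] > max)
            max = ways[s];
    return max >= NUM_WAYS ? 0 : NUM_WAYS - max;
}
```

In this model, locking 1 KB of code as a single 1024-byte-aligned block takes one way from every set, leaving 31 free ways everywhere. Splitting the same 1 KB into four 256-byte functions placed at multiples of 1024 bytes takes four ways from each of sets 0 to 7 and none from the rest, leaving those eight sets with only 28 ways.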
3.3.2 Increasing Data Cache Performance
Several techniques can be used to increase data cache performance, including optimizing the cache
configuration and applying appropriate programming techniques. This section offers a set of
system-level optimization opportunities; however, program-level optimization techniques are
equally important.