3-8
Intel® PXA27x Processor Family
Optimization Guide
System Level Optimization
3.3.2.5 Using Mini-Data Cache
The mini-data cache (X=1, C=1, B=0) is best used for data structures that have short temporal
lives and/or cover large amounts of data space. Accessing such data through the main data
cache would evict much, if not all, of its valuable contents, and that eviction reduces
performance. Placing this data in a mini-data cache memory region instead helps prevent
pollution of the main data cache while still providing the benefits of cached accesses.
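As an illustration of how a region might be marked X=1, C=1, B=0, the sketch below builds a first-level 1 MB section descriptor. The bit positions are an assumption based on the ARMv5TE section descriptor layout with the Intel XScale X bit at bit 12; they are not given in this section and should be checked against the core developer's manual. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed first-level section descriptor layout (verify before use). */
#define SECT_TYPE       0x2u                               /* bits [1:0] = 0b10: section */
#define SECT_B          (1u << 2)                          /* B (bufferable) bit         */
#define SECT_C          (1u << 3)                          /* C (cacheable) bit          */
#define SECT_BIT4       (1u << 4)                          /* written as one             */
#define SECT_DOMAIN(d)  (((uint32_t)(d) & 0xFu) << 5)      /* domain, bits [8:5]         */
#define SECT_AP(ap)     (((uint32_t)(ap) & 0x3u) << 10)    /* access permissions [11:10] */
#define SECT_X          (1u << 12)                         /* X bit (assumed TEX[0])     */

/* Build a descriptor for a 1 MB section with X=1, C=1, B=0. */
static uint32_t minidata_section_desc(uint32_t phys_base,
                                      unsigned domain, unsigned ap)
{
    return (phys_base & 0xFFF00000u)   /* 1 MB aligned physical base */
         | SECT_X                      /* X = 1 */
         | SECT_AP(ap)
         | SECT_DOMAIN(domain)
         | SECT_BIT4
         | SECT_C                      /* C = 1, and B is left clear (B = 0) */
         | SECT_TYPE;
}
```

The domain and access-permission values are left as parameters because they depend on the operating system's memory map.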
Examples of data that could be assigned to the mini-data cache:
• The stack space of a frequently occurring interrupt: the stack is used only during the short
duration of the interrupt.
• Streaming media data: in many cases, the media stream’s data is used only for a limited time
and would otherwise repeatedly evict data from the main data cache.
Overuse of the mini-data cache leads to thrashing. This is easy to do because the mini-data
cache has only two ways per set. For example, consider a loop built around a simple statement
such as:
for (i = 0; i < IMAX; i++)
{
    A[i] = B[i] + C[i];
}
If A, B, and C reside in a mini-data cache memory region and each array is aligned on a 1 KB
boundary, the three arrays compete for the same two ways of each set, and the loop quickly
thrashes the cache.
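The conflict can be checked with a little arithmetic. The sketch below assumes a 2 KB, two-way mini-data cache with 32-byte lines (so 32 sets and a 1 KB way size); this geometry is not stated in this section and should be confirmed for the target device. The function names and addresses are hypothetical.

```c
#include <stdint.h>

/* Assumed geometry: 2 KB total, 2 ways, 32-byte lines, hence 32 sets. */
enum { MINI_LINE_BYTES = 32, MINI_WAYS = 2, MINI_SETS = 32 };

/* Set selected by an address under the assumed geometry. */
static unsigned mini_set_index(uintptr_t addr)
{
    return (unsigned)((addr / MINI_LINE_BYTES) % MINI_SETS);
}

/* Count how many of the n arrays map element offset `offset`
   to the same set as the first array does. */
static unsigned streams_sharing_set(const uintptr_t *bases, unsigned n,
                                    uintptr_t offset)
{
    unsigned set0 = mini_set_index(bases[0] + offset);
    unsigned count = 0;
    for (unsigned k = 0; k < n; k++) {
        if (mini_set_index(bases[k] + offset) == set0)
            count++;
    }
    return count;
}

/* Nonzero when more streams compete for one set than there are ways. */
static int set_oversubscribed(const uintptr_t *bases, unsigned n,
                              uintptr_t offset)
{
    return streams_sharing_set(bases, n, offset) > MINI_WAYS;
}
```

With bases such as 0x10000, 0x10400, and 0x10800 (each 1 KB aligned), all three arrays select the same set at every offset, so each iteration needs three lines in a set that holds two. Bases staggered by one cache line each would land in different sets; that is one possible mitigation, offered here only as an illustration.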
The mini-data cache can also be used to keep frequently used tables cached. The advantage of
keeping these tables in the mini-data cache is two-fold. First, data thrashing in the main data
cache does not evict the frequently used tables and coefficients. Second, it avoids spending main
data cache space on locking the critical blocks. For applications such as MPEG-4, MP3, and
GSM-AMR that handle large data streams, locking main data cache lines for these tables is not an
efficient use of the cache. During execution of such applications, the following are examples of
tables that can make effective use of the mini-data cache:
• Huffman tables
• Sine-Cosine look-up tables
• Color-conversion look-up tables
• Motion compensation vector tables
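One way such a table might be steered into a mini-data cache region is to place it in a dedicated linker section. The sketch below uses the GCC `section` and `aligned` variable attributes; the section name `.minidata`, the `MINIDATA` macro, and the clamp table itself are hypothetical. The attribute alone has no effect on caching: the linker script and page tables must map that section to memory marked X=1, C=1, B=0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section name; only applied on GCC-compatible ELF targets. */
#if defined(__GNUC__) && defined(__ELF__)
#define MINIDATA __attribute__((section(".minidata"), aligned(32)))
#else
#define MINIDATA
#endif

/* Saturation table of the kind used in color conversion:
   maps v in [-128, 383] to the range [0, 255]. */
enum { CLAMP_BIAS = 128, CLAMP_LEN = 512 };

static uint8_t clamp_table[CLAMP_LEN] MINIDATA;

/* Fill the table once at start-up. */
static void clamp_table_init(void)
{
    for (int i = 0; i < CLAMP_LEN; i++) {
        int v = i - CLAMP_BIAS;
        clamp_table[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}

/* Look up the saturated value; v must lie in [-128, 383]. */
static uint8_t clamp_lookup(int v)
{
    return clamp_table[v + CLAMP_BIAS];
}
```

A 512-byte table like this one leaves room for other data in a small mini-data cache; larger tables deserve the same sizing check before being placed there.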
3.3.2.6 Reducing Cache Conflicts, Pollution and Pressure
Cache pollution occurs when data that is never used is loaded into the cache. Cache pressure
occurs when data that is not temporal to the current process is loaded into the cache. Excessive
pre-loading and data locking should be avoided. For an example, see Section 5.1.1.1.2, “Preload
Loop Scheduling”. Increasing data locality through programming techniques also helps reduce
pollution and pressure.
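A common data-locality technique is to traverse arrays in the order they are laid out in memory. The sketch below is a generic illustration rather than code from this guide; the array dimensions and function names are hypothetical, and the 32-byte cache line size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 64, COLS = 64 };

/* Row-major traversal: consecutive accesses touch consecutive addresses,
   so every element of a fetched cache line is used before moving on. */
static uint32_t sum_row_major(const uint16_t *m)
{
    uint32_t sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += m[r * COLS + c];
    return sum;
}

/* Column-major traversal of the same row-major data: each access jumps
   COLS * sizeof(uint16_t) = 128 bytes, so (with 32-byte lines assumed)
   every access touches a different cache line. */
static uint32_t sum_col_major(const uint16_t *m)
{
    uint32_t sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += m[r * COLS + c];
    return sum;
}
```

Both functions return the same result; only the order of memory accesses differs, and with it the number of cache lines brought in per useful element.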
3.3.3 Optimizing TLB (Translation Lookaside Buffer) Usage
The Intel XScale® Microarchitecture offers 32 entries each in the instruction and data TLBs. The
TLB unit also provides a hardware page-table walk, which eliminates the need for a software
page-table walk and for software management of the TLBs.
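The amount of memory that 32 entries can map at once depends on the page size. The sketch below works through that arithmetic, assuming 4 KB small pages and 1 MB sections as the mapping sizes; the function names are hypothetical.

```c
#include <stdint.h>

enum { TLB_ENTRIES = 32 };   /* per TLB, as stated above */

/* Memory covered when every TLB entry maps one page of the given size. */
static uint32_t tlb_reach_bytes(uint32_t page_bytes)
{
    return (uint32_t)TLB_ENTRIES * page_bytes;
}

/* Number of pages (and so TLB entries) needed to map a working set. */
static uint32_t pages_needed(uint32_t working_set_bytes, uint32_t page_bytes)
{
    return (working_set_bytes + page_bytes - 1u) / page_bytes;
}

/* Nonzero when the working set can be mapped without replacing entries. */
static int fits_in_tlb(uint32_t working_set_bytes, uint32_t page_bytes)
{
    return pages_needed(working_set_bytes, page_bytes) <= TLB_ENTRIES;
}
```

Under these assumptions, 32 entries of 4 KB pages cover 128 KB, while 32 entries of 1 MB sections cover 32 MB. A 300 KB working set needs 75 small pages, more than the TLB holds, but fits in a single 1 MB section.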
Intel PXA27x Processor Family Optimization Guide, Order Number 280004 001, April 2004