5-8
Intel® PXA27x Processor Family
Optimization Guide
High Level Language Optimization
5.1.3 Cache Blocking
Cache blocking techniques, such as strip-mining (see footnote 1), are used to improve the temporal
locality of the data. Given a large data set that is reused across multiple passes of a loop, data
blocking divides the data into smaller chunks. Each chunk is loaded into the cache during the first
pass and is then available for processing on subsequent passes, which minimizes cache misses and
reduces bus traffic.
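The idea of splitting multiple passes into chunks can be illustrated with a minimal sketch. The function names, the two passes (doubling, then summing), and the STRIP length below are assumptions chosen for illustration; they do not come from this guide, and a suitable strip length depends on the cache size of the target.

```c
#include <stddef.h>

#define STRIP 1024  /* assumed strip length in elements; tune for the target cache */

/* Unblocked: each pass walks the whole array, so for a large array the
   data touched early in pass 1 may be evicted before pass 2 reaches it. */
long two_pass_plain(int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        data[i] *= 2;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Strip-mined: both passes run over one strip before moving to the next,
   so the second pass works on data the first pass just brought in. */
long two_pass_strips(int *data, size_t n)
{
    long sum = 0;
    for (size_t start = 0; start < n; start += STRIP) {
        size_t end = (n - start < STRIP) ? n : start + STRIP;
        for (size_t i = start; i < end; i++)
            data[i] *= 2;
        for (size_t i = start; i < end; i++)
            sum += data[i];
    }
    return sum;
}
```

Both functions compute the same result; only the order in which memory is visited differs. Whether the strip-mined form is faster on a given system should be confirmed by measurement.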
As an example of cache blocking, refer to this code:
for(i=0; i<10000; i++)
    for(j=0; j<10000; j++)
        for(k=0; k<10000; k++)
            C[j][k] += A[i][k] * B[j][i];
The variable A[i][k] is completely reused. However, accessing C[j][k] in the j and k loops can
displace A[i][k] from the cache. Using cache blocking, the code becomes:
for(i=0; i<10000; i++)
    for(j1=0; j1<100; j1++)
        for(k1=0; k1<100; k1++)
            for(j2=0; j2<100; j2++)
                for(k2=0; k2<100; k2++)
                {
                    j = j1 * 100 + j2;
                    k = k1 * 100 + k2;
                    C[j][k] += A[i][k] * B[j][i];
                }
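A scaled-down version of the two loop nests above can be used to check that the blocked form computes the same result as the original. This is only a sketch: N, TILE, the function names, and the test data are assumptions for illustration, and N is assumed to be a multiple of TILE.

```c
#define N    64   /* assumed matrix dimension, kept small for illustration */
#define TILE 16   /* assumed tile edge; N must be a multiple of TILE */

/* Original loop nest. */
void mac_plain(int A[N][N], int B[N][N], int C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[j][k] += A[i][k] * B[j][i];
}

/* Blocked loop nest: j and k are each split into a tile index (j1, k1)
   and an offset inside the tile (j2, k2). */
void mac_blocked(int A[N][N], int B[N][N], int C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j1 = 0; j1 < N / TILE; j1++)
            for (int k1 = 0; k1 < N / TILE; k1++)
                for (int j2 = 0; j2 < TILE; j2++)
                    for (int k2 = 0; k2 < TILE; k2++) {
                        int j = j1 * TILE + j2;
                        int k = k1 * TILE + k2;
                        C[j][k] += A[i][k] * B[j][i];
                    }
}

/* Returns 1 if both versions produce identical output, 0 otherwise. */
int blocked_matches_plain(void)
{
    static int A[N][N], B[N][N], C1[N][N], C2[N][N];

    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) {
            A[r][c] = r + c;      /* arbitrary test data */
            B[r][c] = r - c;
            C1[r][c] = 0;
            C2[r][c] = 0;
        }

    mac_plain(A, B, C1);
    mac_blocked(A, B, C2);

    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            if (C1[r][c] != C2[r][c])
                return 0;
    return 1;
}
```

Because every (i, j, k) combination is still visited exactly once, the two versions agree; blocking changes only the order of the accesses to C and A.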
5.1.4 Loop Interchange
As previously mentioned, the sequence in which data is accessed affects cache thrashing. Usually,
it is best to access data in a spatially contiguous address range. However, arrays of data may have
been laid out such that indexed elements are not physically next to each other. Consider the
following C code, in which array elements are placed in row-major order.
for(j=0; j<NMAX; j++)
    for(i=0; i<NMAX; i++)
    {
        prefetch(A[i+1][j]);
        sum += A[i][j];
    }
In the above example, A[i][j] and A[i+1][j] are not adjacent in memory; they are one full row
apart. This situation causes an increase in bus traffic when preloading loop data. In some cases
where the loop mathematics are unaffected, the problem can be resolved by induction variable
interchange. The above example becomes:
for(i=0; i<NMAX; i++)
    for(j=0; j<NMAX; j++)
1. Spatially dispersing the data comprising one data set (for example, an array or structure)
   throughout a memory range instead of keeping the data in contiguous memory locations.
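The loop interchange described in section 5.1.4 can be sketched as a self-contained comparison. The array size, the function names, and the fill pattern are assumptions for illustration; the prefetch call from the example is omitted so that the sketch stays portable.

```c
#include <stddef.h>

#define NMAX 128  /* assumed array dimension for illustration */

static int A[NMAX][NMAX];

/* Fill the array with arbitrary test data. */
void fill_array(void)
{
    for (int i = 0; i < NMAX; i++)
        for (int j = 0; j < NMAX; j++)
            A[i][j] = i * NMAX + j;
}

/* Number of elements separating A[i][j] from A[i+1][j] in row-major order. */
size_t row_stride_elems(void)
{
    return sizeof A[0] / sizeof A[0][0];
}

/* i is the inner loop: successive accesses are row_stride_elems() apart. */
long sum_column_first(void)
{
    long sum = 0;
    for (int j = 0; j < NMAX; j++)
        for (int i = 0; i < NMAX; i++)
            sum += A[i][j];
    return sum;
}

/* After interchange, j is the inner loop: successive accesses are adjacent. */
long sum_row_first(void)
{
    long sum = 0;
    for (int i = 0; i < NMAX; i++)
        for (int j = 0; j < NMAX; j++)
            sum += A[i][j];
    return sum;
}
```

Both traversals return the same sum because addition does not depend on visiting order here, which is the sense in which the loop mathematics are unaffected. Only the memory access pattern changes.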
Order Number 280004 001, Intel PXA27x Processor Family Optimization Guide, April 2004