Intel® PXA27x Processor Family
Optimization Guide
High Level Language Optimization
5.1.3 Cache Blocking
Cache blocking techniques, such as strip-mining¹, are used to improve the temporal locality of
data. Given a large data set that is reused across multiple passes of a loop, data blocking
divides the data into smaller chunks that can be loaded into the cache during the first pass and
then remain available for processing on subsequent passes, thus minimizing cache misses and
reducing bus traffic.
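The multi-pass case described above can be illustrated with a minimal C sketch. It is not code from this guide: the functions `two_pass_unblocked` and `two_pass_blocked` and the chunk size `CHUNK` are hypothetical, and a real chunk size would need to be tuned to the target's data cache. Both functions compute the same result; the blocked one runs the second pass over each chunk while that chunk is presumably still in the cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk size (an assumption, not a value from this guide):
   1024 floats = 4 KB, small enough to stay resident in a typical L1
   data cache while both passes run over it. */
#define CHUNK 1024

/* Unblocked: each pass streams over the whole array. If the array is
   larger than the data cache, the start of x has likely been evicted by
   the time pass 2 begins, so elements are fetched from memory twice. */
void two_pass_unblocked(float *x, size_t n, float scale, float bias)
{
    assert(x != NULL);
    for (size_t i = 0; i < n; i++)      /* pass 1 */
        x[i] *= scale;
    for (size_t i = 0; i < n; i++)      /* pass 2 */
        x[i] += bias;
}

/* Blocked: both passes run over one CHUNK-sized block before moving to
   the next, so pass 2 reads data that pass 1 has just brought into the
   cache. The last block may be shorter than CHUNK. */
void two_pass_blocked(float *x, size_t n, float scale, float bias)
{
    assert(x != NULL);
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t end = (n - base < CHUNK) ? n : base + CHUNK;
        for (size_t i = base; i < end; i++)   /* pass 1 on this block */
            x[i] *= scale;
        for (size_t i = base; i < end; i++)   /* pass 2 on this block */
            x[i] += bias;
    }
}
```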
As an example of cache blocking, consider the following code:
    for (i = 0; i < 10000; i++)
        for (j = 0; j < 10000; j++)
            for (k = 0; k < 10000; k++)
                C[j][k] += A[i][k] * B[j][i];
For a fixed i, the row A[i][k] is reused on every iteration of the j loop. However, the j and k
loops sweep through all of C[j][k], which can displace A[i][k] from the cache before it is reused.
Using cache blocking, the code becomes:
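    for (i = 0; i < 10000; i++)
        for (j1 = 0; j1 < 100; j1++)              /* block of 100 rows of C    */
            for (k1 = 0; k1 < 100; k1++)          /* block of 100 columns of C */
                for (j2 = 0; j2 < 100; j2++)      /* row within the block      */
                    for (k2 = 0; k2 < 100; k2++)  /* column within the block   */
                    {
                        j = j1 * 100 + j2;
                        k = k1 * 100 + k2;
                        /* A[i][k1*100 .. k1*100+99] stays cached across all j2 */
                        C[j][k] += A[i][k] * B[j][i];
                    }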
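As a sanity check, the following sketch runs the original and the blocked loop nests on small matrices and confirms they produce identical results. It is an illustration under assumed values: `N`, `BLK`, `kernel_naive`, `kernel_blocked`, and `check_blocked_matches_naive` are hypothetical names, and the sizes are shrunk from 10000 (100 blocks of 100) to 8 (2 blocks of 4) so the arrays fit on the stack.

```c
#include <assert.h>

/* Illustrative sizes (assumed, not from the guide). */
#define N   8
#define BLK 4
_Static_assert(N % BLK == 0, "N must be a multiple of BLK");

/* Original loop nest from the text. */
void kernel_naive(double C[N][N], double A[N][N], double B[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[j][k] += A[i][k] * B[j][i];
}

/* Blocked nest: j and k are split into (block, offset) pairs, so for a
   fixed i a BLK-element slice of A[i][...] is reused across BLK rows
   of C while that slice is still cached. */
void kernel_blocked(double C[N][N], double A[N][N], double B[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j1 = 0; j1 < N / BLK; j1++)
            for (int k1 = 0; k1 < N / BLK; k1++)
                for (int j2 = 0; j2 < BLK; j2++)
                    for (int k2 = 0; k2 < BLK; k2++) {
                        int j = j1 * BLK + j2;
                        int k = k1 * BLK + k2;
                        C[j][k] += A[i][k] * B[j][i];
                    }
}

/* Runs both nests on the same inputs and asserts identical results.
   i is the outermost loop in both, so every C[j][k] accumulates its
   terms in the same order and even floating-point results match. */
void check_blocked_matches_naive(void)
{
    double A[N][N], B[N][N], C1[N][N] = {{0}}, C2[N][N] = {{0}};
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) {
            A[r][c] = r + 0.5 * c;
            B[r][c] = (r * 3 + c) % 5;
        }
    kernel_naive(C1, A, B);
    kernel_blocked(C2, A, B);
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            assert(C1[r][c] == C2[r][c]);
}
```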
5.1.4 Loop Interchange
As previously mentioned, the order in which data is accessed affects cache thrashing. It is usually
best to access data within a spatially contiguous address range. However, arrays may be laid out
so that consecutively indexed elements are not physically next to each other. Consider the
following C code, in which array A is stored in row-major order (as all C arrays are) but is
traversed column by column:
    for (j = 0; j < NMAX; j++)
        for (i = 0; i < NMAX; i++)
        {
            prefetch(A[i+1][j]);   /* preload the element used on the next iteration */
            sum += A[i][j];
        }
In the above example, A[i][j] and A[i+1][j] are not adjacent in memory: with row-major layout they
are NMAX elements apart, so consecutive iterations typically touch different cache lines. This
increases bus traffic when preloading loop data. Where the loop mathematics are unaffected, the
problem can be resolved by interchanging the induction variables. The above example becomes:
    for (i = 0; i < NMAX; i++)
        for (j = 0; j < NMAX; j++)
¹ Strip-mining: spatially dispersing the data comprising one data set (for example, an array or
structure) throughout a memory range instead of keeping the data in contiguous memory locations.