Intel® IXP45X and Intel® IXP46X Product Line of Network Processors: Intel XScale® Processor
Intel® IXP45X and Intel® IXP46X Product Line of Network Processors Developer’s Manual, August 2006, Order Number: 306262-004US
• Four fill buffers
• Four pending buffers
• Eight half-cache-line write buffers
DDRI SDRAM resources are typically:
• Four memory banks
• One page buffer per bank referencing a 4K address range
• Four transfer request buffers
Consider how these resources work together. A fill buffer is allocated for each cache
read miss. A fill buffer, along with a pending buffer, is also allocated for each cache
write miss if the memory space is write allocate. A subsequent read to the same cache
line does not require a new fill buffer, but does require a pending buffer, and a
subsequent write also requires a new pending buffer. A fill buffer is also allocated for
each read from non-cached memory, and a write buffer is needed for each write to
non-cached memory that is non-coalescing. Consequently, an STM instruction listing
eight registers and referencing non-cached memory uses eight write buffers if the
writes do not coalesce and two write buffers if they do, because eight 4-byte registers
fill two 16-byte half cache lines. A cache eviction requires a write buffer for each dirty
bit set in the cache line. The prefetch instruction requires a fill buffer for each cache
line and 0, 1, or 2 write buffers for an eviction.
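The STM arithmetic above can be checked with a small sketch. The helper name stm_write_buffers and its constants are illustrative and do not come from the manual. The model is deliberately simplified: it assumes 4-byte registers stored to ascending addresses, one write buffer per 16-byte half cache line when stores coalesce, and one write buffer per register when they do not.

```c
#include <assert.h>
#include <stdint.h>

#define HALF_LINE_BYTES 16u  /* one write buffer covers half of a 32-byte cache line */
#define REG_BYTES        4u  /* each register stored by STM is one 32-bit word */

/* Hypothetical helper: estimate the write buffers used by one STM
   to non-cached memory under the simplified model described above. */
unsigned stm_write_buffers(uint32_t base, unsigned nregs, int coalescing)
{
    assert(nregs >= 1 && nregs <= 16);

    if (!coalescing)
        return nregs;                      /* every word store takes its own buffer */

    uint32_t first = base / HALF_LINE_BYTES;
    uint32_t last  = (base + nregs * REG_BYTES - 1u) / HALF_LINE_BYTES;
    return (unsigned)(last - first + 1u);  /* number of distinct half lines touched */
}
```

With a 16-byte-aligned base address, eight registers give eight buffers without coalescing and two with coalescing, which matches the figures in the text. In this simplified model, a base that is not 16-byte aligned would touch a third half line.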
When adding prefetch instructions, take care to ensure that the combination of
prefetch and instruction bus requests does not exceed the system resource capacity
described above, or performance will be degraded instead of improved. The important
points are to spread prefetch operations over calculations so that bus traffic can flow
freely, and to minimize the number of necessary prefetches.
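As a rough illustration of spreading prefetches over calculations, the sketch below issues one prefetch per 32-byte cache line instead of one per element, and requests the next line while the current one is still being summed. It relies on the GCC/Clang builtin __builtin_prefetch as a stand-in for the prefetch instruction. The function name sum_with_prefetch, the one-line-ahead distance, and the assumption that the array starts on a cache line boundary with 4-byte ints are choices made for this example, not recommendations from the manual.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 32  /* data cache line size given in this section */
#define INTS_PER_LINE    (CACHE_LINE_BYTES / sizeof(int))

/* Sum an array, issuing one prefetch per cache line rather than one per element. */
long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* At the start of each line, request the following line, then keep
           computing so that the fill overlaps with useful work. */
        if (i % INTS_PER_LINE == 0 && i + INTS_PER_LINE < n)
            __builtin_prefetch(&data[i + INTS_PER_LINE]);
        total += data[i];
    }
    return total;
}
```

The right prefetch distance depends on memory latency and on how much work each iteration does, so the one-line lookahead used here is only a starting point for measurement.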
3.10.4.4.5 Cache Memory Considerations
Stride, the way data structures are walked through, can affect the temporal quality of
the data and reduce or increase cache conflicts. The data cache and mini-data cache
of the IXP45X/IXP46X network processors each have 32 sets of 32-byte cache lines.
This means that the cache lines in a given set lie on a modulo-1-KB address boundary:
addresses that are a multiple of 1 KB apart map to the same set. Take care to choose
data structure sizes and stride requirements that do not overwhelm a given set,
causing conflicts and increased register pressure. Register pressure can increase
because additional registers are required to track prefetch addresses. These effects
can be reduced by rearranging data structure components to use more parallel access
to search and compare elements. Similarly, rearranging sections of data structures so
that sections that are often written fit in the same half cache line (16 bytes for the
IXP45X/IXP46X network processors) can reduce cache eviction write-backs. On a
global scale, techniques such as array merging can enhance the spatial locality of the
data.
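The set mapping described above can be sketched in a few lines. The function names cache_set_index and elements_in_first_set are hypothetical. The sketch assumes only what this section states: 32-byte lines and 32 sets, so the set is selected by address bits 5 through 9.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 32u   /* bytes per cache line */
#define NUM_SETS   32u   /* sets in the data cache and in the mini-data cache */

/* Set selected by an address: drop the 5 offset bits, keep the next 5 bits. */
unsigned cache_set_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* Hypothetical helper: of `count` elements spaced `stride` bytes apart,
   count how many land in the same set as the first element. */
unsigned elements_in_first_set(uint32_t base, uint32_t stride, unsigned count)
{
    assert(stride > 0);

    unsigned hits = 0;
    for (unsigned i = 0; i < count; i++)
        if (cache_set_index(base + i * stride) == cache_set_index(base))
            hits++;
    return hits;
}
```

Under these assumptions, a 1024-byte stride places every element in a single set, while padding the stride to 1056 bytes (1 KB plus one line) spreads consecutive elements across all 32 sets.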
As an example of array merging, consider the following code:
int a[NMAX];
int b[NMAX];
int i, ix;
for (i = 0; i < NMAX; i++)
{
    ix = b[i];
    if (a[i] != 0)
        ix = a[i];
    /* do other calculations */
}
In the above code, data is read from both arrays a and b, but a and b are not spatially
close. Array merging can place a and b spatially close.
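One way the merged layout might look is sketched below. The struct name merged_elem and the function pick_and_sum are hypothetical, and the running sum merely stands in for the unspecified "other calculations". The point is that a and b for a given index become adjacent in memory, so one cache line fill typically brings in both.

```c
#include <assert.h>
#include <stddef.h>

/* a and b for the same index are now adjacent in memory. */
struct merged_elem {
    int a;
    int b;
};

/* Hypothetical stand-in for the loop: pick a if it is nonzero, else b, and sum. */
long pick_and_sum(const struct merged_elem *c, size_t n)
{
    assert(c != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        int ix = c[i].b;
        if (c[i].a != 0)
            ix = c[i].a;
        total += ix;   /* placeholder for the other calculations */
    }
    return total;
}
```

Each loop iteration now reads two neighboring words instead of two words that are NMAX elements apart, which is the spatial locality gain that array merging aims for.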