Intel® IXP45X and Intel® IXP46X Product Line of Network Processors: Functional Overview
Intel® IXP45X and Intel® IXP46X Product Line of Network Processors Developer’s Manual
August 2006, Reference Number: 306262-004US
2.2.9 Write Buffer
The write buffer (WB) holds data destined for memory until the bus controller can act
on it. The WB is eight entries deep, and each entry holds 16 bytes. The WB is always
enabled and accepts data from the Intel XScale processor, the D-cache, or the
mini-data cache.
Coprocessor 15, Register 1 specifies whether WB coalescing is enabled or disabled.
When coalescing is disabled, stores to memory occur in program order regardless of
the attribute bits within the descriptors located in the DTLB.
When coalescing is enabled in CP15, Register 1, the attribute bits within the
descriptors located in the DTLB are examined to determine whether coalescing is also
enabled for the destination region of memory. When coalescing is enabled in both
CP15, R1 and the DTLB, data entering the WB can coalesce with any of the eight
16-byte entries and be stored to the destination memory region, possibly out of
program order.
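
The coalescing behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of the hardware: the structure layout, the `wb_store_byte` and `wb_used` names, the byte-granular stores, and the single `coalesce` flag (standing in for "enabled in both CP15, R1 and the DTLB") are all hypothetical. Only the eight-entry depth and the 16-byte entry size come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WB_ENTRIES    8   /* from the text: eight entries */
#define WB_LINE_BYTES 16  /* from the text: 16 bytes per entry */

/* Hypothetical model of one write-buffer entry. */
typedef struct {
    bool     valid;
    uint32_t base;                 /* 16-byte-aligned address of the entry */
    uint16_t byte_mask;            /* one bit per byte holding pending data */
    uint8_t  data[WB_LINE_BYTES];
} wb_entry;

typedef struct {
    wb_entry entry[WB_ENTRIES];
    bool     coalesce;  /* assumption: true when enabled in CP15 R1 and the DTLB */
} write_buffer;

/* Queue a one-byte store. Returns the entry index used, or -1 when the
 * buffer is full (the model simply reports it; it does not model a stall). */
static int wb_store_byte(write_buffer *wb, uint32_t addr, uint8_t value)
{
    uint32_t base = addr & ~(uint32_t)(WB_LINE_BYTES - 1);
    unsigned off  = addr & (WB_LINE_BYTES - 1);
    int slot = -1;

    for (int i = 0; i < WB_ENTRIES; i++) {
        wb_entry *e = &wb->entry[i];
        if (wb->coalesce && e->valid && e->base == base) {
            slot = i;              /* merge with an entry for the same 16 bytes */
            break;
        }
        if (!e->valid && slot < 0)
            slot = i;              /* remember the first free entry */
    }
    if (slot < 0)
        return -1;

    wb_entry *e = &wb->entry[slot];
    if (!e->valid) {
        memset(e, 0, sizeof *e);
        e->valid = true;
        e->base  = base;
    }
    e->data[off]  = value;
    e->byte_mask |= (uint16_t)(1u << off);
    return slot;
}

/* Number of entries currently holding pending data. */
static int wb_used(const write_buffer *wb)
{
    int n = 0;
    for (int i = 0; i < WB_ENTRIES; i++)
        n += wb->entry[i].valid ? 1 : 0;
    return n;
}
```

In this model, two stores that fall within the same 16-byte region share one entry when `coalesce` is set and occupy two entries otherwise, which is why coalesced data may reach memory in a different order than the program issued it.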
Stores to a memory region that the attribute bits within the DTLB descriptors mark
as non-cacheable and non-bufferable cause the Intel XScale processor to stall until
the store completes. Draining of the write buffer can be requested through a
coprocessor register.
2.2.10 Multiply-Accumulate Coprocessor
For efficient processing of high-quality media and signal-processing algorithms, the
Multiply-Accumulate Coprocessor (CP0) provides 40-bit accumulation of 16 x 16, dual
16 x 16 (SIMD), and 32 x 32 signed multiplies. Special MAR and MRA instructions are
implemented to move two Intel XScale processor general registers to the 40-bit
accumulator (MAR) and to move the 40-bit accumulator to two Intel XScale processor
general registers (MRA). The 40-bit accumulator can be stored to or loaded from the
D-cache, mini-data cache, or memory using two STC or LDC instructions.
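
Because the accumulator is wider than a single 32-bit register, transferring it takes two registers. The sketch below shows one plausible packing, assuming the low register carries bits 31:0, the high register carries bits 39:32 in its low byte, and the high register is sign-extended when the accumulator is read back. The function names and this bit layout are assumptions made for illustration; the text above states only that two registers are used.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFULL  /* low 40 bits */

/* Build a 40-bit accumulator value from two 32-bit registers.
 * Assumption: only the low byte of `hi` is used (accumulator bits 39:32). */
static uint64_t acc_from_regs(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)(hi & 0xFFu) << 32) | lo;
}

/* Split a 40-bit accumulator value into two 32-bit registers.
 * Assumption: bit 39 is sign-extended through the upper 24 bits of `hi`. */
static void acc_to_regs(uint64_t acc, uint32_t *lo, uint32_t *hi)
{
    assert((acc & ~ACC_MASK) == 0);          /* caller passes a 40-bit value */
    uint32_t top = (uint32_t)(acc >> 32) & 0xFFu;
    *lo = (uint32_t)acc;
    *hi = (top & 0x80u) ? (top | 0xFFFFFF00u) : top;
}
```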
The 16 x 16 signed multiply-accumulates (MIAxy) multiply the high/high, low/low,
high/low, or low/high 16 bits of a 32-bit core general register (multiplier) and
another 32-bit core general register (multiplicand) to produce a full 32-bit product
that is sign-extended to 40 bits and added to the 40-bit accumulator.
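
The arithmetic of a 16 x 16 multiply-accumulate can be modeled in portable C. This is a minimal sketch, not the instruction definition: the names `half16`, `wrap40`, and `mia_xy`, the mapping of the two selectors onto the two operands, and the choice to wrap (rather than saturate) on 40-bit overflow are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFULL   /* low 40 bits */
#define ACC_SIGN (1ULL << 39)      /* sign bit of a 40-bit value */

/* Reduce a value to 40 bits and sign-extend it back to 64 bits.
 * Assumption: the accumulator wraps on overflow. */
static int64_t wrap40(int64_t v)
{
    uint64_t u = (uint64_t)v & ACC_MASK;
    return (int64_t)(u ^ ACC_SIGN) - (int64_t)ACC_SIGN;
}

/* Extract the signed high (top != 0) or low (top == 0) half of a register. */
static int32_t half16(uint32_t reg, int top)
{
    int32_t h = (int32_t)((top ? reg >> 16 : reg) & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Model of a 16 x 16 multiply-accumulate: x_top selects the half of the
 * multiplier, y_top selects the half of the multiplicand. */
static int64_t mia_xy(int64_t acc, uint32_t multiplier, uint32_t multiplicand,
                      int x_top, int y_top)
{
    /* A 16 x 16 signed product always fits in 32 bits. */
    int32_t product = half16(multiplier, x_top) * half16(multiplicand, y_top);
    return wrap40(acc + (int64_t)product);
}
```

For example, with a multiplier of `0x0003FFFE` (high half 3, low half -2) and a multiplicand of `0x00050004` (high half 5, low half 4), the four selector combinations add 15, -8, 12, or -10 to the accumulator.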
Dual signed 16 x 16 (SIMD) multiply-accumulates (MIAPH) multiply the high/high and
low/low 16 bits of a packed 32-bit core general register (multiplier) and another
packed 32-bit core general register (multiplicand) to produce two 16 x 16 products
that are both sign-extended to 40 bits and added to the 40-bit accumulator.
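
The dual multiply-accumulate can be modeled the same way: both halves are multiplied pairwise and both products are added to the accumulator. As before, this is a sketch under assumptions; `mia_ph`, `half16`, and `wrap40` are illustrative names, and wrapping on 40-bit overflow is assumed rather than taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFULL   /* low 40 bits */
#define ACC_SIGN (1ULL << 39)      /* sign bit of a 40-bit value */

/* Reduce a value to 40 bits and sign-extend it back to 64 bits. */
static int64_t wrap40(int64_t v)
{
    uint64_t u = (uint64_t)v & ACC_MASK;
    return (int64_t)(u ^ ACC_SIGN) - (int64_t)ACC_SIGN;
}

/* Extract the signed high (top != 0) or low (top == 0) half of a register. */
static int32_t half16(uint32_t reg, int top)
{
    int32_t h = (int32_t)((top ? reg >> 16 : reg) & 0xFFFFu);
    return (h & 0x8000) ? h - 0x10000 : h;
}

/* Model of the packed (SIMD) multiply-accumulate: high x high and
 * low x low products are both added to the accumulator. */
static int64_t mia_ph(int64_t acc, uint32_t multiplier, uint32_t multiplicand)
{
    int32_t hi = half16(multiplier, 1) * half16(multiplicand, 1);
    int32_t lo = half16(multiplier, 0) * half16(multiplicand, 0);
    return wrap40(acc + (int64_t)hi + (int64_t)lo);
}
```

This is the shape of a two-element dot-product step: packing two 16-bit samples and two 16-bit coefficients per register lets one operation accumulate both products.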
The 32 x 32 signed multiply-accumulates (MIA) multiply a 32-bit core general register
(multiplier) and another 32-bit core general register (multiplicand) to produce a
64-bit product, of which the 40 least-significant bits are added to the 40-bit
accumulator. The 16 x 32 versions of the 32 x 32 multiply-accumulate instructions
complete in a single cycle.
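
The 32 x 32 case differs in that the 64-bit product is truncated: only its low 40 bits contribute to the accumulator. The following sketch models that truncation. The names `mia` and `wrap40` are illustrative, and wrapping of the 40-bit sum is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFULL   /* low 40 bits */
#define ACC_SIGN (1ULL << 39)      /* sign bit of a 40-bit value */

/* Reduce a value to 40 bits and sign-extend it back to 64 bits. */
static int64_t wrap40(int64_t v)
{
    uint64_t u = (uint64_t)v & ACC_MASK;
    return (int64_t)(u ^ ACC_SIGN) - (int64_t)ACC_SIGN;
}

/* Model of the 32 x 32 multiply-accumulate. `acc` must already be a
 * sign-extended 40-bit value, so acc + product cannot overflow 64 bits. */
static int64_t mia(int64_t acc, int32_t multiplier, int32_t multiplicand)
{
    assert(acc == wrap40(acc));
    int64_t product = (int64_t)multiplier * (int64_t)multiplicand;
    /* Keeping only the low 40 bits of the sum is equivalent to adding
     * only the 40 LSBs of the product. */
    return wrap40(acc + product);
}
```

A product such as 0x100000 x 0x100000 equals 2^40, whose low 40 bits are all zero, so in this model it leaves the accumulator unchanged. Software that relies on the full product needs to keep operands small enough that the result fits in 40 bits.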
2.2.11 Performance Monitoring Unit
The performance monitoring unit (PMU) contains four 32-bit event counters and one
32-bit clock counter. The event counters can be programmed to monitor I-cache hit
rate, data cache hit rate, ITLB hit rate, DTLB hit rate, pipeline stalls, BTB
prediction hit rate, and instruction execution count.
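
A hit rate is typically derived in software from a pair of event counts sampled before and after the code of interest. The sketch below assumes one counter is programmed to count accesses and another to count misses, and that the counters are free-running 32-bit values that wrap. The helper names and this pairing of events are assumptions for illustration, not register definitions from this manual.

```c
#include <assert.h>
#include <stdint.h>

/* Events counted between two samples of a free-running 32-bit counter.
 * Unsigned subtraction gives the right answer across a single wrap. */
static uint32_t counter_delta(uint32_t before, uint32_t after)
{
    return after - before;
}

/* Hit rate from an access count and a miss count over the same interval.
 * Returns 0.0 when no accesses were observed. */
static double hit_rate(uint32_t accesses, uint32_t misses)
{
    assert(misses <= accesses);
    if (accesses == 0)
        return 0.0;
    return 1.0 - (double)misses / (double)accesses;
}
```

With 1000 accesses and 250 misses over an interval, `hit_rate` returns 0.75. Sampling intervals should be short enough that neither counter wraps more than once between samples.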