BCM1250/BCM1125/BCM1125H
User Manual
10/21/02
Broadcom Corporation
Section 7: DMA
Document 1250_1125-UM100CB-R
DMA CONFIGURATIONS
There are many ways the DMA structure can be configured, allowing it to be used with many existing systems. The CPU should always use cacheable coherent mode to access descriptors and buffers, letting the system take care of coherence and avoiding the need for software to manage the L1 caches. The DMA engines access descriptors and buffers through the I/O bridge, which always uses cacheable coherent accesses for memory addresses and uncacheable (accelerated) accesses for other addresses.
Consideration should be given to the descriptors and their access pattern. In memory space all reads from the controller are of a full cache block, so if the system were configured to use chain mode descriptors and 32-byte buffers, the number of memory accesses fetching descriptors would equal the number of memory accesses for data. System performance would then be lower than expected because half of the bandwidth is consumed by descriptor management. In a new system it is recommended that the descriptors be organized in ring mode and the controller be permitted to prefetch descriptors (tdx_en set in the dma_config0 register). This reduces the descriptor fetch overhead relative to the number of buffer transfers. In most cases receive descriptors should also be marked for allocation in the L2 cache (dscr_l2ca set in the dma_config1 register), since they are a shared resource and are expected to be accessed over a short time period by both the DMA controller (twice for the header descriptor, which is updated with the length and status information) and the CPU.
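The bandwidth argument above can be made concrete with a little arithmetic. The sketch below assumes, as stated, that every descriptor fetch reads one full 32-byte cache block and that a buffer of b bytes costs one block read per 32 bytes; the function and its name are illustrative, not part of the chip specification.

```c
/* Percentage of the controller's memory reads spent fetching descriptors,
 * assuming one full 32-byte block read per descriptor fetch and one block
 * read per 32 bytes of buffer data (chain mode, no descriptor prefetch). */
static unsigned dscr_overhead_pct(unsigned buf_bytes)
{
    unsigned data_blocks = (buf_bytes + 31) / 32; /* blocks per buffer */
    return 100 / (1 + data_blocks);               /* 1 descriptor block */
}
```

With 32-byte buffers this gives 50%, matching the "half of the bandwidth" figure in the text; with 1536-byte (full Ethernet frame) buffers the overhead drops to about 2%, which is why larger buffers and ring-mode prefetch are recommended.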
When the DMA controller updates a descriptor it must use a read-modify-write operation. If software does not need the SOP bit to be cleared in transmit descriptors, the read-modify-write can be avoided by setting the no_dscr_updt bit in the dma_config1 register. If this is done, transmit descriptors do not need to be marked for L2 cache allocation by the DMA engine; the CPU will have them cacheable in both L1 and L2 caches, so the DMA engine will read them directly from the cache and never write them back.
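On the receive side, where the controller does write descriptors back, software typically detects completion by polling the status word the controller updates in the header descriptor. The descriptor layout and field names below are a simplified stand-in for the real descriptor format (which is defined elsewhere in this section), used only to illustrate the polling pattern.

```c
#include <stdint.h>

/* Simplified two-word descriptor: the controller performs a
 * read-modify-write of the header (SOP) descriptor on receive,
 * writing length and status into len_status. */
struct dscr {
    uint64_t buf_addr;   /* buffer physical address + control flags */
    uint64_t len_status; /* written back by the controller; 0 = pending */
};

/* Software-side view of a ring of descriptors. */
struct ring {
    struct dscr *base;
    unsigned size;       /* number of entries, power of two */
    unsigned next_done;  /* next descriptor the CPU expects to complete */
};

/* Poll for a completed receive: a nonzero len_status marks completion.
 * Returns 1 and the length/status word if a packet finished, else 0. */
static int ring_poll(struct ring *r, uint64_t *len_status)
{
    struct dscr *d = &r->base[r->next_done & (r->size - 1)];
    if (d->len_status == 0)
        return 0;               /* controller has not updated it yet */
    *len_status = d->len_status;
    d->len_status = 0;          /* recycle the entry for reuse */
    r->next_done++;
    return 1;
}
```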
The buffer size can also be increased to reduce the descriptor overhead. While the hardware supports buffers as small as 32 bytes (and these may be ideal as header buffers), the system becomes more efficient as the buffer size is increased. In the transmit channel the tbx_en bit should be set in the dma_config0 register to allow the controller to prefetch buffer data whenever possible. If this bit is set the controller will mostly fetch pairs of cache blocks back to back and is more likely to make use of an open page in the SDRAM. On the receive side cache block writes are posted as soon as the data is available, and this bit has no effect.
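Taken together, the recommendations so far amount to a small set of bits in dma_config0 and dma_config1. The sketch below composes those values for a transmit channel; the bit positions and macro names are assumptions for illustration only, and the real field offsets must be taken from the dma_config0/dma_config1 register descriptions.

```c
#include <stdint.h>

/* HYPOTHETICAL bit positions -- substitute the offsets from the
 * dma_config0/dma_config1 register descriptions. */
#define M_DMA_TDX_EN       (1ULL << 4) /* assumed: descriptor prefetch */
#define M_DMA_TBX_EN       (1ULL << 5) /* assumed: tx buffer-data prefetch */
#define M_DMA_NO_DSCR_UPDT (1ULL << 3) /* assumed: skip descriptor write-back */

/* Transmit channel: prefetch descriptors and buffer data, and skip the
 * read-modify-write descriptor update so the engine only ever reads the
 * descriptors (which can then stay cacheable in the CPU's L1/L2). */
static uint64_t tx_dma_config0(void)
{
    return M_DMA_TDX_EN | M_DMA_TBX_EN;
}

static uint64_t tx_dma_config1(void)
{
    return M_DMA_NO_DSCR_UPDT;
}
```

A receive channel would instead leave no_dscr_updt clear (the controller must write back length and status) and set the dscr_l2ca hint in dma_config1.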
The most sensitive interface is the transmit side of the Ethernet (or Packet FIFO in GMII mode). Once transmission of a packet has started there must be no interruptions. This is achieved by using a small data FIFO in the interface and priority in the memory controller. The data FIFO is intended just for speed match buffering. There is a threshold which sets how much of a packet must be in the FIFO for transmission to begin; since the FIFO drains at a constant rate, this directly translates into the memory latency that will be covered by the buffer. In general the FIFO will fill quickly (particularly when tbx_en allows fetching of pairs of cache blocks) and drain during any high latency memory reads. The worst case is when a new descriptor must be fetched during a packet transmission, since the next block of data cannot be fetched until the descriptor read completes. To allow this to work with a relatively small FIFO, the memory controller implements a priority scheme. This is needed because the CPUs and data mover can easily swamp the memory controller (they can access data at much higher bandwidths and frequencies than the I/O DMA engines, so this is an ideal place for priority to protect the low request rate interface from being dominated by a high request rate one). Any reads from I/O bridge 1 are prioritized over other memory accesses; if they conflict with a write to memory, the priority of the write is also raised. In addition, some of the memory controller buffers can be reserved for use by the I/O bridge 1 DMA engines. The priority scheme is described in . If memory is accessed across the HyperTransport there is no priority scheme and the transmit FIFO is likely to underflow during packet transmission (however, because the writes are posted, it should be possible to receive into buffers in a memory across the HyperTransport fabric).
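The relationship between the FIFO threshold and the memory latency it can hide follows directly from the constant drain rate. As a rough illustration only (the actual threshold values and FIFO depth are chip parameters, not chosen here), assume the FIFO drains at the full gigabit Ethernet line rate of 1 Gbit/s, i.e. one byte every 8 ns:

```c
/* Memory read latency (in ns) that a given start-of-transmission
 * threshold can cover, assuming the FIFO drains at 1 Gbit/s
 * (one byte every 8 ns).  Illustrative arithmetic only. */
static unsigned latency_covered_ns(unsigned threshold_bytes)
{
    return threshold_bytes * 8;
}
```

For example, a 256-byte threshold covers roughly 2 us of memory latency; a worst-case stall that chains a descriptor read in front of the next data read must complete within that window or the FIFO underflows.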