User Manual
BCM1250/BCM1125/BCM1125H
10/21/02
Broadcom Corporation
Document 1250_1125-UM100CB-R
Section 6: DRAM, Page 121
The simple configuration with two channels, each with one physical bank of memory, provides a good illustration
of the trade-offs involved. The most straightforward way to assign the channels is to have one cover addresses
0 to N-1 and the other cover addresses N to 2N-1. However, most of the time there is a reasonable degree of
locality (for example, the program and its data are likely to reside entirely in the first N locations of memory),
so all activity goes to a single channel: the effective memory bandwidth is halved, and access latency increases
because of contention for that channel. In some systems this may be desirable, since the lightly loaded second
channel will have a lower and more deterministic access latency. But this gain comes at a very large cost to the
performance of the system as a whole. At the other extreme, the channels could be interleaved every cache line.
This seems a good choice: accesses are likely to be distributed equally across even and odd cache lines, so both
channels will be used equally. But because a contiguous access (e.g. a packet streaming in or out of the system)
flips back and forth between the two channels, it is less likely that good use can be made of open pages in the
memory. A good compromise is to interleave every four cache lines.
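The three channel assignments above can be compared with a small model. This is an illustrative sketch only: it assumes a 32-byte cache line, ignores DRAM timing entirely, and uses the number of times a sequential stream hops between channels as a rough stand-in for the loss of open-page locality described above. The function names and the 64 MB value of N are hypothetical.

```python
LINE = 32  # assumed cache-line size in bytes


def channel_split(addr, n_bytes):
    """Channel 0 covers 0..N-1 and channel 1 covers N..2N-1."""
    return 0 if addr < n_bytes else 1


def channel_interleaved(addr, lines_per_chunk):
    """Alternate channels every `lines_per_chunk` cache lines."""
    return (addr // (LINE * lines_per_chunk)) % 2


def profile(addresses, pick):
    """Return ([accesses on ch0, accesses on ch1], channel switches)."""
    counts = [0, 0]
    switches = 0
    prev = None
    for addr in addresses:
        ch = pick(addr)
        counts[ch] += 1
        if prev is not None and ch != prev:
            switches += 1
        prev = ch
    return counts, switches


# A 2 KB sequential stream (64 cache lines), e.g. a packet being copied.
stream = range(0, 64 * LINE, LINE)
print(profile(stream, lambda a: channel_split(a, 1 << 26)))  # split at hypothetical N = 64 MB
print(profile(stream, lambda a: channel_interleaved(a, 1)))  # every cache line
print(profile(stream, lambda a: channel_interleaved(a, 4)))  # every four cache lines
```

In this model the split mapping leaves channel 1 idle, per-line interleave balances the load but changes channel on 63 of the 64 accesses, and four-line interleave keeps the balance while changing channel only 15 times.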
The same argument applies to packet buffers. Dedicating one channel to packet buffers and the other to the
program and its associated data will quite often leave the packet-buffer channel (which carries CPU accesses as
well as inbound and outbound DMA) very heavily used and the other channel (which is used only on accesses that
miss in both the L1 and L2 caches) under-used. In this case, network traffic is limited to half the memory
bandwidth and incurs the latency of a busy channel. Interleaving the channels improves system performance by
removing the hot spot and sharing the bandwidth more evenly.
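The bandwidth argument can be made concrete with a simple fluid model. This is a sketch under stated assumptions, not a performance model of the part: both channels are assumed to have equal capacity, loads are expressed as percentages of one channel's bandwidth, and interleaving is assumed to spread all traffic perfectly evenly. The function names and numbers are hypothetical.

```python
def channel_loads(packet_load, program_load, dedicated):
    """Per-channel load for the two channel assignments."""
    if dedicated:
        # One channel carries packet buffers, the other program data.
        return (packet_load, program_load)
    # Interleaved: every region is spread evenly over both channels.
    half = (packet_load + program_load) / 2
    return (half, half)


def packet_headroom(capacity, program_load, dedicated):
    """Largest packet-buffer load before a channel saturates."""
    if dedicated:
        # Packet traffic can never exceed its single channel.
        return capacity
    # Both channels share the total; program misses consume part of it.
    return max(0, 2 * capacity - program_load)


# Loads in percent of one channel's bandwidth (illustrative values).
print(channel_loads(90, 20, dedicated=True))      # one hot channel
print(channel_loads(90, 20, dedicated=False))     # evenly shared
print(packet_headroom(100, 20, dedicated=True))
print(packet_headroom(100, 20, dedicated=False))
```

With these illustrative numbers the dedicated layout caps packet traffic at one channel's bandwidth (100), while the interleaved layout allows up to 180 before either channel saturates.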
A good general starting point applies the principle from the previous examples: keep a few cache lines
contiguous to allow page-mode accesses, then use low address bits to interleave across the two channels, the
internal banks within a device, and the physical banks (chip selects) on a channel. Using the low bits makes it
likely that, even over short periods of time, accesses are distributed reasonably evenly across these regions.
The address is therefore broken into the following fields, starting from the least significant bit. This format
can be used to set the mask bits.
(1) The bottom three bits are ignored and should be set to zero. The next two bits (cc) are also ignored, but
they are always used as column bits, so they must be counted when checking the total number of column bits in
the device.
(2) The next two bits are used for column interleave. For 32-byte blocks (no column interleave), do not use any
column bits here; for 64-byte blocks, use one column bit here; for 128-byte blocks, use two. These bits are set
in the mc_csN_col registers.
(3) As discussed above, the next bit is a good one to use for interleaving between the two channels. This bit
number is assigned in the mc_config register.
(4) The next bits select the bank within a memory device. Most devices have four banks, so two bits are set in
the mc_csN_ba registers.
(5) When interleaving across the physical banks on a channel via chip selects, one or two bits must be set in the
mc_cs_interleave register. If there are only two physical banks, mixed_cs mode is used and one bit is set; if
there are four physical banks, interleaved_cs mode is used and both interleave bits are needed.
(6) The remaining column bits are set in the mc_csN_col registers. They must be contiguous, and the number of
bits set is the number of column address bits the device needs minus the number already used in steps (1)
and (2).
(7) The remaining address bits form the row address in the mc_csN_row registers. The number of bits should
match the number of row address bits the memory device needs.
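Steps (1) to (7) can be sketched as a function that hands out address bits from bit 0 upward and records which bits belong to each field. This is a hypothetical helper: the masks it returns are plain address-bit masks, not the bit encodings of the mc_csN_col, mc_csN_ba, mc_cs_interleave, mc_config or mc_csN_row registers, whose formats are given in the register descriptions. It assumes four internal banks by default, and the example device (10 column bits, 13 row bits, two chip selects in mixed_cs mode) is illustrative.

```python
def field_masks(col_bits, row_bits, block_bytes=128, bank_bits=2, cs_bits=1):
    """Return address-bit masks for each field, assigned from bit 0 up.

    Hypothetical helper: these are address masks, not register encodings.
    """
    if block_bytes not in (32, 64, 128):
        raise ValueError("block size must be 32, 64 or 128 bytes")
    if cs_bits not in (0, 1, 2):
        raise ValueError("chip-select interleave uses 0, 1 or 2 bits")
    interleave_cols = {32: 0, 64: 1, 128: 2}[block_bytes]
    upper_cols = col_bits - 2 - interleave_cols
    if upper_cols < 0:
        raise ValueError("device has too few column bits for this block size")

    masks = {}
    pos = 0

    def take(name, width):
        # Claim `width` contiguous address bits starting at `pos`.
        nonlocal pos
        masks[name] = ((1 << width) - 1) << pos
        pos += width

    take("ignored", 3)                       # step 1: must be zero
    take("col_fixed", 2)                     # step 1: cc, always column bits
    take("col_interleave", interleave_cols)  # step 2: CC
    take("channel", 1)                       # step 3: P, two channels
    take("bank", bank_bits)                  # step 4: BB
    take("cs_interleave", cs_bits)           # step 5: NN
    take("col_upper", upper_cols)            # step 6: remaining column bits
    take("row", row_bits)                    # step 7: row bits
    return masks


# Hypothetical device: 10 column bits, 13 row bits, 4 banks, 2 chip selects.
for name, mask in field_masks(col_bits=10, row_bits=13).items():
    print(f"{name:15s} {mask:#010x}")
```

For the example device the channel bit lands at bit 7, so channels alternate every 128 bytes (four 32-byte cache lines), and the fields together cover address bits 0 to 29 with no gaps or overlaps.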
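Address layout, from most significant (left) to least significant (right), with the step that assigns each field:
RRRR...R (7) | CCCC...C (6) | NN (5) | BB (4) | P (3) | CC (2) | cc000 (1)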
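Read in the other direction, the layout tells which channel, bank, chip select, row and column a given physical address reaches. The decoder below is a sketch assuming 128-byte blocks (two CC bits), four internal banks and a hypothetical 10-column, 13-row device; assembling the column from cc, CC and the upper column bits in that order is also an assumption made for illustration.

```python
def decode(addr, col_bits=10, row_bits=13, cs_bits=1):
    """Split a physical address into the layout's fields.

    Assumes 128-byte blocks (two CC bits) and four internal banks.
    """
    if col_bits < 4:
        raise ValueError("need at least the cc and CC column bits")

    def pull(width):
        # Remove and return the lowest `width` bits of the address.
        nonlocal addr
        value = addr & ((1 << width) - 1)
        addr >>= width
        return value

    pull(3)                        # field 1: ignored low bits
    col_low = pull(2)              # field 1: cc
    col_mid = pull(2)              # field 2: CC
    channel = pull(1)              # field 3: P
    bank = pull(2)                 # field 4: BB
    cs = pull(cs_bits)             # field 5: NN
    col_high = pull(col_bits - 4)  # field 6: CCCC...C
    row = pull(row_bits)           # field 7: RRRR...R
    # Assumed column ordering: cc lowest, then CC, then the upper bits.
    column = col_low | (col_mid << 2) | (col_high << 4)
    return {"channel": channel, "bank": bank, "cs": cs,
            "row": row, "column": column}


for addr in range(0, 0xA0, 0x20):
    print(hex(addr), decode(addr))
```

Under these assumptions, four consecutive 32-byte cache lines (0x00 to 0x60) land in the same row on channel 0, and the fifth (0x80) moves to channel 1, which is the four-line interleave suggested earlier.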