Rule ( ) is a potential problem, since writes could pass earlier reads to the same device (and thus the read could see the state after the write completes). However, rule ( ) ensures that this situation never arises for code running on the SB-1. Other than blocking this write-following-read problem, rule ( ) does allow multiple outstanding uncacheable accesses, which will reach the peripherals in order. A series of loads performed to a FIFO will give the expected results, and by having multiple loads in flight the FIFO can be driven at full speed.
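As a minimal sketch of this behavior (not taken from this manual), the loop below drains a device FIFO through an uncacheable mapping. The register address, type, and function name are assumptions for illustration only; the point is that the loads can be issued back to back and still reach the device in order.

#include <stdint.h>

/* Assumed uncacheable (KSEG1-mapped) address of a device FIFO data
 * register; the value is made up for illustration. */
#define FIFO_DATA_REG   ((volatile uint32_t *)0xB00A0000)

static void drain_fifo(uint32_t *buf, int count)
{
    for (int i = 0; i < count; i++) {
        /* Each volatile load is an uncacheable read of the FIFO.
         * Several of these may be in flight at once, but they are
         * delivered to the peripheral in program order, so the data
         * comes out in FIFO order at full speed. */
        buf[i] = *FIFO_DATA_REG;
    }
}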
Cacheable non-coherent requests have the same timing as cacheable coherent requests as far as the CPU L1 data cache is concerned. They will be written to the data cache in program order, and the order with respect to uncached operations will be as described above. However, since these requests are outside the coherence domain, there is no guarantee of when writes will become visible to the rest of the system (i.e., flushed to the L2 cache or memory and not hidden behind a different cached non-coherent copy of the block).
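As a hedged illustration (not from this manual), the sketch below writes a buffer through a cacheable non-coherent mapping and then forces it out of the caches before another agent may depend on it. The writeback routine is a hypothetical platform helper, standing in for whatever mechanism the software environment provides (for example, a loop of CACHE Hit Writeback operations).

#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform routine that writes any dirty cache lines
 * covering [addr, addr + len) back to memory.  The name and signature
 * are assumptions for illustration only. */
extern void platform_dcache_writeback(void *addr, size_t len);

static void publish_buffer(uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++)
        buf[i] = (uint32_t)i;       /* cacheable non-coherent stores */

    /* Without an explicit writeback there is no guarantee of when
     * these stores become visible outside the CPU, because the
     * hardware does not keep non-coherent lines consistent with
     * memory or with other cached copies. */
    platform_dcache_writeback(buf, words * sizeof(uint32_t));
}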
Locks can be implemented using the load linked (ll) and store conditional (sc) instructions. These have additional rules:
8  Load linked will not issue speculatively; it is held at the issue point until all preceding instructions have graduated.
9  Store conditional includes a partial SYNC operation. No instructions will be issued from the time the store conditional is issued until it graduates.
In most cases no extra SYNC instructions are needed when acquiring a lock. The load linked is used to check if the lock is free, and the store conditional can be used to claim it. If the load indicates that the lock is in use, or the store conditional fails, the code should spin waiting for the lock (or the process can be blocked, depending on the situation). If the load indicates the lock is free and the store conditional succeeds, then the lock has been acquired.
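A minimal sketch of such an acquire sequence follows (GCC-style inline assembly for a MIPS target; the lock layout, in which 0 means free and 1 means held, and the function name are assumptions, not taken from this manual). Because of rule 9, no explicit SYNC is needed here: the store conditional itself provides the required ordering.

static inline void lock_acquire(volatile unsigned int *lock)
{
    unsigned int tmp;

    __asm__ __volatile__(
        "   .set push          \n"
        "   .set noreorder     \n"
        "1: ll    %0, 0(%1)    \n"   /* load linked: read the lock word    */
        "   bnez  %0, 1b       \n"   /* nonzero means held, so spin        */
        "   li    %0, 1        \n"   /* (branch delay slot) value to store */
        "   sc    %0, 0(%1)    \n"   /* store conditional: try to claim    */
        "   beqz  %0, 1b       \n"   /* sc failed (lost the race), retry   */
        "   nop                \n"
        "   .set pop           \n"
        : "=&r"(tmp)
        : "r"(lock)
        : "memory");
}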
There is a potential problem when freeing a lock. A normal store is used to release the lock. However, a SYNC
may be required before the store to force completion of cacheable loads before the lock is released. This will
only be the case if loads are performed while the lock is held, but the use of the data is delayed until the lock
has been released. Consider the case where two loads are done and one hits in the L1 cache but the other
misses in the L1 cache and is not used until after the lock is released. The CPU will not stall and (if there is no
SYNC) can therefore execute the write to release the lock. There is a (very small) chance that the other CPU
can claim the lock (by getting the line exclusive from the first CPU and having the load linked/store conditional
succeed) and modify the data before the load from the first CPU gets the data. Thus the first CPU will not see
the data that was present when it held the lock, which is probably an error. This can be solved by using the SYNC, or by ensuring there is a use of the load before the lock is released (rule
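A matching release sketch (same assumptions as the acquire example above) places the SYNC before the ordinary store that frees the lock, so that any cacheable loads issued while the lock was held have their data before another CPU can observe the lock as free.

static inline void lock_release(volatile unsigned int *lock)
{
    /* SYNC forces completion of earlier cacheable loads; without it,
     * a load that missed in the L1 cache could still be outstanding
     * when the releasing store becomes visible, allowing another CPU
     * to claim the lock and modify the data first. */
    __asm__ __volatile__("sync" : : : "memory");

    *lock = 0;      /* ordinary store releases the lock */
}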