Sun Microelectronics
UltraSPARC User’s Manual
long as they do not require the register that is being loaded. An instruction that
attempts to use the data that is being loaded by an instruction in the load buffer
is called a ‘use’ instruction.
The pipelines are not fully decoupled, because UltraSPARC still supports the
notion of precise traps, and loads that are younger than a trapping instruction
must not execute, except in the case of deferred traps. Loads themselves can take
precise traps when exceptions are detected in the pipeline. For example, address
misalignment or access violations detected in the translation process will both be
reported as precise traps. However, when a load has a hardware problem on the
external bus (for example, a parity error), it will generate a deferred trap, since
younger instructions, unblocked by the D-Cache miss, could have been retired
and modified the machine state. This may result in termination of the user thread
or reset. UltraSPARC does not support recovery from such hardware errors, and
they are fatal. See Chapter 11.1, “Error Handling.”
5.5 Store Buffer
All store operations (including atomic and STA instructions) and barriers or store
completion instructions (MEMBAR and STBAR) are entered into the Store Buffer.
5.5.1 Stores Delayed by Loads
The store buffer normally has lower priority than the load buffer when
arbitrating for the D-Cache or E-Cache, since returning load data is usually more
critical than store completion. To ensure that stores complete in a finite amount
of time, as required by SPARC-V9, UltraSPARC eventually will raise the store
buffer priority above load buffer priority if the store buffer is continually
locked out by subsequent loads (other than internal ASI loads). Software that
issues a store to signal another processor and then enters a load spin loop to
wait for a signal back from that processor will wait for the store to time out in
the store buffer, because the loads in the spin loop keep locking out the store.
For this type of code, it is more efficient to put a MEMBAR #StoreLoad between
the store and the load spin loop.
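The signal-then-spin pattern described above can be sketched in C as follows. This is only an illustration under stated assumptions: the names store_load_barrier and signal_and_wait are hypothetical, the inline MEMBAR is emitted only when the compiler defines a SPARC-V9 macro (the macro names checked here are assumptions about the toolchain), and other targets fall back to a C11 sequentially consistent fence.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: make the preceding store complete before any
 * later load is issued. On SPARC-V9 this is the MEMBAR #StoreLoad the
 * text recommends; the macro names tested are assumptions about the
 * compiler, and other targets use a C11 full fence instead. */
static inline void store_load_barrier(void)
{
#if defined(__sparc_v9__) || defined(__sparcv9)
    __asm__ __volatile__("membar #StoreLoad" : : : "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical helper: signal a peer through *request, then spin on
 * *reply until the peer answers. Returns the number of polls that
 * found *reply still zero. */
static unsigned long signal_and_wait(atomic_int *request, atomic_int *reply)
{
    unsigned long polls = 0;

    /* The store that signals the other processor. */
    atomic_store_explicit(request, 1, memory_order_relaxed);

    /* Without this barrier, the spin-loop loads below could keep the
     * store locked out of the cache until its priority is raised. */
    store_load_barrier();

    /* Load spin loop waiting for the reply. */
    while (atomic_load_explicit(reply, memory_order_acquire) == 0)
        polls++;

    return polls;
}
```

A caller on one processor would invoke signal_and_wait(&request, &reply) while the peer polls request and then sets reply. The barrier sits exactly between the store and the first spin-loop load, which is the placement the text recommends.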
5.5.2 Store Buffer Compression
Consecutive non-side-effect stores may be combined into aligned 16-byte entries
in the store buffer to improve store bandwidth. Cacheable stores can only be
compressed with adjacent cacheable stores. Likewise, noncacheable stores can only
be compressed with adjacent noncacheable stores. In order to maintain strong
ordering for I/O accesses, stores with the side-effect attribute (E-bit set) cannot
be combined with any other stores.
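The compression rules above can be illustrated with a small software model. This is a sketch, not a description of the hardware: the type store_t and the functions can_combine and count_entries are hypothetical, and the model assumes that no store crosses a 16-byte boundary and that any run of consecutive eligible stores to the same aligned 16-byte block shares one entry. The manual text here does not state further limits, so real behavior may be more restrictive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one store as seen by the store buffer. */
typedef struct {
    uint64_t addr;        /* address of the store                    */
    bool     cacheable;   /* true for cacheable, false otherwise     */
    bool     side_effect; /* true when the E-bit is set for the page */
} store_t;

/* Two consecutive stores may share an entry only if neither has the
 * side-effect attribute, both have the same cacheability, and both
 * fall in the same aligned 16-byte block. */
static bool can_combine(const store_t *prev, const store_t *next)
{
    if (prev->side_effect || next->side_effect)
        return false;
    if (prev->cacheable != next->cacheable)
        return false;
    return (prev->addr >> 4) == (next->addr >> 4);
}

/* Count the store buffer entries this model would use for a sequence
 * of stores given in program order. */
static size_t count_entries(const store_t *stores, size_t n)
{
    size_t entries = 0;

    for (size_t i = 0; i < n; i++) {
        /* A new entry starts whenever the store cannot join the
         * entry that holds the immediately preceding store. */
        if (i == 0 || !can_combine(&stores[i - 1], &stores[i]))
            entries++;
    }
    return entries;
}
```

In this model, two cacheable stores to 0x1000 and 0x1008 share one entry, a following store to 0x1010 starts a new entry because it lies in the next 16-byte block, and every store with the side-effect attribute always occupies an entry of its own.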