Sun Microelectronics
UltraSPARC User’s Manual
16.3.6.4 Mixing Independent Loads and Stores
Note:
The bus turnaround penalty is two cycles for systems running in 1–1–1
mode only; systems running in 2–2 mode incur no turnaround penalty.
Mixing reads from and writes to the E-Cache incurs a penalty, caused by both the
difference in timing between reads and writes and the bus turnaround time.
UltraSPARC tends to separate loads and stores automatically through the load
buffer and the store buffer. Loads are given access to the E-Cache even if older
stores have been waiting to access it. Only when the number of buffered stores
passes the “high-water mark” (5 stores) does the store buffer take priority.
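
The priority rule described above can be sketched as a small behavioral model.
This is an illustration only, not a description of the hardware arbiter: the
function name next_ecache_access and its parameters are hypothetical, and the
model assumes that "passes" means strictly more than 5 buffered stores, which
the text does not state explicitly.

```python
def next_ecache_access(pending_loads, pending_stores, high_water_mark=5):
    """Pick which buffer gets the next E-Cache access in this simplified model.

    Returns "load", "store", or None when both buffers are empty.
    """
    # Assumption: the store buffer wins only when it holds strictly more
    # stores than the high-water mark (5 in the text).
    if pending_stores > high_water_mark:
        return "store"
    if pending_loads > 0:
        return "load"   # loads go first, even past older waiting stores
    if pending_stores > 0:
        return "store"  # no load is waiting, so a store can drain
    return None


print(next_ecache_access(pending_loads=1, pending_stores=5))
print(next_ecache_access(pending_loads=1, pending_stores=6))
```

In this model a waiting load wins at 5 buffered stores and loses at 6; if the
real threshold is "5 or more", only the comparison changes.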
The code can be organized to further minimize the number of bus turnaround
cycles. Code Example 16-3 shows how loads and stores can be grouped so that only
one turnaround penalty occurs (for a given state of the load buffer and store
buffer). This can be accomplished with the help of a memory reference analyzer
(Section 16.3.9, “Non-Faulting Loads,” covers this in more detail).
Code Example 16-3 Avoiding Bus Turnaround Penalties (1–1–1 mode only)
16.3.6.5 Using LDDF to Load Two Single-Precision Operands/Cycle
UltraSPARC supports single-cycle 8-byte data transfers into the floating-point
register file for LDDF. Wherever possible, applications that make heavy use of
single-precision floating-point arithmetic should organize their code and data
so that one LDDF replaces two LDFs. This roughly halves the load frequency and
can cut execution time considerably.
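
To make the effect on load count concrete, the following sketch plans the loads
needed to fetch a run of consecutive single-precision operands. It is a model
for illustration, not compiler output: plan_loads is a hypothetical helper, and
it assumes that LDDF requires an 8-byte-aligned address (as the SPARC-V9
architecture specifies) and that LDF requires 4-byte alignment.

```python
def plan_loads(base_addr, count):
    """Return a list of (mnemonic, address) pairs covering `count` 4-byte floats.

    Pairs of operands that start on an 8-byte boundary are fetched with one
    "lddf"; any leftover operand at either end is fetched with a single "ldf".
    """
    if base_addr % 4 != 0:
        raise ValueError("single-precision operands must be 4-byte aligned")
    plan = []
    addr = base_addr
    remaining = count
    while remaining > 0:
        if addr % 8 == 0 and remaining >= 2:
            plan.append(("lddf", addr))  # one load brings in two operands
            addr += 8
            remaining -= 2
        else:
            plan.append(("ldf", addr))   # misaligned start or odd tail
            addr += 4
            remaining -= 1
    return plan


print(plan_loads(0, 4))
print(plan_loads(4, 4))
```

With an 8-byte-aligned base, four operands need two loads; with a base that is
only 4-byte aligned, the same four operands need three. This is why the data
layout, and not just the code, has to be organized for LDDF.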
16.3.7 Store Buffer Considerations
The store buffer on UltraSPARC is designed so that stores can be issued even
when the data is not ready. More specifically, a store can be issued in the same
group as the instruction producing the result. The address of a store is buffered
until the data is eventually available. Once in the store buffer, the store data is
buffered until it can be sent “quietly” (that is, without interfering with other
instructions) to the D-Cache, the E-Cache, I/O devices, or the frame buffer (for
non-cacheable stores).
Code Example 16-3 (listing):

    2 Penalties              1 Penalty
    ld   [addr1],%l1         ld   [addr1],%l1
    st   [addr2],%l2         ld   [addr3],%l3
    ld   [addr3],%l3         st   [addr2],%l2
    st   [addr4],%l4         st   [addr4],%l4
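
The penalty counts given for Code Example 16-3 can be reproduced with a short
model. This is a sketch under stated assumptions: the function names are
hypothetical, and the model charges one turnaround for each switch from a load
to a store, a counting rule chosen only because it matches the "2 Penalties"
and "1 Penalty" labels. The cycle costs per mode come from the Note in Section
16.3.6.4; the separate read/write timing difference is not modeled.

```python
# Turnaround cost per mode, taken from the Note in Section 16.3.6.4.
TURNAROUND_CYCLES = {"1-1-1": 2, "2-2": 0}


def count_turnarounds(ops):
    """Count load-to-store switches in a sequence of "ld"/"st" mnemonics."""
    # Assumption: only a switch from a load to a store is charged; this
    # matches the penalty labels in Code Example 16-3.
    return sum(1 for prev, cur in zip(ops, ops[1:])
               if prev == "ld" and cur == "st")


def turnaround_cycles(ops, mode="1-1-1"):
    """Cycles lost to bus turnaround for the sequence in the given mode."""
    return count_turnarounds(ops) * TURNAROUND_CYCLES[mode]


interleaved = ["ld", "st", "ld", "st"]  # left column of Code Example 16-3
grouped = ["ld", "ld", "st", "st"]      # right column of Code Example 16-3

print(count_turnarounds(interleaved), turnaround_cycles(interleaved))
print(count_turnarounds(grouped), turnaround_cycles(grouped))
```

Both counts hold only for a given state of the load buffer and store buffer,
since the buffers may reorder E-Cache accesses relative to program order.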