Sun Microelectronics
16. Code Generation Guidelines
Code Example 16-1 Load Hit Bypassing Load Miss (Not Supported on UltraSPARC)
In Code Example 16-1, the first ADD stalls the pipeline until both the load miss and the load hit have been handled, because on UltraSPARC a load hit cannot bypass an older load miss. If the two ADDs are interchanged, the first ADD can proceed as soon as the load miss is handled.
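The stall can be illustrated with a small timing model. This is a sketch under stated assumptions, not a description of the real UltraSPARC pipeline: the latencies HIT_LATENCY and MISS_LATENCY are hypothetical, one load issues per cycle, load data returns strictly in program order with at most one load returning per cycle (so a hit cannot bypass an older miss), and the dependent instructions issue in order.

```python
# Hypothetical latencies in cycles, chosen only for illustration.
HIT_LATENCY = 2
MISS_LATENCY = 10

def load_ready_times(loads):
    """loads: (dest_register, is_hit) pairs in program order.
    One load issues per cycle; data returns in program order, at most one
    load per cycle, so a hit can never return before an older miss."""
    ready = {}
    last = -1
    for cycle, (reg, is_hit) in enumerate(loads):
        latency = HIT_LATENCY if is_hit else MISS_LATENCY
        last = max(cycle + latency, last + 1)
        ready[reg] = last
    return ready

def use_issue_cycles(loads, uses):
    """uses: source register of each dependent instruction, in issue order.
    Uses issue in order, one per cycle, once their operand is ready."""
    ready = load_ready_times(loads)
    cycle = len(loads)  # first cycle after the last load issues
    issued = []
    for reg in uses:
        cycle = max(cycle, ready[reg])
        issued.append(cycle)
        cycle += 1
    return issued

loads = [("%l6", False), ("%l7", True)]         # miss, then hit (Code Example 16-1)
print(use_issue_cycles(loads, ["%l7", "%l6"]))  # original order: [11, 12]
print(use_issue_cycles(loads, ["%l6", "%l7"]))  # ADDs interchanged: [10, 11]
```

In this model the first ADD issues at cycle 11 in the original order, when the hit's data finally returns behind the miss, but at cycle 10 once the ADDs are interchanged.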
As a rule, if load latency is expected to be a problem, the compiler should schedule the uses of loaded values in the same order as the loads appear in the program. Blocking part of an array in the D-Cache and operating on that data while an earlier D-Cache miss is outstanding could reduce register pressure (three extra registers could be made available for an inner loop). However, supporting this requires handling conflicting accesses to the D-Cache array (for example, by adding a port to the D-Cache, or by inserting a bubble on each collision), and that added complexity outweighs the potential benefit.
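The scheduling rule can be expressed as a simple reordering pass. The function order_uses_by_load is hypothetical and assumes the uses are mutually independent, so any order among them is legal; a real compiler would also have to respect data dependences and other resource constraints.

```python
def order_uses_by_load(load_order, uses):
    """Reorder independent uses so they consume loaded values in the
    same order in which the loads were issued.

    load_order: destination registers of the loads, in program order.
    uses: (instruction_text, source_register) pairs, assumed mutually
          independent so that any permutation is legal.
    """
    position = {reg: i for i, reg in enumerate(load_order)}
    # sorted() is stable, so uses of the same load keep their relative order.
    return sorted(uses, key=lambda use: position[use[1]])

loads = ["%l6", "%l7"]  # %l6 misses, %l7 hits (Code Example 16-1)
uses = [("add %l7,%g1,%g2", "%l7"), ("add %l6,%g1,%g3", "%l6")]
for text, _ in order_uses_by_load(loads, uses):
    print(text)  # prints the use of %l6 first, then the use of %l7
```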
16.3.6.3 Loads to the Same D-Cache Sub-block
When a load enters the load buffer, its address is compared with those of all older loads in the buffer. If an older load targets the same 16-byte sub-block, the entering load is marked as a hit, since by the time it accesses the D-Cache array the sub-block will be present (Code Example 16-2). Detecting the hit eliminates a transaction to the E-Cache, which frees slots on the E-Cache bus for its other clients (the I-Cache, the store buffer, and snoops). It therefore pays to organize code so that data is accessed sequentially. This may involve interchanging loops so that the array subscript advances by one between successive loads.
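The sub-block comparison, and the effect of loop interchange on it, can be sketched as follows. This is a simplified model: BUFFER_ENTRIES is a hypothetical buffer depth, the buffer is treated as holding the most recent older loads, D-Cache contents are not modelled (every load not matched in the buffer counts as a miss and one E-Cache transaction), and the array is assumed to be row-major with 8-byte elements starting on a 16-byte boundary.

```python
from collections import deque

SUB_BLOCK = 16      # bytes per D-Cache sub-block (from the text)
BUFFER_ENTRIES = 4  # hypothetical number of older loads still in the buffer
ELEM = 8            # bytes per array element, e.g. a double (assumption)

def classify(addresses, entries=BUFFER_ENTRIES):
    """Return 'hit' or 'miss' per load. A load is marked 'hit' when an older
    load still in the (sliding-window) buffer targets the same sub-block.
    D-Cache contents are not modelled: every other load counts as a miss."""
    window = deque(maxlen=entries)  # sub-block numbers of older buffered loads
    marks = []
    for addr in addresses:
        block = addr // SUB_BLOCK
        marks.append("hit" if block in window else "miss")
        window.append(block)
    return marks

def traversal(rows, cols, unit_stride):
    """Byte offsets for visiting a row-major rows x cols array of ELEM-byte
    elements. unit_stride=True: the innermost loop runs over the last subscript."""
    if unit_stride:
        return [(i * cols + j) * ELEM for i in range(rows) for j in range(cols)]
    return [(i * cols + j) * ELEM for j in range(cols) for i in range(rows)]

for unit in (True, False):
    misses = classify(traversal(8, 8, unit)).count("miss")
    print("unit stride" if unit else "interchanged", misses, "E-Cache transactions")
```

With the 8 x 8 array, the unit-stride order pairs every two loads within one sub-block (32 transactions for 64 loads), while the interchanged order never does (64 transactions), because successive loads are 64 bytes apart.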
Code Example 16-2 Interleaved D-Cache Hits and Misses to Same Sub-block
In 2–2 mode, UltraSPARC can access the E-Cache only every other cycle. Because each access returns 16 bytes, this still provides an average of 8 bytes per cycle, but only in 16-byte chunks. It is therefore important to schedule sequential loads to the same 16-byte D-Cache sub-block, since this allows systems running in 2–2 mode to achieve the same steady-state load/issue rate as in 1–1–1 mode.
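The 2–2 mode arithmetic can be checked with a back-of-the-envelope model. It assumes, hypothetically, that at most one load issues per cycle and that each E-Cache access supplies exactly one 16-byte sub-block; it ignores latency, stores, and the other E-Cache clients.

```python
from fractions import Fraction

SUB_BLOCK = 16  # bytes delivered per E-Cache access (one 16-byte chunk)

def loads_per_cycle(access_interval, loads_per_sub_block):
    """Steady-state load rate.
    access_interval: cycles between E-Cache accesses (1 in 1-1-1, 2 in 2-2).
    loads_per_sub_block: loads satisfied by each 16-byte sub-block fetched."""
    supply = Fraction(loads_per_sub_block, access_interval)  # E-Cache limit
    return min(Fraction(1), supply)  # assumed issue limit: one load per cycle

for mode, interval in (("1-1-1", 1), ("2-2", 2)):
    print(mode, "bytes/cycle:", Fraction(SUB_BLOCK, interval))
    for pattern, per_block in (("sequential 8-byte loads", 2), ("one load per sub-block", 1)):
        print("  ", pattern, "loads/cycle:", loads_per_cycle(interval, per_block))
```

Under these assumptions, 2–2 mode sustains one load per cycle only when two sequential 8-byte loads share each sub-block; with one load per sub-block the rate halves.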
        ! Code Example 16-1: the use of the hit is scheduled before the use of the miss
        ld      [%l1+%g0],%l6     ! D-Cache miss
        ld      [%l2+%g0],%l7     ! D-Cache hit
        add     %l7,%g1,%g2       ! use of D-Cache hit
        add     %l6,%g1,%g3       ! use of D-Cache miss
        ! Code Example 16-2: start is aligned on a 16-byte boundary
        ld      [start],%f0       ! D-Cache miss
        ld      [start + 8],%f2   ! D-Cache hit (same sub-block as start)
        ld      [start + 16],%f4  ! D-Cache miss
        ld      [start + 24],%f6  ! D-Cache hit (same sub-block as start + 16)