Sun Microelectronics
292
UltraSPARC User’s Manual
17.7.1.2 Cache Timing
The following example illustrates D-Cache hit timing. The first load causes
UltraSPARC to enter delayed return mode, returning data in the N1 Stage. The
second load is also in delayed return mode, returning data in its N1 Stage;
otherwise it would collide with the first load data. The group containing the
third load and the first ADD (which references the first load data) is stalled
in the E Stage for one clock until both loads used by the first ADD have
returned data. Since the third load is stalled in E, its normal C Stage data
return will not collide with a previous delayed return mode load. This allows
the last ADD to avoid an E Stage stall. If the third load were not grouped with
the first ADD, it would not be stalled in the E Stage, and the last ADD would be
dispatched one clock earlier. The third load causes the pipeline to exit delayed
return mode.
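The collision rule described above can be sketched as a small model. This is a simplified illustration, not the processor's actual control logic. It assumes that a signed integer load returns data one clock after its C Stage (in N1), that any other load normally returns data in its C Stage, and that a load is forced into delayed return if its normal return would not come after the previous load's return. The names `Load` and `return_clocks`, and the clock numbers in the usage note, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str      # label for the load (illustrative only)
    c_clock: int   # clock in which the load reaches the C Stage
    signed: bool   # assumed: signed integer loads return in N1


def return_clocks(loads):
    """Model D-Cache hit data return for loads given in program order.

    Returns a list of (name, return_clock, delayed) tuples.
    """
    results = []
    prev_return = None
    for ld in loads:
        ret = ld.c_clock          # normal return: in the C Stage
        delayed = ld.signed       # signed loads start delayed return mode
        # Data returns in order, so a normal return that would not come
        # after the previous load's return must also be delayed.
        if prev_return is not None and ret <= prev_return:
            delayed = True
        if delayed:
            ret = ld.c_clock + 1  # delayed return: in the N1 Stage
        results.append((ld.name, ret, delayed))
        prev_return = ret
    return results
```

With hypothetical C Stage clocks of 3, 4, and 6 (the third load having been stalled one clock in E), the model gives delayed returns for the first two loads and a normal C Stage return for the third, matching the behavior described in the text. If the third load reached C at clock 5 instead, the model would keep it in delayed return mode.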
A D-Cache load miss that hits the E-Cache will return data seven clocks after the
load reaches the C Stage for delayed return mode and six clocks after the load
reaches the C Stage otherwise. Because load data is returned in order, a D-Cache
load hit that reaches the C Stage one clock after a D-Cache miss also returns data
seven clocks after the load reaches the C Stage for signed integer loads and six
clocks after the load reaches the C Stage otherwise. The latency for subsequent
D-Cache load hits decreases as bubbles occur between loads reaching the C
Stage, provided there are no further D-Cache misses.
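The latencies quoted above reduce to simple arithmetic, sketched below. The function names are hypothetical, and the second function assumes that "the load" in the text refers to the D-Cache hit's own arrival in the C Stage.

```python
def ecache_hit_return_clock(c_clock, delayed_return_mode):
    """Return clock for a D-Cache load miss that hits the E-Cache.

    c_clock is the clock in which the load reaches the C Stage.
    """
    return c_clock + (7 if delayed_return_mode else 6)


def hit_after_miss_return_clock(c_clock, signed):
    """Return clock for a D-Cache hit reaching C one clock after a miss.

    Assumes the latency is counted from this hit's own C Stage clock.
    """
    return c_clock + (7 if signed else 6)
```

For example, under these assumptions a miss that reaches C at clock 10 outside delayed return mode returns data at clock 16, and an unsigned D-Cache hit reaching C at clock 11 returns data at clock 17, which preserves in-order return.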
17.7.1.3 Block Memory Accesses
Unlike other loads, block loads do not lock all of their destination registers. If
there are two block loads outstanding, any instruction except a block store will be
held in the G Stage until the first block load leaves the load buffer. A block load
leaves the load buffer when its first word of data has returned. Each system clock
that Data_Stall is asserted when returning subsequent words of the block load
causes two or three bubbles to be inserted into the pipeline, depending on the
processor-to-UPA frequency ratio.
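The two block-load rules above can be written as a short sketch. The function names are hypothetical. The number of bubbles per stalled system clock is passed in directly (2 or 3) because the text does not state which frequency ratio produces which count.

```python
def held_in_g_stage(outstanding_block_loads, is_block_store):
    """True if an instruction is held in G because of outstanding block loads.

    With two block loads outstanding, only a block store may proceed.
    """
    return outstanding_block_loads >= 2 and not is_block_store


def block_load_bubbles(data_stall_clocks, bubbles_per_stall_clock):
    """Pipeline bubbles inserted while subsequent block-load words return.

    data_stall_clocks: system clocks during which Data_Stall is asserted.
    bubbles_per_stall_clock: 2 or 3, depending on the processor-to-UPA
    frequency ratio (the mapping from ratio to count is not assumed here).
    """
    if bubbles_per_stall_clock not in (2, 3):
        raise ValueError("bubbles_per_stall_clock must be 2 or 3")
    if data_stall_clocks < 0:
        raise ValueError("data_stall_clocks must be non-negative")
    return data_stall_clocks * bubbles_per_stall_clock
```

For instance, four system clocks of Data_Stall at three bubbles per clock would insert twelve bubbles under this model.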
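D-Cache hit timing example (pipeline stages traversed by each instruction,
listed in order; rows are not aligned by clock):

LDSB  [i1], i6   (D-Cache hit)    G  E  C  N1 N2 N3 W
LDB   [i3], i7   (D-Cache hit)    G  E  C  N1 N2 N3 W
LDB   [i7], i4   (D-Cache hit)    G  E  E  C  N1
ADD   i6,i7,i8                    G  E  E  C  N1 N2
ADD   i4,i5,i9                    G  E  C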