
Sun Microelectronics
UltraSPARC User’s Manual
The technique shown in Figure 16-10 can be generalized to N levels, where N
branches are correlated and become more predictable. The technique may make
unrolling worthwhile for loops that were previously identified as bad candidates
because of the unpredictable behavior of their conditional branches.
16.2.10 Return Address Stack (RAS)
To speed up returns from subroutines invoked through CALL instructions,
UltraSPARC dedicates a 4-deep stack, the Return Address Stack (RAS), to storing
return addresses. Each time a CALL is detected, the return address is pushed
onto the RAS. Each time a return is encountered, the address is obtained from
the top of the stack and the stack is popped. UltraSPARC considers a return to
be a JMPL or RETURN with rs1 equal to %o7 (leaf subroutine) or %i7 (normal
subroutine). The RAS provides a guess for the target address, so that
prefetching can continue even though the address calculation has not yet been
performed. JMPL or RETURN instructions using rs1 values other than %o7 or %i7,
and DONE or RETRY instructions, also use the value on the top of the RAS to
continue prefetching, but they do not pop the stack. See Section 10.1,
“Overview,” on page 169 for information about the contents of the RAS during
RED_state processing.
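The push, pop, and peek behavior described above can be modeled in a few lines
of C. This is only an illustrative sketch, not a description of the hardware:
the type ras_t and the ras_* function names are hypothetical, and the text does
not say what happens when a fifth CALL arrives or when a return finds the stack
empty. The sketch assumes that the oldest entry is discarded on overflow and
that an empty stack yields 0 as a placeholder prediction.

```c
#include <assert.h>
#include <stdint.h>

#define RAS_DEPTH 4   /* depth stated in the text */

/* Hypothetical software model of the RAS (all names are illustrative). */
typedef struct {
    uint64_t entry[RAS_DEPTH];
    int      count;               /* number of valid entries, 0..RAS_DEPTH */
} ras_t;

/* CALL detected: push the return address.
 * ASSUMPTION: when the stack is full, the oldest entry is discarded. */
static void ras_push(ras_t *r, uint64_t return_addr)
{
    if (r->count == RAS_DEPTH) {
        for (int i = 1; i < RAS_DEPTH; i++)
            r->entry[i - 1] = r->entry[i];
        r->count--;
    }
    r->entry[r->count++] = return_addr;
    assert(r->count <= RAS_DEPTH);
}

/* Value on the top of the stack.
 * ASSUMPTION: 0 stands in for "no prediction" when the stack is empty. */
static uint64_t ras_top(const ras_t *r)
{
    return r->count > 0 ? r->entry[r->count - 1] : 0;
}

/* JMPL or RETURN with rs1 equal to %o7 or %i7: predict from the top, then pop. */
static uint64_t ras_predict_return(ras_t *r)
{
    uint64_t target = ras_top(r);
    if (r->count > 0)
        r->count--;
    return target;
}

/* Other JMPL/RETURN, DONE, or RETRY: predict from the top without popping. */
static uint64_t ras_predict_other(const ras_t *r)
{
    return ras_top(r);
}
```

Under these assumptions, a chain of five nested calls followed by five returns
predicts the first four returns correctly and has no valid entry left for the
fifth, which illustrates why call chains deeper than four lose the benefit of
the RAS.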
16.3 Data Stream Issues
16.3.1 D-Cache Organization
The D-Cache is a 16-Kbyte, direct-mapped, virtually indexed, physically tagged
(VIPT), write-through, non-allocating cache. It is logically organized as 512 lines
of 32 bytes. Each line contains two 16-byte sub-blocks (Figure 16-11).
Figure 16-11  Logical Organization of D-Cache
(512 lines; each line holds sub-block 0 and sub-block 1, 16 bytes each)
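The organization above implies a simple mapping from an address to a line and
sub-block. The following C sketch works through that arithmetic. It assumes the
conventional direct-mapped layout in which the low-order address bits select
the byte, sub-block, and line; the names dc_slot_t and dc_decode are
hypothetical, and the physical tag comparison is not modeled.

```c
#include <stdint.h>

#define DC_LINE_BYTES      32u    /* bytes per line (from the text)      */
#define DC_SUBBLOCK_BYTES  16u    /* bytes per sub-block (from the text) */
#define DC_LINES           512u   /* lines in the cache (from the text)  */

/* Hypothetical decoded position of an address within the D-Cache. */
typedef struct {
    unsigned line;        /* 0..511 */
    unsigned sub_block;   /* 0 or 1 */
    unsigned offset;      /* byte within the sub-block, 0..15 */
} dc_slot_t;

/* ASSUMPTION: conventional low-order-bit mapping. The cache is virtually
 * indexed, so the virtual address selects the line. */
static dc_slot_t dc_decode(uint64_t vaddr)
{
    dc_slot_t s;
    s.line      = (unsigned)((vaddr / DC_LINE_BYTES) % DC_LINES);
    s.sub_block = (unsigned)((vaddr % DC_LINE_BYTES) / DC_SUBBLOCK_BYTES);
    s.offset    = (unsigned)(vaddr % DC_SUBBLOCK_BYTES);
    return s;
}
```

Because the cache is direct mapped, two virtual addresses that differ by a
multiple of 16 Kbytes (512 lines of 32 bytes) decode to the same line and
therefore compete for it; under this mapping, dc_decode(0x1234) and
dc_decode(0x5234) both select line 145, sub-block 1.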