80960SA
1.1.5 Instruction Cache
To further reduce memory accesses, the 80960SA
includes a 512-byte on-chip instruction cache. The
instruction cache exploits locality of reference:
most programs do not execute as a steady stream
but consist of many branches, loops and procedure
calls that jump back and forth within the same small
section of code. Thus, by maintaining a block of instructions in
cache, the number of memory references required to
read instructions into the processor is greatly
reduced.
To load the instruction cache, instructions are
fetched in 16-byte blocks; up to four instructions can
be fetched at one time. An efficient prefetch
algorithm increases the probability that an instruction
will already be in the cache when it is needed.
Code for small loops often fits entirely within the
cache, leading to a great increase in processing
speed since further memory references might not be
necessary until the program exits the loop. Similarly,
when calling short procedures, the code for the
calling procedure is likely to remain in the cache so it
will be there on the procedure’s return.
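The effect of loop size on memory traffic can be illustrated with a toy model. The sketch below assumes a direct-mapped cache and 4-byte instructions, neither of which is stated in the text, and it ignores the prefetch algorithm; only the 512-byte size and the 16-byte fetch block come from the description above. The function names are hypothetical.

```python
# Toy model of an instruction cache: 512 bytes, filled in 16-byte blocks.
# ASSUMPTION: direct-mapped organization and 4-byte instructions; the
# text specifies only the cache size and the fetch block size.
CACHE_BYTES = 512
BLOCK_BYTES = 16
NUM_BLOCKS = CACHE_BYTES // BLOCK_BYTES  # 32 block slots


def count_block_fetches(addresses):
    """Return how many 16-byte block fetches from memory a trace needs."""
    tags = [None] * NUM_BLOCKS  # memory block currently held in each slot
    fetches = 0
    for addr in addresses:
        block = addr // BLOCK_BYTES
        slot = block % NUM_BLOCKS
        if tags[slot] != block:  # miss: fetch the whole block from memory
            tags[slot] = block
            fetches += 1
    return fetches


def loop_trace(start, body_bytes, iterations, insn_bytes=4):
    """Instruction addresses for a loop body executed `iterations` times."""
    body = range(start, start + body_bytes, insn_bytes)
    return [addr for _ in range(iterations) for addr in body]


small = loop_trace(0x1000, 256, 100)   # loop body fits in the cache
large = loop_trace(0x1000, 1024, 100)  # loop body is twice the cache size

print(count_block_fetches(small))  # 16
print(count_block_fetches(large))  # 6400
```

Under these assumptions, the 256-byte loop issues 6400 instruction reads but touches memory only for the first 16 block fetches, while the 1024-byte loop keeps evicting its own code and fetches every block again on each pass.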
1.1.6 Register Scoreboarding
The instruction decoder is optimized in several ways.
One optimization method is the ability to overlap
instructions by using register scoreboarding.
Register scoreboarding occurs when a LOAD moves
a variable from memory into a register. When the
instruction initiates, a scoreboard bit on the target
register is set. Once the register is loaded, the bit is
reset. In between, any reference to the register
contents is accompanied by a test of the scoreboard
bit to ensure that the load has completed before
processing continues. Since the processor does not
need to wait for the LOAD to complete, it can execute
additional instructions placed between the LOAD
and the instruction that uses the register contents, as
shown in the following example:
ld data_1, r4
ld data_2, r5
Unrelated instruction
Unrelated instruction
add r4, r5, r6
In essence, the two unrelated instructions between
LOAD and ADD are executed “for free” (i.e., take no
apparent time to execute) because they are
executed while the register is being loaded. Up to
three load instructions can be pending at one time
with three corresponding scoreboard bits set. By
exploiting this feature, system programmers and
compiler writers have a useful tool for optimizing
execution speed.
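The benefit of placing unrelated instructions between a load and its first use can be shown with a small timing model. This is a sketch only: it assumes every instruction issues in one cycle and that a load delivers its data a fixed number of cycles after issue. Both figures are illustrative, not taken from the text. The limit of three pending loads is from the description above.

```python
# ASSUMPTIONS: one issue cycle per instruction and a fixed load latency.
# Real timings depend on the memory system and are not given in the text.
LOAD_LATENCY = 4
MAX_PENDING_LOADS = 3  # from the text: up to three loads pending at once


def run(program):
    """Return total cycles for a list of (op, dest, sources) tuples."""
    ready_at = {}  # register -> cycle at which its scoreboard bit is reset
    cycle = 0
    for op, dest, sources in program:
        # Test the scoreboard bit of every register the instruction touches
        # and stall until all of them are clear.
        regs = [r for r in (*sources, dest) if r is not None]
        cycle = max([cycle] + [ready_at.get(r, 0) for r in regs])
        # Drop the bits of loads that have completed by now.
        ready_at = {r: t for r, t in ready_at.items() if t > cycle}
        if op == "ld":
            if len(ready_at) >= MAX_PENDING_LOADS:
                # All scoreboard bits in use: wait for the earliest load.
                cycle = min(ready_at.values())
                ready_at = {r: t for r, t in ready_at.items() if t > cycle}
            ready_at[dest] = cycle + LOAD_LATENCY  # set the scoreboard bit
        cycle += 1
    return cycle


loads = [("ld", "r4", ()), ("ld", "r5", ())]
add = [("add", "r6", ("r4", "r5"))]
unrelated = [("other", None, ()), ("other", None, ())]

print(run(loads + add + unrelated))  # 8: add stalls, then unrelated work
print(run(loads + unrelated + add))  # 6: unrelated work hides the latency
```

With these assumed timings, both orderings perform the same work, but the interleaved version finishes two cycles sooner because the unrelated instructions execute while the scoreboard bits are still set.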
1.1.7 High Bandwidth Bus
The 80960SA CPU resides on a high-bandwidth
address/data bus. The bus provides a direct
communication path between the processor and the
memory and I/O subsystem interfaces. The
processor uses the bus to fetch instructions,
manipulate memory and respond to interrupts. Bus
features include:
• 16-bit data path multiplexed onto the lower bits of
  the 32-bit address path
• Eight 16-bit half-word burst capability which
  allows transfers from 1 to 16 bytes at a time
• High bandwidth reads and writes with 32
  Mbytes/s burst (at 20 MHz)
Table 3 defines bus signal names and functions;
Table 4 defines other component-support signals
such as interrupt lines.
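The quoted burst figure can be checked with simple arithmetic. The sketch below assumes one clock per half-word transfer (zero wait states) plus one address clock and one recovery clock per burst. That cycle breakdown is not stated in this section; it is chosen because it reproduces the 32 Mbytes/s figure at 20 MHz.

```python
CLOCK_HZ = 20_000_000     # 20 MHz, from the text
BYTES_PER_TRANSFER = 2    # 16-bit data path
MAX_BURST_TRANSFERS = 8   # eight half-words = 16 bytes


def burst_bandwidth(transfers, address_clocks=1, recovery_clocks=1):
    """Bytes per second for back-to-back bursts of `transfers` half-words.

    ASSUMPTION: one clock per data transfer plus the given address and
    recovery clocks per burst; the text does not state this breakdown.
    """
    clocks = address_clocks + transfers + recovery_clocks
    return transfers * BYTES_PER_TRANSFER * CLOCK_HZ / clocks


print(burst_bandwidth(MAX_BURST_TRANSFERS) / 1e6)  # 32.0
print(burst_bandwidth(1) / 1e6)                    # single half-word access
```

Under this assumption a full 16-byte burst takes 10 clocks, and the fixed per-burst overhead explains why single half-word accesses achieve well under half the burst rate.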
1.1.8 Interrupt Handling
The 80960SA can be interrupted in one of two ways:
by the activation of one of four interrupt pins or by
sending a message on the processor’s data bus.
The 80960SA is unusual in that it automatically
handles interrupts on a priority basis and can keep
track of pending interrupts through its on-chip
interrupt controller. Two of the interrupt pins can be
configured to provide 8259A-style handshaking for
expansion beyond four interrupt lines.
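Priority-based handling with pending-interrupt tracking can be sketched as follows. The class, its method names, the rule that a larger number means higher priority, and the priority and vector values are all hypothetical illustrations; the section above states only that interrupts are handled by priority and that pending ones are tracked on chip.

```python
class PendingInterrupts:
    """Toy model of priority-based interrupt acceptance (hypothetical)."""

    def __init__(self, current_priority=0):
        self.current_priority = current_priority
        self.pending = []  # (priority, vector) pairs waiting for service

    def post(self, priority, vector):
        """Return the vector if serviced at once, else record it as pending."""
        if priority > self.current_priority:
            self.current_priority = priority  # run handler at its priority
            return vector
        self.pending.append((priority, vector))
        return None

    def lower_priority(self, new_priority):
        """Drop to `new_priority`; service the best pending interrupt, if any."""
        self.current_priority = new_priority
        ready = [p for p in self.pending if p[0] > new_priority]
        if not ready:
            return None
        best = max(ready)  # highest priority first
        self.pending.remove(best)
        self.current_priority = best[0]
        return best[1]


ic = PendingInterrupts(current_priority=10)
print(ic.post(5, 40))         # None: below current priority, left pending
print(ic.post(20, 160))       # 160: higher priority, serviced immediately
print(ic.post(12, 100))       # None: handler at priority 20 is running
print(ic.lower_priority(0))   # 100: highest pending interrupt goes first
print(ic.lower_priority(0))   # 40
```

The point of the model is that a lower-priority request is not lost while a higher-priority handler runs; it is held and delivered once the processor's priority drops.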
1.1.9 Debug Features
The 80960SA has built-in debug capabilities. There
are two types of breakpoints and six trace modes.
Debug features are controlled by two internal 32-bit
registers, the Process-Controls Word and the Trace-
Controls Word. By setting bits in these control words,
a software debug monitor can closely control how
the processor responds during program execution.