80960SA
1.1 Key Performance Features
The 80960SA architecture is based on the most
recent advances in microprocessor technology and
is grounded in Intel’s long experience in the design
and manufacture of embedded microprocessors.
Many features contribute to the 80960SA’s
exceptional performance:
1. Large Register Set. Having a large number of
registers reduces the number of times that a
processor needs to access memory. Modern
compilers can take advantage of this feature to
optimize execution speed. For maximum
flexibility, the 80960SA provides thirty-two 32-bit
registers. (See Figure 2.)
2. Fast Instruction Execution. Simple functions
make up the bulk of instructions in most
programs, so execution speed can be
improved by ensuring that these core
instructions are executed as quickly as possible. The
most frequently executed instructions (such
as register-register moves, add/subtract,
logical operations and shifts) execute in one
to two cycles. (Table 1 contains a list of
instructions.)
3. Load/Store Architecture. One way to improve
execution speed is to reduce the number of
times that the processor must access memory
to perform an operation. As with other
processors based on RISC technology, the
80960SA has a Load/Store architecture. As
such, only the LOAD and STORE instructions
reference memory; all other instructions
operate on registers. This type of architecture
simplifies instruction decoding and is used in
combination with other techniques to increase
parallelism.
4. Simple Instruction Formats. All instructions
in the 80960SA are 32 bits long and must be
aligned on word boundaries. This alignment
makes it possible to eliminate the instruction
alignment stage in the pipeline. To simplify the
instruction decoder, there are only five
instruction formats; each instruction uses only
one format. (See Figure 3.)
5. Overlapped Instruction Execution. Load
operations allow execution of subsequent
instructions to continue before the data has
been returned from memory, so that these
instructions can overlap the load. The
80960SA manages this process transparently
to software through the use of a register
scoreboard. Conditional instructions also make use
of a scoreboard so that subsequent unrelated
instructions may be executed while the
conditional instruction is pending.
6. Integer Execution Optimization. When the
result of an arithmetic operation is used as an
operand in a subsequent calculation, the value
is sent immediately to its destination register.
At the same time, the value is put on a bypass
path to the ALU, thereby saving the time that
otherwise would be required to retrieve the
value for the next operation.
7. Bandwidth Optimizations. The 80960SA makes
optimal use of its memory bus bandwidth
because the bus is tuned for use with the
on-chip instruction cache: the instruction cache line
size matches the maximum burst size for
instruction fetches. The 80960SA automatically
fetches four words in a burst and stores them
directly in the cache. Due to the size of the
cache and the fact that it is continually filled in
anticipation of needed instructions in the
program flow, the 80960SA is relatively
insensitive to memory wait states. The benefit is that
the 80960SA delivers outstanding performance
even with a low-cost memory system.
8. Cache Bypass. If a cache miss occurs, the
processor fetches the needed instruction, then
sends it on to the instruction decoder at the
same time as it updates the cache. Thus, no extra
time is spent to load and read the cache.
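
The register scoreboard described under Overlapped Instruction Execution
can be illustrated with a small software model. The sketch below is only
an approximation of the idea, assuming one "pending" bit per register; the
names scoreboard_t, sb_issue_load, sb_load_done and sb_can_issue are
hypothetical and do not come from the 80960SA documentation, and the
actual hardware mechanism may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 32u /* the 80960SA provides thirty-two 32-bit registers */

/* Hypothetical model: bit r is set while register r waits for load data. */
typedef struct {
    uint32_t pending;
} scoreboard_t;

/* Return the mask bit for a register number (illustrative helper). */
static uint32_t reg_bit(unsigned reg)
{
    assert(reg < NUM_REGS);
    return (uint32_t)1 << reg;
}

/* A load has been issued: mark its destination register as pending. */
static void sb_issue_load(scoreboard_t *sb, unsigned dst)
{
    sb->pending |= reg_bit(dst);
}

/* Memory has returned the data: the destination register is valid again. */
static void sb_load_done(scoreboard_t *sb, unsigned dst)
{
    sb->pending &= ~reg_bit(dst);
}

/* An instruction may proceed only if none of its registers are pending. */
static bool sb_can_issue(const scoreboard_t *sb,
                         unsigned src1, unsigned src2, unsigned dst)
{
    uint32_t used = reg_bit(src1) | reg_bit(src2) | reg_bit(dst);
    return (sb->pending & used) == 0;
}
```

In this model, an instruction that touches none of the pending registers
can proceed while a load is outstanding, which is the overlap the feature
list describes; an instruction that needs the load's destination register
waits until sb_load_done clears the bit.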