80960MC
1.1 Key Performance Features
The 80960 architecture is based on recent advances in microprocessor technology and draws on Intel’s long experience in designing and manufacturing embedded microprocessors. The following features contribute to the 80960MC’s performance:
1. Large Register Set. Having a large number of registers reduces the number of times that a processor needs to access memory. Modern compilers can take advantage of this feature to optimize execution speed. For maximum flexibility, the 80960MC provides thirty-two 32-bit registers. (See Figure 2.)
2. Fast Instruction Execution. Simple functions make up the bulk of the instructions in most programs, so execution speed can be improved by ensuring that these core instructions execute as quickly as possible. The most frequently executed instructions, such as register-register moves, add/subtract, logical operations, and shifts, execute in one to two cycles. (Table 1 contains a list of instructions.)
3. Load/Store Architecture. One way to improve execution speed is to reduce the number of times that the processor must access memory to perform an operation. As with other processors based on RISC technology, the 80960MC has a load/store architecture: only the LOAD and STORE instructions reference memory, and all other instructions operate on registers. This type of architecture simplifies instruction decoding and is used in combination with other techniques to increase parallelism.
4. Simple Instruction Formats. All instructions in the 80960MC are 32 bits long and must be aligned on word boundaries. This alignment makes it possible to eliminate the instruction alignment stage in the pipeline. To simplify the instruction decoder, there are only five instruction formats, and each instruction uses only one format. (See Figure 3.)
5. Overlapped Instruction Execution. Load operations allow execution of subsequent instructions to continue before the data has been returned from memory, so these instructions can overlap the load. The 80960MC manages this process transparently to software through a register scoreboard. Conditional instructions also use a scoreboard, so subsequent unrelated instructions can execute while the conditional instruction is pending.
6. Integer Execution Optimization. When the result of an arithmetic operation is used as an operand in a subsequent calculation, the value is sent immediately to its destination register and, at the same time, placed on a bypass path to the ALU. This saves the time that would otherwise be required to retrieve the value from the register for the next operation.
7. Bandwidth Optimizations. The 80960MC makes efficient use of its memory bus bandwidth because the bus is tuned for use with the on-chip instruction cache: the instruction cache line size matches the maximum burst size for instruction fetches. The 80960MC automatically fetches four words in a burst and stores them directly in the cache. Because of the size of the cache, and because the cache is continually filled in anticipation of instructions needed later in the program flow, the 80960MC is relatively insensitive to memory wait states. As a result, the 80960MC delivers high performance even with a low-cost memory system.
8. Cache Bypass. When a cache miss occurs, the processor fetches the needed instruction and sends it to the instruction decoder while it updates the cache. Thus, no extra time is spent first loading the instruction into the cache and then reading it back out.
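A toy register machine can illustrate the Load/Store Architecture feature above. This is a minimal sketch, not the 80960MC instruction set. The opcode names `OP_LD`, `OP_ST`, and `OP_ADD`, the `Machine` and `Insn` types, the 64-word memory, and the `execute` function are all hypothetical. Only the load and store opcodes touch memory, and a counter records every memory access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy machine (not 80960MC mnemonics): 32 registers and a
   small word-addressed memory. Only OP_LD and OP_ST access memory. */
enum { NUM_REGS = 32, MEM_WORDS = 64 };

typedef enum { OP_LD, OP_ST, OP_ADD } Opcode;

typedef struct {
    Opcode op;
    unsigned rd;   /* destination (LD, ADD) or source register (ST) */
    unsigned rs1;  /* first source register (ADD only) */
    unsigned rs2;  /* second source register (ADD only) */
    unsigned addr; /* memory word address (LD, ST only) */
} Insn;

typedef struct {
    uint32_t reg[NUM_REGS];
    uint32_t mem[MEM_WORDS];
    unsigned mem_accesses; /* counts every memory read or write */
} Machine;

static void execute(Machine *m, const Insn *prog, size_t n) {
    for (size_t i = 0; i < n; i++) {
        const Insn *in = &prog[i];
        assert(in->rd < NUM_REGS && in->rs1 < NUM_REGS && in->rs2 < NUM_REGS);
        switch (in->op) {
        case OP_LD:  /* memory -> register */
            assert(in->addr < MEM_WORDS);
            m->reg[in->rd] = m->mem[in->addr];
            m->mem_accesses++;
            break;
        case OP_ST:  /* register -> memory */
            assert(in->addr < MEM_WORDS);
            m->mem[in->addr] = m->reg[in->rd];
            m->mem_accesses++;
            break;
        case OP_ADD: /* register-only operation: no memory traffic */
            m->reg[in->rd] = m->reg[in->rs1] + m->reg[in->rs2];
            break;
        }
    }
}
```

Suppose `b` and `c` are in memory. Computing `a = b + c` then takes two loads, one register-only add, and one store, for three memory accesses. Once operands are held in registers, further arithmetic on them adds no memory traffic.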
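The sketch below relates to Simple Instruction Formats. It shows why fixed-length, word-aligned instructions simplify fetch. A word-aligned program counter maps directly to one 32-bit word, so no logic is needed to reassemble an instruction that straddles two words. The helper names `is_word_aligned` and `fetch`, and the use of a `uint32_t` array as the code image, are assumptions made for illustration. Instruction decoding is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fetch model: every instruction is one 32-bit word. */
enum { INSN_BYTES = 4 };

/* True if addr is a multiple of the 4-byte instruction size. */
static bool is_word_aligned(uint32_t addr) {
    return (addr & (INSN_BYTES - 1u)) == 0;
}

/* Fetch the instruction at byte address pc from a code image that starts
   at byte address base. Because pc is word-aligned, the fetch is a single
   indexed word read; no bytes are gathered from two different words. */
static uint32_t fetch(const uint32_t *code, uint32_t base, uint32_t pc) {
    assert(is_word_aligned(base) && is_word_aligned(pc) && pc >= base);
    return code[(pc - base) / INSN_BYTES];
}
```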
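Overlapped Instruction Execution mentions a register scoreboard, which can be sketched as one pending bit per register. This minimal model assumes 32 registers and covers only the load case, not conditional instructions. The `Scoreboard` type and the `issue_load`, `load_complete`, and `can_issue` functions are hypothetical names. They do not describe the 80960MC's actual issue logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scoreboard: one pending bit per register (32 assumed). */
enum { NUM_REGS = 32 };

typedef struct {
    uint32_t pending; /* bit r set => a load into register r is in flight */
} Scoreboard;

static uint32_t reg_bit(unsigned r) {
    assert(r < NUM_REGS);
    return UINT32_C(1) << r;
}

/* Issue a load: mark its destination busy until the data returns. */
static void issue_load(Scoreboard *sb, unsigned dst) {
    sb->pending |= reg_bit(dst);
}

/* The load data has returned from memory: clear the busy bit. */
static void load_complete(Scoreboard *sb, unsigned dst) {
    sb->pending &= ~reg_bit(dst);
}

/* An instruction may issue only if neither of its sources nor its
   destination is waiting on an outstanding load. */
static bool can_issue(const Scoreboard *sb, unsigned src1, unsigned src2,
                      unsigned dst) {
    uint32_t used = reg_bit(src1) | reg_bit(src2) | reg_bit(dst);
    return (sb->pending & used) == 0;
}
```

An instruction that avoids every pending register may issue while the load is outstanding, which is the overlap the feature describes. An instruction that reads the loaded register waits until `load_complete` clears its bit.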
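The bypass path under Integer Execution Optimization is a form of result forwarding. For illustration only, the sketch below assumes that the register-file copy of a result is written one instruction later, by the hypothetical `writeback` function. The same result is available on a bypass latch immediately. The `Datapath` type, `read_operand`, and `alu_add` are hypothetical names, not 80960MC internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { NUM_REGS = 32 };

/* Hypothetical datapath model. The register-file copy of a result is only
   updated by writeback(), standing in for a later pipeline step, while the
   bypass latch holds the newest result right away. */
typedef struct {
    uint32_t regfile[NUM_REGS];
    bool     latch_valid;
    unsigned latch_reg;
    uint32_t latch_val;
} Datapath;

/* Operand select: use the bypass latch when it holds the needed register. */
static uint32_t read_operand(const Datapath *dp, unsigned r, bool bypass) {
    assert(r < NUM_REGS);
    if (bypass && dp->latch_valid && dp->latch_reg == r)
        return dp->latch_val;
    return dp->regfile[r];
}

/* Commit the latched result to the register file (the deferred write). */
static void writeback(Datapath *dp) {
    if (dp->latch_valid) {
        dp->regfile[dp->latch_reg] = dp->latch_val;
        dp->latch_valid = false;
    }
}

/* rd = rs1 + rs2. Operands are read first; then the older result is
   written back and the new result is placed on the bypass latch. */
static void alu_add(Datapath *dp, unsigned rd, unsigned rs1, unsigned rs2,
                    bool bypass) {
    assert(rd < NUM_REGS);
    uint32_t a = read_operand(dp, rs1, bypass);
    uint32_t b = read_operand(dp, rs2, bypass);
    writeback(dp);
    dp->latch_valid = true;
    dp->latch_reg = rd;
    dp->latch_val = a + b;
}
```

With the bypass enabled, an add that consumes the previous add's result gets the correct value at once. With it disabled, this model reads a stale register. Real hardware would instead have to wait for the value to reach the register and be read back, and that wait is the time the bypass saves.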
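Bandwidth Optimizations and Cache Bypass can be sketched together with an instruction cache whose line size equals the four-word burst described above. Several parameters are assumptions for illustration, not the 80960MC's cache parameters: the direct-mapped organization, the number of lines (`NUM_LINES = 8`), the 256-word backing memory, and the names `ICache` and `icache_fetch`. On a miss, one burst fills the whole line. The requested word is returned from the same loop that fills the line rather than read back out of the cache afterward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LINE_WORDS = 4 follows the text (line size = 4-word burst). NUM_LINES,
   MEM_WORDS, and the direct-mapped layout are hypothetical. */
enum { LINE_WORDS = 4, LINE_BYTES = LINE_WORDS * 4, NUM_LINES = 8,
       MEM_WORDS = 256 };

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    uint32_t data[NUM_LINES][LINE_WORDS];
    unsigned bursts; /* number of 4-word bus bursts issued */
} ICache;

static uint32_t memory[MEM_WORDS]; /* backing store, word-indexed */

/* Fetch the instruction word at byte address pc. */
static uint32_t icache_fetch(ICache *c, uint32_t pc) {
    assert(pc % 4 == 0 && pc / 4 < MEM_WORDS);
    uint32_t line_addr = pc / LINE_BYTES;      /* aligned 16-byte block */
    unsigned slot = line_addr % NUM_LINES;
    uint32_t tag = line_addr / NUM_LINES;
    unsigned word = (pc % LINE_BYTES) / 4;     /* word within the line */

    if (c->valid[slot] && c->tag[slot] == tag)
        return c->data[slot][word];            /* hit */

    /* Miss: one burst reads the four words of the aligned block. */
    uint32_t first = line_addr * LINE_WORDS;
    uint32_t wanted = 0;
    for (unsigned i = 0; i < LINE_WORDS; i++) {
        uint32_t w = memory[first + i];
        c->data[slot][i] = w;                  /* update the cache... */
        if (i == word)
            wanted = w;                        /* ...and forward this word */
    }
    c->valid[slot] = true;
    c->tag[slot] = tag;
    c->bursts++;
    return wanted;
}
```

A miss brings in the whole aligned four-word block, so later fetches of the other words in that block hit without further bus traffic.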