ST10R272L - ARCHITECTURAL OVERVIEW
2.1.1 High instruction bandwidth / fast execution
Most of the ST10R272L’s instructions can be executed in just one instruction cycle, including
shift and rotate instructions (irrespective of the number of bits to be shifted). Branch, multiply and
divide instructions normally take more than one instruction cycle, but have also been
optimized. For example, branch instructions require only one additional machine cycle when
a branch is taken, and because of the ‘Jump Cache’, most branches taken in loops require
no additional machine cycles.
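The benefit of the Jump Cache in loops can be illustrated with a simple count of the extra cycles spent on a loop-closing branch. This is only a sketch under stated assumptions: the function names are hypothetical, the branch is taken on every pass except the last, and the Jump Cache is assumed to remove the extra cycle for every taken branch after the first one. The text above states only that most branches taken in loops need no additional cycles.

```c
#include <stdint.h>

/* Extra machine cycles spent on the loop-closing branch over
 * `iterations` passes when every taken branch costs one extra cycle.
 * The branch is taken on every pass except the last (loop exit). */
uint32_t branch_penalty_no_cache(uint32_t iterations)
{
    return (iterations == 0u) ? 0u : iterations - 1u;
}

/* Same loop under the ASSUMED Jump Cache model: only the first taken
 * branch pays the extra cycle, later ones pay nothing. */
uint32_t branch_penalty_jump_cache(uint32_t iterations)
{
    return (iterations > 1u) ? 1u : 0u;
}
```

Under these assumptions, a loop of 100 passes spends 99 extra cycles on its branch without the cache and 1 extra cycle with it.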
The four-stage pipeline improves CPU processing speed:
fetch:       an instruction is fetched from the internal ROM or RAM or from the
             external memory, based on the current IP value
decode:      the previously fetched instruction is decoded and the required
             operands are fetched
execute:     the specified operation is performed on the previously fetched
             operands
write back:  the result is written to the specified location
If this technique were not used, each instruction would require four machine cycles.
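The effect of the pipeline on throughput can be shown with a simple cycle count. The sketch below is an idealised model, not a description of the actual hardware: it assumes that every instruction is a single-cycle instruction and that no pipeline holds or taken branches occur. The function names are hypothetical.

```c
#include <stdint.h>

#define PIPELINE_STAGES 4u  /* fetch, decode, execute, write back */

/* Machine cycles to complete n instructions when the stages overlap:
 * the first instruction needs all four stages, and each further
 * instruction completes one cycle after its predecessor.
 * Idealised: no pipeline holds, no taken branches. */
uint32_t cycles_pipelined(uint32_t n)
{
    return (n == 0u) ? 0u : PIPELINE_STAGES + (n - 1u);
}

/* Machine cycles if each instruction had to pass through all four
 * stages before the next one could start. */
uint32_t cycles_unpipelined(uint32_t n)
{
    return PIPELINE_STAGES * n;
}
```

For 100 instructions this model gives 103 machine cycles with the pipeline and 400 without it, so the cost per instruction approaches one machine cycle as the sequence gets longer.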
2.1.2 High function 8-bit and 16-bit arithmetic and logic unit
Instruction decoding is primarily generated from PLA outputs, based on the selected
opcode. No microcode is used and each pipeline stage receives control signals, staged in
control registers, from the decode stage PLAs. Pipeline holds are primarily caused by wait
states for external memory accesses, and cause a signal to be held in the control registers.
Multiple-cycle instructions are performed through instruction injection and simple internal
state machines which modify the required control signals.
All standard arithmetic and logical operations are performed in the 16-bit ALU. For byte
operations, signals are provided from bits six and seven of the ALU result, and are used to
set the condition flags. Multiple precision arithmetic is provided through a 'CARRY-IN' signal,
to the ALU, from previously calculated portions of the desired operation. Most internal
execution blocks perform operations on either 8-bit or 16-bit quantities. Once the pipeline
has been filled, one instruction is completed per machine cycle, except for multiply and
divide. An advanced Booth algorithm allows four bits to be multiplied and two bits to be
divided per machine cycle. These operations use two coupled 16-bit registers
(MDL and MDH), and require four and nine machine cycles respectively to perform a 16-bit
by 16-bit (or 32-bit by 16-bit) calculation, plus one machine cycle to set up and adjust the
operands and the result. The longer multiply and divide instructions can be interrupted
during their execution, to permit fast interrupt response. The Instruction Set contains
instructions for byte packing in memory, and byte and word sign extension for wide
arithmetic operations.
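The multiple precision arithmetic described above can be modelled in C. The following is a minimal sketch, not the device's implementation: it builds a 32-bit addition from two 16-bit additions, feeding the carry out of the low-word step into the high-word step in the way a 'CARRY-IN' signal would. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Result of one 16-bit addition: the 16-bit sum and the carry out. */
typedef struct {
    uint16_t sum;
    uint16_t carry;  /* 0 or 1 */
} add16_result;

/* Add two 16-bit words plus an incoming carry (0 or 1). */
add16_result add16_with_carry(uint16_t a, uint16_t b, uint16_t carry_in)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b + (uint32_t)(carry_in & 1u);
    add16_result r;
    r.sum = (uint16_t)(wide & 0xFFFFu);   /* low 16 bits of the result */
    r.carry = (uint16_t)(wide >> 16);     /* bit 16 is the carry out */
    return r;
}

/* 32-bit addition built from two 16-bit steps: the low words are added
 * with no carry in, then the high words are added together with the
 * carry produced by the low-word step. */
uint32_t add32_via_16(uint32_t x, uint32_t y)
{
    add16_result lo = add16_with_carry((uint16_t)(x & 0xFFFFu),
                                       (uint16_t)(y & 0xFFFFu), 0u);
    add16_result hi = add16_with_carry((uint16_t)(x >> 16),
                                       (uint16_t)(y >> 16), lo.carry);
    return ((uint32_t)hi.sum << 16) | (uint32_t)lo.sum;
}
```

For example, add32_via_16(0x0001FFFF, 0x00000001) yields 0x00020000: the low-word step overflows to 0x0000 with a carry of 1, and that carry increments the high word.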