
Architectural overview
UM0404
DocID13284 Rev 2
Figure 2. CPU block diagram
1.1.1 High instruction bandwidth / fast execution
Most of the ST10F276’s instructions are executed in one instruction cycle. For example,
shift and rotate instructions are processed independently of the number of bits to be shifted.
Multiple-cycle instructions have been optimized: branches are carried out in 2 instruction
cycles, a 16 × 16-bit multiplication in 5 instruction cycles, and a 32/16-bit division in 10
instruction cycles. The jump cache reduces the execution time of repeatedly performed
jumps in a loop from 2 instruction cycles to 1 instruction cycle.
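The cycle counts quoted above can be turned into a rough loop-cost estimate. The following is a minimal sketch, not an ST-provided tool: the function name `estimate_loop_cycles` and its parameters are hypothetical, and it assumes the loop is closed by one jump per iteration, that only the first execution of that jump costs 2 instruction cycles when the jump cache is used, and that no other stalls occur.

```c
/* Instruction-cycle counts quoted in the text. */
enum {
    CYCLES_SINGLE        = 1,  /* most instructions, including shifts and rotates */
    CYCLES_BRANCH        = 2,  /* branch without a jump cache hit */
    CYCLES_BRANCH_CACHED = 1,  /* repeated jump in a loop served by the jump cache */
    CYCLES_MUL_16X16     = 5,  /* 16 x 16-bit multiplication */
    CYCLES_DIV_32_16     = 10  /* 32/16-bit division */
};

/*
 * Estimate the instruction cycles for `iterations` passes through a loop
 * whose body holds the given instruction mix and ends with one jump.
 * Assumption (not from the manual): with the jump cache, only the first
 * execution of the jump pays the full 2 cycles.
 */
unsigned long estimate_loop_cycles(unsigned single_ops,
                                   unsigned multiplies,
                                   unsigned divides,
                                   unsigned long iterations,
                                   int use_jump_cache)
{
    unsigned long body, jumps;

    if (iterations == 0)
        return 0;

    body = (unsigned long)single_ops * CYCLES_SINGLE
         + (unsigned long)multiplies * CYCLES_MUL_16X16
         + (unsigned long)divides    * CYCLES_DIV_32_16;

    if (use_jump_cache)
        jumps = CYCLES_BRANCH + (iterations - 1) * CYCLES_BRANCH_CACHED;
    else
        jumps = iterations * CYCLES_BRANCH;

    return iterations * body + jumps;
}
```

Under these assumptions, a loop body of three single-cycle instructions and one multiplication, run 100 times, comes to 1000 instruction cycles without the jump cache and 901 with it.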
The instruction cycle time has been reduced by instruction pipelining. This technique allows
the core CPU to process different stages of several sequential instructions in parallel. The
following four-stage pipeline provides the optimum balancing for the CPU core:
• Fetch: In this stage, an instruction is fetched from the internal Flash or RAM or from the
external memory, based on the current IP value.
• Decode: In this stage, the previously fetched instruction is decoded and the required
operands are fetched.
• Execute: In this stage, the specified operation is performed on the previously fetched
operands.
• Write back: In this stage, the result is written to the specified location.
If this technique were not used, each instruction would require four instruction cycles, one
per stage. Because the stages of successive instructions overlap, pipelining offers increased
performance.
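The effect of overlapping the four stages can be illustrated with a small model. This is a sketch of an idealized pipeline only: it assumes every instruction spends exactly one instruction cycle in each stage and that no stalls or multiple-cycle instructions occur. The function names are hypothetical and are not part of any ST library.

```c
#include <stddef.h>

enum { PIPELINE_STAGES = 4 };

/* Stage names in the order given in the text. */
static const char *const stage_names[PIPELINE_STAGES] = {
    "Fetch", "Decode", "Execute", "Write back"
};

/* Without pipelining, each instruction passes through all four stages alone. */
unsigned long cycles_unpipelined(unsigned long n)
{
    return n * PIPELINE_STAGES;
}

/*
 * Idealized pipeline: the first instruction completes after four cycles,
 * and each following instruction completes one cycle later.
 */
unsigned long cycles_pipelined(unsigned long n)
{
    return n == 0 ? 0 : n + (PIPELINE_STAGES - 1);
}

/*
 * Stage occupied by instruction `instr` during cycle `cycle` (both 0-based),
 * or NULL if the instruction is not in the pipeline during that cycle.
 */
const char *stage_at(unsigned long instr, unsigned long cycle)
{
    if (cycle < instr || cycle - instr >= PIPELINE_STAGES)
        return NULL;
    return stage_names[cycle - instr];
}
```

In this model, 100 instructions take 400 instruction cycles without pipelining and 103 with it, so the sustained rate approaches one instruction per instruction cycle. During cycle 3, instruction 0 is in Write back while instruction 1 is in Execute, which is the overlap the stage list above describes.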
[Figure 2 block labels: CPU core containing SP, STKOV, STKUN, Instruction Pointer, 4-Stage Pipeline, PSW, SYSCON, CP, Code Segment Pointer, Data Page Pointers, BUSCON 0 to BUSCON 4, ADDRSEL 1 to ADDRSEL 4, and the Execution Unit (MDH, MDL, Multiplication/Division Hardware, Bit-Mask Generator, Barrel-Shift, 16-bit ALU, General Purpose Registers R0 to R15); 512 Kbyte IFlash; 320 Kbyte XFlash; 64 + 2 Kbyte XRAM; 2 Kbyte IRAM holding register banks Bank 0, Bank i, Bank n; connecting buses annotated as 16 and 32 bits wide.]