MOTOROLA
INTEGER CPU
MMC2001
2-2
REFERENCE MANUAL
2.2 Features
The main features of the M•CORE are as follows:
• 32-bit load/store RISC architecture
• Fixed 16-bit instruction length
• 16-entry, 32-bit general-purpose register file
• Efficient 4-stage execution pipeline, hidden from application software
• Single-cycle instruction execution for many instructions
• Two cycles for taken branches and memory access instructions
• Support for byte, halfword, and word memory accesses
• Fast interrupt support with 16-entry dedicated alternate register file
• Vectored and autovectored interrupt support
2.3 Microarchitecture Summary
The M•CORE instruction execution pipeline consists of the following stages:
• Instruction fetch
• Instruction decode/register file read
• Execute
• Register writeback
These stages operate in an overlapped fashion, allowing single-clock instruction
execution for most instructions.
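The overlap described above can be illustrated with a small model. This is a minimal sketch that assumes an ideal pipeline with no stalls, bubbles, or multi-cycle instructions; the names `STAGES`, `ideal_clocks`, and `pipeline_schedule` are hypothetical and are not part of the M•CORE documentation.

```python
# Illustrative model of an ideal 4-stage pipeline (assumption: no stalls).
STAGES = ("fetch", "decode/register read", "execute", "writeback")


def ideal_clocks(n_instructions, n_stages=len(STAGES)):
    """Clocks needed to complete n single-clock instructions back to back."""
    if n_instructions <= 0:
        return 0
    # The first instruction takes n_stages clocks to pass through the pipeline;
    # each later instruction completes one clock after its predecessor.
    return n_stages + n_instructions - 1


def pipeline_schedule(n_instructions):
    """Map clock number -> {stage name: instruction index} for the ideal case."""
    schedule = {}
    for instr in range(n_instructions):
        for depth, stage in enumerate(STAGES):
            clock = instr + depth + 1  # instruction i is fetched in clock i + 1
            schedule.setdefault(clock, {})[stage] = instr
    return schedule


if __name__ == "__main__":
    for clock, stages in sorted(pipeline_schedule(5).items()):
        print(clock, stages)
    print("total clocks:", ideal_clocks(5))
```

Once the pipeline is full, one instruction completes per clock, which is what makes the effective cost of most instructions a single clock even though each one passes through four stages.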
Sixteen general-purpose registers are provided for source operands and instruction
results. Register R15 is used as the link register to hold the return address for
subroutine calls, and register R0 is associated with the current stack pointer value by
convention.
The execution unit consists of a 32-bit arithmetic/logic unit (ALU), a 32-bit barrel
shifter, a find-first-one (FFO) unit, result feed-forward hardware, and miscellaneous
support hardware for multiplication and multiple register loads and stores. Arithmetic
and logical operations execute in a single cycle, with the exception of the multiply,
signed divide, and unsigned divide instructions. The multiply instruction is implemented
with a 2-bit-per-clock, overlapped-scan, modified Booth algorithm with early-out
capability to reduce execution time for operations with small multiplier values. The
signed divide and unsigned divide instructions also have data-dependent timing. The
find-first-one unit operates in a single clock cycle.
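The multiply and find-first-one behavior described above can be sketched in software. The model below is illustrative only: it assumes a radix-4 (2 bits per step) Booth recoding, an early-out test that stops once the remaining multiplier bits are all zero, a product truncated to 32 bits, and a find-first-one scan that starts at bit 31 and returns 32 when no bit is set. None of these details are confirmed by this section, and the names `booth_multiply` and `find_first_one` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Radix-4 Booth digit for each 3-bit window (y[2i+1], y[2i], y[2i-1]).
# Adjacent windows share one bit, which is the "overlapped scan".
BOOTH_DIGIT = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_multiply(x, y):
    """Return (low 32 bits of x * y, number of 2-bit steps performed)."""
    x &= MASK32
    y &= MASK32
    acc = 0
    prev = 0   # y[2i-1], the bit shared with the previous window
    steps = 0
    for i in range(16):
        remaining = y >> (2 * i)
        if remaining == 0 and prev == 0:
            break  # early out (assumed rule): all later Booth digits are zero
        window = ((remaining & 0b11) << 1) | prev
        acc += BOOTH_DIGIT[window] * (x << (2 * i))
        prev = (remaining >> 1) & 1
        steps += 1
    return acc & MASK32, steps


def find_first_one(value):
    """Offset of the first set bit counted from bit 31; 32 if value is zero.

    The scan direction and the zero-input result are assumptions.
    """
    return 32 - (value & MASK32).bit_length()


if __name__ == "__main__":
    print(booth_multiply(7, 3))
    print(booth_multiply(123456, 1000))
    print(find_first_one(0x00010000))
```

In this model a small multiplier finishes in a few steps while a multiplier with high-order bits set needs all 16, which is one way to picture the data-dependent timing mentioned above. The step count is not a statement of the actual instruction latency.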
The program counter unit has a PC incrementer and a dedicated branch address
adder to minimize delays during change-of-flow operations. Branch target addresses
are calculated in parallel with branch instruction decode, leaving a single pipeline
bubble for taken branches and jumps, which therefore execute in two clocks.
Conditional branches that are not taken execute in a single clock.
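As a rough illustration of the branch timing stated above, the following sketch estimates the clocks spent in a simple counted loop. It assumes the loop is closed by a conditional backward branch that is taken on every iteration except the last, and that the loop body has a fixed clock count. The function names are hypothetical.

```python
TAKEN_CLOCKS = 2      # taken branch or jump: one clock plus one pipeline bubble
NOT_TAKEN_CLOCKS = 1  # conditional branch that falls through


def branch_clocks(taken):
    """Clocks for a single branch, per the timing stated in the text."""
    return TAKEN_CLOCKS if taken else NOT_TAKEN_CLOCKS


def loop_clocks(body_clocks, iterations):
    """Estimated clocks for a loop closed by a conditional backward branch."""
    if iterations <= 0:
        return 0
    taken = iterations - 1  # the final iteration falls through
    return (iterations * body_clocks
            + taken * branch_clocks(True)
            + branch_clocks(False))


if __name__ == "__main__":
    # Three single-clock instructions in the body, ten iterations.
    print(loop_clocks(body_clocks=3, iterations=10))
```

The estimate ignores memory wait states and any instruction in the body that takes more than its nominal clock count.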
Memory load and store operations are provided for byte, halfword, and word (32-bit)
data, with automatic zero extension of byte and halfword load data. These instructions
can execute in two clock cycles. Load and store multiple register instructions allow
low-overhead context save and restore operations. These instructions can execute in
(N+1) clock cycles, where N is the number of registers to transfer.
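The zero extension and the transfer timing described above can be made concrete with a short sketch. It models register contents as 32-bit integers, contrasts zero extension with sign extension (shown only for comparison), and uses the best-case clock counts from the text, so zero memory wait states are assumed. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SINGLE_ACCESS_CLOCKS = 2  # best-case clocks for one load or store


def zero_extend(raw, bits):
    """Register value after loading a `bits`-wide datum with zero extension."""
    return raw & ((1 << bits) - 1)


def sign_extend(raw, bits):
    """Sign extension to 32 bits, for comparison only."""
    value = raw & ((1 << bits) - 1)
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32


def multiple_transfer_clocks(n_registers):
    """Best-case clocks for a load or store multiple of N registers: N + 1."""
    return n_registers + 1


def separate_transfer_clocks(n_registers):
    """Best-case clocks for the same transfer done one register at a time."""
    return n_registers * SINGLE_ACCESS_CLOCKS


if __name__ == "__main__":
    print(hex(zero_extend(0x80, 8)), hex(sign_extend(0x80, 8)))
    print(multiple_transfer_clocks(8), separate_transfer_clocks(8))
```

Because byte and halfword loads are zero extended, a value that software wants to treat as signed needs an explicit sign extension step after the load. For a context save or restore, the multiple-register form approaches one clock per register as N grows, compared with two clocks per register for individual accesses.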
Freescale Semiconductor, Inc.
For More Information On This Product,
Go to: www.freescale.com