MicroBlaze Processor Reference Guide
UG081 (v14.7)
Pipeline Architecture
MicroBlaze instruction execution is pipelined. For most instructions, each stage takes one clock
cycle to complete. Consequently, the number of clock cycles necessary for a specific instruction to
complete is equal to the number of pipeline stages, and, once the pipeline is full, one instruction
completes on every cycle. A few instructions require multiple clock cycles in the execute stage to
complete; this is achieved by stalling the pipeline.
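As a rough worked example (a simplified model, not taken from this guide): if each of N instructions spends one cycle per stage in a k-stage pipeline, the first instruction completes after k cycles and each later one completes one cycle after its predecessor, for k + (N - 1) cycles in total. Each extra cycle an instruction spends in a multi-cycle stage then delays everything behind it by one cycle. The function name `total_cycles` is hypothetical, and the model assumes the stalls never overlap. Under that assumption it gives the 7-cycle and 9-cycle totals of the timing diagrams in this section (three instructions, the second needing two extra cycles).

```python
def total_cycles(num_stages: int, extra_cycles: list[int]) -> int:
    """Cycles until the last instruction leaves a simple in-order pipeline.

    Simplified model (an assumption, not the guide's definition): every stage
    normally takes one cycle, and extra_cycles[i] is how many additional cycles
    instruction i spends in a multi-cycle stage. The stalls are assumed not to
    overlap, so each extra cycle delays everything behind it by exactly one cycle.
    """
    if not extra_cycles:
        return 0
    # The first instruction needs num_stages cycles; each later one completes
    # one cycle after its predecessor, plus every stall cycle.
    return num_stages + (len(extra_cycles) - 1) + sum(extra_cycles)


# Three instructions, the second needing 2 extra cycles in one stage:
print(total_cycles(3, [0, 2, 0]))  # 7 (three-stage pipeline)
print(total_cycles(5, [0, 2, 0]))  # 9 (five-stage pipeline)
```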
When executing from slower memory, instruction fetches may take multiple cycles. This additional
latency directly affects the efficiency of the pipeline. MicroBlaze implements an instruction prefetch
buffer that reduces the impact of such multi-cycle instruction memory latency. While the pipeline is
stalled by a multi-cycle instruction in the execution stage, the prefetch buffer continues to load
sequential instructions. When the pipeline resumes execution, the fetch stage can load new
instructions directly from the prefetch buffer instead of waiting for the instruction memory access to
complete. If instructions are modified during execution (e.g., with self-modifying code), the prefetch
buffer should be emptied before the modified instructions are executed, so that it does not still
contain the old, unmodified instructions. The recommended way to do this is to use an MBAR
instruction, although a synchronizing branch instruction, for example BRI 4, can also be used.
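The stale-copy problem described above can be illustrated with a toy model. Everything in it is hypothetical: the `PrefetchBuffer` class, its `depth`, and the `flush()` method (which stands in for the effect of MBAR or a synchronizing branch such as BRI 4) are illustrative names, not MicroBlaze interfaces, and the depth of 4 is an assumption. Instruction memory is modelled as a Python list of strings that the buffer reads by reference, so writing to the list plays the role of self-modifying code.

```python
from collections import deque


class PrefetchBuffer:
    """Toy model of an instruction prefetch buffer (hypothetical, not MicroBlaze RTL)."""

    def __init__(self, memory: list[str], depth: int = 4) -> None:
        self.memory = memory    # instruction memory, shared with the "program"
        self.depth = depth      # assumed buffer depth
        self.entries = deque()  # buffered copies of instructions, oldest first
        self.next_addr = 0      # next address the prefetcher will read

    def prefetch(self) -> None:
        """Load sequential instructions while there is room, e.g. while the
        pipeline is stalled by a multi-cycle instruction."""
        while len(self.entries) < self.depth and self.next_addr < len(self.memory):
            self.entries.append(self.memory[self.next_addr])
            self.next_addr += 1

    def fetch(self) -> str:
        """Fetch stage: use a buffered copy if one exists, else read memory."""
        if self.entries:
            return self.entries.popleft()
        instr = self.memory[self.next_addr]
        self.next_addr += 1
        return instr

    def flush(self, resume_addr: int) -> None:
        """Drop all buffered copies so the next fetch re-reads memory
        (modelled effect of MBAR or BRI 4 only)."""
        self.entries.clear()
        self.next_addr = resume_addr


def fetch_after_patch(flush_before_use: bool) -> str:
    """Patch address 1 after it was prefetched, then fetch it."""
    memory = ["nop", "old_instr", "nop"]
    buf = PrefetchBuffer(memory, depth=4)
    buf.prefetch()                # all three words are now buffered
    memory[1] = "new_instr"       # self-modifying code rewrites address 1
    buf.fetch()                   # address 0 executes
    if flush_before_use:
        buf.flush(resume_addr=1)  # empty the buffer before address 1 runs
    return buf.fetch()            # instruction delivered for address 1


print(fetch_after_patch(False))  # old_instr (stale buffered copy)
print(fetch_after_patch(True))   # new_instr
```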
Three Stage Pipeline
With C_AREA_OPTIMIZED set to 1, the pipeline is divided into three stages to minimize hardware
cost: Fetch, Decode, and Execute.
Five Stage Pipeline
With C_AREA_OPTIMIZED set to 0, the pipeline is divided into five stages to maximize
performance: Fetch (IF), Decode (OF), Execute (EX), Access Memory (MEM), and Writeback
(WB).
Branches
Normally, the instructions in the fetch and decode stages (as well as the prefetch buffer) are flushed
when a taken branch is executed. The fetch stage is then reloaded with a new instruction from
the calculated branch address. A taken branch in MicroBlaze takes three clock cycles to execute,
two of which are required to refill the pipeline. To reduce this latency overhead, MicroBlaze
supports branches with delay slots.
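A back-of-the-envelope sketch of why delay slots help, using the three-cycle taken-branch cost stated above. It rests on assumptions not spelled out in this section: the delay slot holds an independent instruction moved out of the loop body, that instruction completes during one of the two refill cycles instead of being flushed, every other instruction takes one cycle, and the branch is treated as taken on every iteration (the final fall-through is ignored). `loop_cycles` and its parameters are hypothetical names.

```python
TAKEN_BRANCH_CYCLES = 3  # from the text: a taken branch takes three clock cycles


def loop_cycles(iterations: int, body_instructions: int, use_delay_slot: bool) -> int:
    """Estimate cycles for a loop whose body ends in one taken branch.

    body_instructions counts the single-cycle instructions besides the branch.
    Assumption: with a delay slot, one of them is moved after the branch and
    completes in a refill cycle, so it no longer costs a cycle of its own.
    """
    if use_delay_slot and body_instructions < 1:
        raise ValueError("need an instruction to place in the delay slot")
    per_iteration = body_instructions + TAKEN_BRANCH_CYCLES
    if use_delay_slot:
        per_iteration -= 1  # the delay-slot instruction overlaps the refill
    return iterations * per_iteration


print(loop_cycles(10, 4, use_delay_slot=False))  # 70
print(loop_cycles(10, 4, use_delay_slot=True))   # 60
```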
Three stage pipeline:

                   cycle1   cycle2   cycle3   cycle4   cycle5   cycle6   cycle7
    instruction 1  Fetch    Decode   Execute
    instruction 2           Fetch    Decode   Execute  Execute  Execute
    instruction 3                    Fetch    Decode   Stall    Stall    Execute
Five stage pipeline:

                   cycle1  cycle2  cycle3  cycle4  cycle5  cycle6  cycle7  cycle8  cycle9
    instruction 1  IF      OF      EX      MEM     WB
    instruction 2          IF      OF      EX      MEM     MEM     MEM     WB
    instruction 3                  IF      OF      EX      Stall   Stall   MEM     WB
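The two timing diagrams above can be regenerated with a small in-order pipeline model. This is an illustrative sketch, not Xilinx's implementation: `schedule` is a hypothetical helper, each stage is assumed to hold at most one instruction, an instruction that finishes a stage waits (shown as `Stall`) in that stage until the next one is free, and every stage not listed as multi-cycle is assumed to take one cycle.

```python
def schedule(stages: list[str], latencies: list[dict[str, int]]) -> list[dict[int, str]]:
    """Cycle-by-cycle activity of each instruction in a simple in-order pipeline.

    Returns one dict per instruction mapping a 1-based cycle number to the
    stage the instruction occupies, or "Stall" while it waits for the next
    stage to become free. latencies[i] lists only the multi-cycle stages of
    instruction i; every other stage takes one cycle (an assumption).
    """
    enter: list[list[int]] = []  # enter[i][s] = cycle instruction i enters stage s
    rows: list[dict[int, str]] = []
    for i, lat in enumerate(latencies):
        starts: list[int] = []
        row: dict[int, str] = {}
        ready = 1  # earliest cycle the instruction can move into stage s
        for s, name in enumerate(stages):
            start = ready
            if i > 0:
                prev = enter[i - 1]
                if s + 1 < len(stages):
                    # Stage s frees up when the previous instruction moves on.
                    start = max(start, prev[s + 1])
                else:
                    # Last stage frees up when the previous instruction finishes it.
                    start = max(start, prev[s] + latencies[i - 1].get(name, 1))
            if s > 0:
                for cycle in range(ready, start):
                    row[cycle] = "Stall"  # held in the previous stage
            cycles = lat.get(name, 1)
            for cycle in range(start, start + cycles):
                row[cycle] = name
            starts.append(start)
            ready = start + cycles
        enter.append(starts)
        rows.append(row)
    return rows


three = schedule(["Fetch", "Decode", "Execute"], [{}, {"Execute": 3}, {}])
print(three[2])  # {3: 'Fetch', 4: 'Decode', 5: 'Stall', 6: 'Stall', 7: 'Execute'}
five = schedule(["IF", "OF", "EX", "MEM", "WB"], [{}, {"MEM": 3}, {}])
print(five[2])   # {3: 'IF', 4: 'OF', 5: 'EX', 6: 'Stall', 7: 'Stall', 8: 'MEM', 9: 'WB'}
```

Holding a stalled instruction in its current stage, rather than letting it advance, is what makes instruction 3 show two `Stall` cycles in both diagrams.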