Instruction Pipeline and Execution Timing
e200z3 Power Architecture Core Reference Manual, Rev. 2
Freescale Semiconductor
The decode stage decodes instructions and performs dependency checking. Simple integer instructions
complete execution in the execute stage of the pipeline.
Execution of load/store instructions is pipelined. The EA calculations for load/store instructions are
performed in the decode stage. This EA is driven out to the data memory in the same stage. The actual
memory access occurs in the execute stage.
Load-to-use dependencies incur a pipeline bubble only when the dependent instruction is itself a load
or store instruction that needs the data of the preceding load for its EA calculation. If an ALU
instruction is dependent on a load instruction, the load data is fed directly into the ALU for
execution, and no pipeline bubble is incurred.
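
The load-to-use rule can be expressed as a small predicate. The following is a minimal sketch, not a cycle-accurate model: the `Instr` record, its field names, and the function `incurs_load_use_bubble` are hypothetical and introduced only for illustration. It reports whether a bubble occurs, not how many clocks it lasts. It also assumes that a store whose store data (rather than its EA) comes from the preceding load is not stalled, which is how the rule reads as stated.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    kind: str                       # "load", "store", or "alu"
    dest: Optional[int] = None      # destination GPR number, if any
    ea_srcs: Tuple[int, ...] = ()   # GPRs used for EA calculation (load/store only)
    srcs: Tuple[int, ...] = ()      # all other source GPRs


def incurs_load_use_bubble(prev: Instr, curr: Instr) -> bool:
    """Return True if curr needs the data loaded by prev for its EA calculation."""
    if prev.kind != "load" or prev.dest is None:
        return False                # no load-to-use dependency is possible
    if curr.kind not in ("load", "store"):
        return False                # ALU consumers receive the load data directly
    return prev.dest in curr.ea_srcs
```

For example, a load into r3 followed by an ALU instruction that reads r3 returns False, while the same load followed by a second load that uses r3 as its base register returns True.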
Multiply instructions require one clock to execute. All condition-setting instructions complete in the
execute stage of the pipeline.
Feed-forwarding allows the result of one instruction to be made available as the source operand(s) of a
subsequent instruction so that data-dependent instructions can execute without waiting for previous
instructions to write back their results.
6.3.2 Instruction Buffers
The e200z3 contains a set of instruction buffers that supply instructions into the instruction register (IR)
for decoding.
Each instruction prefetch requests a 64-bit double word, so the buffer is filled with a pair of
instructions at a time. The exception is a change-of-flow fetch whose target is the second (odd) word
of a double word. In that case, only a 32-bit prefetch is performed to load the instruction buffer.
This 32-bit fetch may be immediately followed by a 64-bit prefetch to fill slots 0 and 1 in the event
that the branch is resolved to be taken.
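
The choice of fetch width for a change-of-flow target can be sketched as follows. This is an illustration under stated assumptions: instructions are 32 bits wide and word aligned, and the "second (odd) word" is the word at byte offset 4 within an aligned 64-bit double word. The function name `change_of_flow_fetch_bits` is hypothetical.

```python
def change_of_flow_fetch_bits(target_addr: int) -> int:
    """Return the width in bits of the first fetch to a change-of-flow target.

    Assumes 32-bit, word-aligned instructions (an assumption of this sketch).
    """
    if target_addr < 0 or target_addr % 4 != 0:
        raise ValueError("target must be a non-negative, word-aligned address")
    # The odd word of a double word sits at byte offset 4 from an 8-byte boundary.
    is_odd_word = (target_addr % 8) == 4
    return 32 if is_odd_word else 64
```

Under these assumptions, a target of 0x1000 gives a 64-bit fetch and a target of 0x1004 gives a 32-bit fetch.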
In normal sequential execution, instructions are loaded into the IR from slot 0, and as each pair of
slots is emptied, it is refilled. Whenever a pair of slots is empty, a 64-bit prefetch is initiated
that fills the earliest empty slot pair, beginning with slot 0.
If the instruction buffer empties, instruction issue stalls, and the buffer is refilled. The first returned
instruction is forwarded directly to the IR.
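
The refill and forwarding behavior can be modeled roughly as a queue. The sketch below is a simplification and not a description of the hardware. The class `InstructionBuffer`, its method names, and the default slot count are all hypothetical (this section does not state the number of slots). The queue stands in for the physical slots, so "a pair of slots is empty" is approximated as "at least two entries are free", and memory latency is not modeled.

```python
from collections import deque


class InstructionBuffer:
    """Simplified, hypothetical model of the instruction buffer."""

    def __init__(self, num_slots: int = 4):  # slot count is an assumption
        if num_slots < 2 or num_slots % 2 != 0:
            raise ValueError("num_slots must be a positive even number")
        self.num_slots = num_slots
        self.slots = deque()                 # slots[0] models slot 0

    def wants_prefetch(self) -> bool:
        """A 64-bit prefetch is requested whenever a pair of slots is free."""
        return self.num_slots - len(self.slots) >= 2

    def fill_pair(self, first, second):
        """Accept one 64-bit prefetch (two instructions).

        Returns the instruction forwarded directly to the IR when the buffer
        was empty, otherwise None.
        """
        if not self.wants_prefetch():
            raise RuntimeError("no free slot pair")
        if not self.slots:
            # Buffer was empty and issue was stalled: the first returned
            # instruction bypasses the buffer and goes straight to the IR.
            self.slots.append(second)
            return first
        self.slots.extend((first, second))
        return None

    def issue(self):
        """Load the IR from slot 0; None models an issue stall."""
        return self.slots.popleft() if self.slots else None
```

In this model, calling `issue()` on an empty buffer returns None (a stall), and the next `fill_pair` call returns its first instruction directly instead of queuing it.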