User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
March 27, 2006
PowerPC 750GX Overview
For a more detailed discussion of instruction completion, see Section 6.6.1, Branch, Dispatch, and
Completion-Unit Resource Requirements, on page 237.
1.2.2 Independent Execution Units
In addition to the BPU, the 750GX has the following five execution units:
• Two integer units (IUs)
• Floating-point unit (FPU)
• Load/store unit (LSU)
• System register unit (SRU)
1.2.2.1 Integer Units (IUs)
The integer units, IU1 and IU2, are shown in Figure 1-1 on page 25. IU1 can execute any integer instruction;
IU2 can execute any integer instruction except multiplication and division instructions. Each IU has a
single-entry reservation station that can receive instructions from the dispatch unit and operands from the GPRs or
the rename buffers. The output of the IU is latched in the rename buffer assigned to the instruction by the
dispatch unit.
Each IU consists of three single-cycle subunits—a fast adder/comparator, a subunit for logical operations,
and a subunit for performing rotates, shifts, and count-leading-zero operations. These subunits handle all
1-cycle arithmetic and logical integer instructions; only one subunit can execute an instruction at a time.
The IU1 has a 32-bit integer multiplier/divider, as well as the adder, shift, and logical units of the IU2. The
multiplier supports early exit for operations that do not require full 32 × 32-bit multiplication. Multiply and
divide instructions spend several cycles in the execution stage before the results are written to the output
rename buffer.
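The division of work between the two integer units described above can be sketched as a small C model. The enum, type, and function names are hypothetical and exist only for illustration; the sketch captures only which unit is able to execute which class of instruction, and ignores reservation-station occupancy and the actual dispatch policy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instruction classes, for illustration only. */
typedef enum {
    INT_ADD_CMP,    /* fast adder/comparator subunit */
    INT_LOGICAL,    /* logical-operation subunit */
    INT_SHIFT_ROT,  /* rotate, shift, count-leading-zeros subunit */
    INT_MULTIPLY,   /* multiplier (IU1 only) */
    INT_DIVIDE      /* divider (IU1 only) */
} int_class_t;

typedef enum { UNIT_IU1, UNIT_IU2 } int_unit_t;

/* Returns true if the given unit is able to execute the class.
 * IU1 executes any integer instruction; IU2 executes everything
 * except multiplication and division. */
bool iu_can_execute(int_unit_t unit, int_class_t cls)
{
    assert(unit == UNIT_IU1 || unit == UNIT_IU2);
    if (unit == UNIT_IU1)
        return true;
    return cls != INT_MULTIPLY && cls != INT_DIVIDE;
}
```

In this model, iu_can_execute(UNIT_IU2, INT_DIVIDE) is false, while every class is accepted by UNIT_IU1.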
1.2.2.2 Floating-Point Unit (FPU)
The FPU, shown in Figure 1-1 on page 25, is designed as a 3-stage pipelined processing unit, where the first
stage is for multiply, the second stage is for add, and the third stage is for normalize. A single-precision
multiply/add operation is processed with 1-cycle throughput and 3-cycle latency. (A single-precision
instruction spends one cycle in each stage of the FPU.) A double-precision multiply requires two cycles in the
multiply stage and one cycle in each additional stage. A double-precision multiply/add has a 2-cycle
throughput and a 4-cycle latency. As instructions are dispatched to the FPU reservation station, source
operand data can be accessed from the FPRs or from the FPR rename buffers. Results, in turn, are written to
the rename buffers and are made available to subsequent instructions. Instructions pass through the
reservation station and the pipeline stages in program order. Stalls due to contention for FPRs are minimized by
automatic allocation of the six floating-point rename buffers. The completion unit writes the contents of the
rename buffer to the appropriate FPR when floating-point instructions are retired.
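The throughput and latency figures quoted above can be turned into a rough cycle estimate for a stream of independent, back-to-back multiply/add operations using the usual idealized pipeline relation, latency + (n - 1) × throughput. The following C sketch assumes no stalls, no data dependencies, and no reservation-station or rename-buffer limits; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Timing figures quoted in the text for multiply/add operations. */
typedef struct {
    unsigned throughput; /* cycles between successive operations */
    unsigned latency;    /* cycles from pipeline entry to result */
} fpu_timing_t;

static const fpu_timing_t FPU_SINGLE_MADD = { 1u, 3u };
static const fpu_timing_t FPU_DOUBLE_MADD = { 2u, 4u };

/* Idealized cycle count for n independent back-to-back operations:
 * the first result appears after `latency` cycles, and each further
 * result follows `throughput` cycles after the previous one. */
unsigned long fpu_ideal_cycles(fpu_timing_t t, unsigned long n)
{
    assert(t.throughput > 0 && t.latency >= t.throughput);
    if (n == 0)
        return 0;
    return t.latency + (n - 1) * t.throughput;
}
```

Under these assumptions, ten single-precision multiply/adds take 3 + 9 × 1 = 12 cycles, and ten double-precision multiply/adds take 4 + 9 × 2 = 22 cycles. Real code will usually see more cycles because of dependencies between instructions.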
The 750GX supports all IEEE 754-1985 floating-point data types (normalized, denormalized, not a number
(NaN), zero, and infinity) in hardware, eliminating the latency incurred by software exception routines. (Note
that “exception” is also referred to as “interrupt” in the architecture specification.)
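The five data types listed above correspond to the classification macros in the standard C header math.h. As a minimal sketch, the following function uses fpclassify to name the class of a double value. The helper name fp_class_name is hypothetical, and the code only illustrates the IEEE 754 classes from the software side; it does not show any 750GX-specific mechanism.

```c
#include <math.h>

/* Returns a human-readable name for the IEEE 754 class of x. */
const char *fp_class_name(double x)
{
    switch (fpclassify(x)) {
    case FP_NORMAL:    return "normalized";
    case FP_SUBNORMAL: return "denormalized";
    case FP_NAN:       return "NaN";
    case FP_ZERO:      return "zero";
    case FP_INFINITE:  return "infinity";
    default:           return "unknown"; /* implementation-defined class */
    }
}
```

For instance, fp_class_name(1.0) yields "normalized", and a value such as 1e-310, which lies below the smallest normalized double, yields "denormalized".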