BCM7405
Preliminary Hardware Data Module
Functional Description
06/29/07
Broadcom Corporation
Page 1-66
MIPS4380 Processor Core
Document 7405-1HDM00-R
• Pipeline control unit that orchestrates all the units and resolves inter-instruction dependencies. This includes the
  register-file bypass multiplexers used to resolve data dependencies between consecutively executed instructions.
• MDU and eDSP instructions, described below
• Floating-Point Unit
• Bit-manipulation unit, which implements the count-leading-zeros and count-leading-ones instructions, CLZ and CLO
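The behavior of CLZ and CLO can be illustrated with a short C model. This is a sketch of the instruction semantics as defined by the MIPS32 architecture, not a description of the hardware implementation; the function names clz32 and clo32 are hypothetical.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit word (CLZ semantics).
   The result is 32 when the input is zero. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && (x & 0x80000000u) == 0) {
        x <<= 1;   /* examine the next bit down */
        n++;
    }
    return n;
}

/* Count leading ones (CLO semantics): the same as counting the
   leading zeros of the bitwise complement. */
unsigned clo32(uint32_t x)
{
    return clz32((uint32_t)~x);
}
```

For example, clz32(1) yields 31 and clo32(0xF0000000) yields 4.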
Multiply Divide Unit
The MDU performs multiply, multiply-accumulate, and divide operations. It consists of a 32x32 pipelined multiplier, HI and LO
result-accumulation registers, a divide state machine, and all multiplexers and control logic required to perform these
functions. The MDU executes 32x32 multiply and multiply-accumulate operations with latencies of 3 and 4 cycles,
respectively. Appropriate interlocks are implemented to stop issuing back-to-back 32x32 multiply operations.
Divide operations are implemented with a 2-bit radix iterative algorithm. Depending on the size of the dividend, a divide
operation takes 3 to 18 clock cycles. An attempt to issue an MDU instruction while a divide instruction is still active
stalls the pipeline until the divide instruction completes.
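To illustrate why the divide time depends on the size of the dividend, the following C sketch models a generic unsigned iterative divider that produces two quotient bits per step and skips leading all-zero bit pairs of the dividend. It assumes that "2-bit radix" means two quotient bits per iteration; the actual BCM7405 divider, its handling of signed operands, and the mapping from steps to clock cycles are not described here, and the names div_result and divu_2bit are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type; the names are illustrative only. */
typedef struct {
    uint32_t quotient;
    uint32_t remainder;
    unsigned steps;        /* number of 2-bit iterations performed */
} div_result;

/* Unsigned restoring division producing two quotient bits per step.
   Leading all-zero bit pairs of the dividend are skipped, so smaller
   dividends need fewer steps. */
div_result divu_2bit(uint32_t dividend, uint32_t divisor)
{
    div_result r = {0, 0, 0};
    uint64_t rem = 0;      /* partial remainder */
    int pos = 30;          /* low bit index of the current bit pair */

    /* This model simply returns zeros for a zero divisor. */
    if (divisor == 0)
        return r;

    /* Skip leading zero pairs: they contribute nothing to the quotient. */
    while (pos >= 0 && ((dividend >> pos) & 3u) == 0)
        pos -= 2;

    for (; pos >= 0; pos -= 2) {
        uint32_t digit = 0;                 /* quotient digit, 0..3 */
        rem = (rem << 2) | ((dividend >> pos) & 3u);
        while (rem >= divisor) {            /* at most three subtractions */
            rem -= divisor;
            digit++;
        }
        r.quotient = (r.quotient << 2) | digit;
        r.steps++;
    }
    r.remainder = (uint32_t)rem;
    return r;
}
```

In this model a full-width dividend takes 16 steps, while a dividend that fits in 8 bits takes at most 4.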
The processor core implements an additional multiply instruction, MUL, which places the lower 32 bits of the multiply
result in a general-purpose register instead of the LO register. This eliminates the need for a subsequent MFLO
(Move From LO) instruction to move the result from LO to a general-purpose register.
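The difference between the two-instruction sequence and MUL can be sketched in C. This models only the architectural results, under the assumption that MULT is the standard MIPS32 signed multiply into HI/LO; the type hilo_regs and the mips_* function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO result registers. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_regs;

/* MULT rs, rt: signed 32x32 multiply; the 64-bit product goes to HI:LO. */
void mips_mult(hilo_regs *acc, int32_t rs, int32_t rt)
{
    uint64_t p = (uint64_t)((int64_t)rs * (int64_t)rt);
    acc->hi = (uint32_t)(p >> 32);
    acc->lo = (uint32_t)p;
}

/* MFLO rd: copy LO into a general-purpose register. */
uint32_t mips_mflo(const hilo_regs *acc)
{
    return acc->lo;
}

/* MUL rd, rs, rt: the low 32 bits of the product go directly to rd,
   so no MFLO is needed afterwards. */
uint32_t mips_mul(int32_t rs, int32_t rt)
{
    return (uint32_t)((int64_t)rs * (int64_t)rt);
}
```

Calling mips_mult followed by mips_mflo returns the same value as a single mips_mul call with the same operands.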
Two kinds of multiply-accumulate instruction are provided: multiply-add (MADD/MADDU) and multiply-subtract (MSUB/MSUBU).
The MADD instruction multiplies two operands and then adds the product to the current contents of the HI and LO registers.
Similarly, the MSUB instruction multiplies two operands and then subtracts the product from the HI and LO registers.
Although a multiply-accumulate instruction takes four cycles to execute, the use of the HI/LO registers allows the MDU to
achieve an instruction execution rate of one instruction every cycle.
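The accumulate behavior of the signed forms can be sketched in C by treating HI and LO as one 64-bit accumulator. This is a model of the architectural result only, not of the pipeline timing; the type hilo_regs and the helper and mips_* names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO result registers. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_regs;

/* Read HI:LO as a single 64-bit value. */
uint64_t hilo_get(const hilo_regs *acc)
{
    return ((uint64_t)acc->hi << 32) | acc->lo;
}

/* Write a 64-bit value back to HI:LO. */
void hilo_set(hilo_regs *acc, uint64_t v)
{
    acc->hi = (uint32_t)(v >> 32);
    acc->lo = (uint32_t)v;
}

/* MADD rs, rt: HI:LO = HI:LO + (signed rs * signed rt), modulo 2^64. */
void mips_madd(hilo_regs *acc, int32_t rs, int32_t rt)
{
    uint64_t p = (uint64_t)((int64_t)rs * (int64_t)rt);
    hilo_set(acc, hilo_get(acc) + p);
}

/* MSUB rs, rt: HI:LO = HI:LO - (signed rs * signed rt), modulo 2^64. */
void mips_msub(hilo_regs *acc, int32_t rs, int32_t rt)
{
    uint64_t p = (uint64_t)((int64_t)rs * (int64_t)rt);
    hilo_set(acc, hilo_get(acc) - p);
}
```

Because each call only updates the accumulator, a sum of products such as a dot product needs no move instructions between successive multiply-accumulate operations.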
Each TP has its own set of HI/LO registers and can execute any MDU instruction. However, the TPs share the multiply/divide
execution unit, so when both TPs execute MDU instructions at the same time, the execution time observed at each TP may be
longer.
Floating-Point Unit
Through the CP1 (Coprocessor 1) interface, the processor communicates with an IEEE 754-compliant FPU. The FPU runs
at the same speed as the processor.
Each TP in a CMT CPU can execute FPU instructions, and each has its own register file of 32 32-bit FP general registers and
five 32-bit FP control registers. Otherwise, the FP execution pipe is shared between the TPs; that is, multiple FP
instructions from different TPs can be executed simultaneously in the FPU.