6.5 FPU pipeline and instruction timing
Programming the MIPS32® 74K™ Core Family, Revision 02.14
6.5.1 FPU register dependency delays
Any FPU instruction must go through pipeline stages M1 through A2 before it produces a result, which can then
(as shown by the “bypass” lines in the pipeline diagram) be used by a dependent instruction reaching the M1 stage.
To keep the FPU pipeline full, it is therefore enough to have three non-dependent instructions between the
producer and the consumer of an FP value. However, there’s no guarantee that all the FP pipeline slots will be
filled, in which case three intervening instructions are more than necessary. Good compilers should try to
schedule FP instructions, but not at unreasonable cost.
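As an illustration of the spacing described above, the following C sketch splits a dot product across four independent accumulators, so that each update of an accumulator is separated from the next update of the same accumulator by three unrelated ones. The function name dot4 is hypothetical and not part of any MIPS library. The compiler still decides the final instruction schedule, and splitting the sum changes the order of the additions, so rounding may differ from a single-accumulator loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dot product using four independent accumulators.
 * The update of accN in one iteration depends only on the update of accN
 * in the previous iteration, so three unrelated accumulator updates lie
 * between each producer and its consumer in source order. */
float dot4(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);

    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    /* Main loop: four elements per iteration, one per accumulator. */
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    /* Leftover elements (fewer than four) go into the first accumulator. */
    for (; i < n; i++)
        acc0 += a[i] * b[i];

    /* Combine the partial sums pairwise. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Whether this helps on a given build depends on how full the FP pipeline slots actually are, so it is worth measuring rather than assuming.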
6.5.2 Delays caused by long-latency instructions looping in the M1 stage
Instructions which take only one clock in M1 go through the pipeline smoothly and can be completed one per FPU
clock period. Instructions which take longer in M1 always prevent the next instruction from starting in the next clock,
regardless of any data dependency. Those long-latency instructions (double-precision multiplies and all division and
square root operations) are listed in Table 6.2. An instruction which runs for 2 cycles in M1 holds up the FPU
pipeline for one clock, and so on; the cycle counts are, of course, FPU cycles.
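The hold-up rule can be sketched as a small lookup in C. The cycle counts below are copied from Table 6.2. The names m1_cycles and extra_m1_cycles are hypothetical, and the sketch assumes that any instruction not listed in the table spends a single clock in M1. It models only M1 occupancy, not data dependencies or any other source of delay.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of Table 6.2: mnemonic and FPU cycles spent in M1. */
struct m1_entry {
    const char *mnemonic;
    unsigned cycles;
};

static const struct m1_entry long_latency[] = {
    {"mul.d", 2}, {"madd.d", 2}, {"msub.d", 2}, {"nmadd.d", 2}, {"nmsub.d", 2},
    {"recip.s", 10}, {"div.s", 14}, {"sqrt.s", 14}, {"rsqrt.s", 14},
    {"recip.d", 21}, {"div.d", 29}, {"sqrt.d", 29}, {"rsqrt.d", 31},
};

/* FPU cycles an instruction spends in M1.
 * Assumption: anything not listed in Table 6.2 takes one clock. */
unsigned m1_cycles(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof long_latency / sizeof long_latency[0]; i++) {
        if (strcmp(mnemonic, long_latency[i].mnemonic) == 0)
            return long_latency[i].cycles;
    }
    return 1;
}

/* FPU clocks for which the listed instructions hold up the pipeline:
 * an instruction that spends k cycles in M1 delays its successor by k - 1. */
unsigned extra_m1_cycles(const char *const *seq, size_t n)
{
    unsigned extra = 0;
    for (size_t i = 0; i < n; i++)
        extra += m1_cycles(seq[i]) - 1;
    return extra;
}
```

For example, under this model a sequence containing one mul.d and one div.s among single-clock instructions holds up the FPU pipeline for 1 + 13 = 14 FPU clocks.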
6.5.3 Delays on FP load and store instructions
FP store instructions graduate from the main pipeline (subject to dependencies and freedom from address
exceptions), and then wait in a special queue until FP data is delivered. The store data will be significantly
delayed compared to an integer store instruction, but unless some other instruction reads the target cache line,
the program will probably not see much delay.
FP load instructions in the main pipeline are treated like integer loads; an FP load which hits in the cache can be
completed in the main pipeline. The load data is passed from D-cache into the FPU pipeline, and you should see no
more than the usual FP producer-consumer delay from load to use. FPU load instructions which miss are processed in
the memory pipeline. FP loads are non-blocking too, so it will be the consuming instruction (if any) which is delayed.
6.5.4 Delays when main pipeline waits for FPU to decide not to take an exception
The MIPS architecture requires FP exceptions to be “precise”, which (in particular) means that no instruction after
the FP instruction causing the exception may do anything software-visible. That means that an FP instruction in the
main pipeline may not be committed, nor leave the main pipeline, until the FPU can either report the exception, or
confirm that the instruction will not cause an exception.
Floating point instructions cause exceptions not only because a user program has requested the system to trap IEEE
exceptional conditions (which is unusual) but also because the hardware is not capable of generating or accepting
very small (“denormalized”) numbers in accordance with the IEEE standards. The latter (“unimplemented”)
exception is used to call up a software emulator to patch up some rare cases. But the main pipeline must be stalled until the
Table 6.2 Long-latency FP instructions

Operand                    Instruction type        Instructions                             Cycles in M1
Double-precision (64-bit)  Any multiplication      mul.d, madd.d, msub.d, nmadd.d, nmsub.d  2
Single-precision (32-bit)  Reciprocal              recip.s                                  10
Single-precision (32-bit)  Divide, square root     div.s, sqrt.s                            14
Single-precision (32-bit)  Reciprocal square root  rsqrt.s                                  14
Double-precision (64-bit)  Reciprocal              recip.d                                  21
Double-precision (64-bit)  Divide, square root     div.d, sqrt.d                            29
Double-precision (64-bit)  Reciprocal square root  rsqrt.d                                  31