Programming the MIPS32® 74K™ Core Family, Revision 02.14
4.10 Data dependency delays
The 74K core’s out-of-order pipeline does a very good job of running dependent instructions as soon as possible, in hardware. To some extent, that makes it unnecessary to manage data delays by moving instructions around in the program sequence (and if you do try, it makes the effect of your tuning hard to predict). Ideally, you should use an instrumented real CPU or a cycle-accurate simulator to see what detailed tuning actually achieves.
Compilers might reasonably schedule code to create more opportunities for dual issue, and so that dependent instructions can still be issued at full speed, but they should rarely do so if the cost is significant. The hardware already gains much of this advantage within its out-of-order window (think of it as looking 7–15 instructions ahead in the program sequence), so compiler scheduling is not worth many extra instructions or significant code bloat unless it reaches beyond that window. Loop unrolling will often help, but local scheduling is unlikely to make much difference.
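Loop unrolling helps largely because it can expose independent work that a single iteration lacks. Here is a minimal C sketch, assuming a simple array sum (the function names are hypothetical): sum_serial is one chain of dependent adds, while sum_unrolled4 splits it into four independent chains that the hardware can overlap. Unsigned arithmetic is used so that the reordered sum is well defined and gives the same result even if it wraps. Whether the unrolled form pays off on a particular 74K build, or whether your compiler already does this for you, is something to measure rather than assume.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add needs the result of the previous add,
   so the loop body is a single serial dependency chain. */
uint32_t sum_serial(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by four with four independent accumulators: the four adds
   in each iteration do not depend on one another. */
uint32_t sum_unrolled4(const uint32_t *a, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* leftover elements, if n % 4 != 0 */
        s0 += a[i];

    return (s0 + s1) + (s2 + s3);
}
```

The price is more code and more live registers, which is exactly the trade-off the paragraph above warns about.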
We’ve attempted to tabulate all the producer/consumer delays that affect user-level code (we’re not discussing CP0 registers here), excluding floating point, which is covered in the next section. These are fixed delays, of course: a load that misses in the cache is a different matter (there are notes about it above).
The MIPS instruction set is efficient for short pipelines because, most of the time, dependent instructions can run nose-to-tail, just one clock apart, without extra delay. Even in the more sophisticated 74K family CPUs, most dependent instructions can run just two clocks apart. Each register has a “standard” place in the pipeline where the producer should deliver its value, and another place where the consumer picks it up: where those places are 1 cycle apart, dependent instructions can run in successive cycles. Producer/consumer delays happen when either the producer is late delivering a result to the register (a “lazy producer”) or the consumer insists on obtaining its operand early (an “eager consumer”). If a lazy producer feeds an eager consumer, the delays add up.
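The rule just stated is simple arithmetic. In our own notation (not the manual’s): d_0 is the issue spacing when producer and consumer both use their standard slots (1 cycle for nose-to-tail), ℓ is how many cycles late a lazy producer delivers, and e is how many cycles early an eager consumer demands its operand. The example numbers are invented for illustration only.

```latex
\Delta_{\min} = d_0 + \ell + e
\qquad\text{e.g.}\qquad
d_0 = 1,\; \ell = 2,\; e = 1 \;\Rightarrow\; \Delta_{\min} = 1 + 2 + 1 = 4
```

With ℓ = e = 0 the formula reduces to d_0, the no-extra-delay case; each lazy or eager cycle adds one cycle of spacing.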
Out-of-order execution hides most of these delays. On the other hand, non-dependent ALU and AGEN instructions may be issued simultaneously, so sometimes even a delay of zero cycles is painful: the dependency alone stops the two instructions being issued together.
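To make the last point concrete, here is a deliberately crude C model (hypothetical names throughout, and not the 74K’s real issue logic: it ignores out-of-order issue entirely). It takes instructions in program order and lets two adjacent ones share a cycle only if they use different pipes (ALU and AGEN) and the second does not read the first one’s result. Every result is assumed usable in the next cycle, i.e. zero extra delay.

```c
#include <stddef.h>

/* The two issue pipes mentioned in the text. */
enum issue_pipe { PIPE_ALU, PIPE_AGEN };

struct insn {
    enum issue_pipe pipe;  /* which pipe the instruction needs */
    int dep;               /* index of an earlier instruction whose result
                              this one reads, or -1 for none */
};

/* Count issue cycles in a toy in-order dual-issue model with zero
   producer/consumer delay. */
int count_issue_cycles(const struct insn *prog, size_t n)
{
    int cycles = 0;
    size_t i = 0;

    while (i < n) {
        cycles++;
        if (i + 1 < n &&
            prog[i + 1].pipe != prog[i].pipe &&
            prog[i + 1].dep != (int)i)
            i += 2;        /* independent pair on different pipes: dual issue */
        else
            i += 1;        /* same pipe, or dependent: issue one */
    }
    return cycles;
}
```

An independent ALU/AGEN pair takes one cycle; the same pair with the AGEN instruction reading the ALU result takes two, even though the modelled delay is zero. That lost issue slot is the cost the text refers to.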
Different register classes are read/written in different “standard” pipeline slots, so it’s important to be clear what class
of registers is involved in any of these delays. For non-floating-point user-level code, there are just three:
• General purpose registers (“GPR”).
• The multiply unit’s hi/lo pair, together with the three additional multiply-unit accumulators defined by the MIPS DSP ASE (“ACC”).
  The MIPS architecture encourages implementations to provide integer multiply and divide operations in a separately pipelined unit (see ), and in 74K family cores this unit can perform multiply-accumulate operations at a rate of one per clock. No multiply-unit operation ever causes an exception, which makes the longer multiply-unit pipeline largely invisible. It shows up only as late delivery of GPR values by the few multiply-unit instructions that deliver GPR results.
• The fields of the DSPControl register, used for condition codes and exceptional conditions resulting from DSP ASE operations.
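The ACC class is easiest to picture with the classic multiply-accumulate loop. In this C sketch (the function name dot is hypothetical), a MIPS32 compiler may keep acc in hi/lo and use the madd instruction in the loop body; whether it actually does depends on the compiler and its options. The multiply unit’s late GPR delivery matters only when the final result is moved back to a general register.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product with a 64-bit accumulator. The widening multiply-add
   pattern is the kind of code a MIPS32 compiler may map onto madd,
   accumulating in hi/lo. */
int64_t dot(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];   /* 32x32 -> 64-bit product, then add */

    /* Returning acc needs it in GPRs (mflo/mfhi on MIPS32): that move is
       where the multiply unit's late GPR delivery would be seen. */
    return acc;
}
```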
So that gives us two tables: one for our eager consumers, and one for the producers (we’ve listed even the non-lazy producers, since there aren’t very many of them).
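One way to read the two tables together is as a pair of lookups, one row from each. The C sketch below is a hypothetical structure for doing that: the struct names, the kind labels and every delay number are placeholders, not values from the 74K tables. The only rule it encodes from the text is that a dependency passes through one register, so a producer row and a consumer row combine only when they name the same register class.

```c
/* The three user-level, non-floating-point register classes listed above. */
enum reg_class { RC_GPR, RC_ACC, RC_DSPCTRL };

/* Producer table row: cycles late a value of class cls is delivered
   (0 = not a lazy producer). */
struct producer_entry {
    const char *kind;      /* placeholder label for an instruction group */
    enum reg_class cls;
    int late;
};

/* Consumer table row: cycles early an operand of class cls is needed
   (0 = not an eager consumer). */
struct consumer_entry {
    const char *kind;
    enum reg_class cls;
    int early;
};

/* Extra delay, beyond the standard producer/consumer spacing, when prod
   writes a register that cons reads. Returns -1 when the classes differ,
   meaning these two rows cannot describe the same dependency. */
int extra_delay(const struct producer_entry *prod,
                const struct consumer_entry *cons)
{
    if (prod->cls != cons->cls)
        return -1;
    return prod->late + cons->early;
}
```

Example: a producer row {"lazy-example", RC_GPR, 2} feeding a consumer row {"eager-example", RC_GPR, 1} gives 3 extra cycles, while pairing that producer with an RC_ACC consumer row returns -1.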
Document Number MD00541, Revision 02.14, March 30, 2011.