4.10 Data dependency delays
Programming the MIPS32® 74K™ Core Family, Revision 02.14
How to use the tables
Suppose we’ve got an instruction sequence like this one:
addiu   $a0, $a0, 8
lw      $t0, 0($a0)        # [1]
lw      $t1, 4($a0)
addu    $t2, $t0, $t1      # [2]
mul     $v0, $t2, $t3
sw      $v0, 0($a1)        # [3]
Then a look at the tables should help us discover whether any instructions will be held up. Look at the places where an instruction depends on its immediate predecessor:
[1] The first lw will be held up by two clocks: one because addiu takes 2 clocks to produce its result, and another because its GPR address operand $a0 was computed by the immediately preceding instruction (see the “load/store address” box of the relevant table). The second lw will be OK.
[2] The addu will be two clocks late, because the load data from the preceding lw arrives late in the GPR $t1 (see the “load” box of the relevant table).
[3] The sw will start 6 clocks late while it waits for a result from the multiply pipe (see the “multiply” box of the relevant table).
These delays can be additive. In the pointer-chasing sequence:
lw $t1, 0($t0)
lw $t2, 0($t1)
the second load will be held up three clocks: two because of the late delivery of load data in $t1 (“load” box of the relevant table), plus another because that data is required to form the address (“load/store address” box of the relevant table).
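The arithmetic behind these examples can be sketched in a few lines of Python. This is only an illustration: the names PRODUCER_DELAY, CONSUMER_DELAY and stall_clocks are hypothetical, and the numbers are just the ones quoted in the worked examples above, not a copy of the full tables. It assumes the hold-up is the producer's delay plus any extra delay for a consumer that needs its operand early.

```python
# Extra clocks before a producer's result is usable, as quoted in the
# worked examples above (assumption: not the complete delay tables).
PRODUCER_DELAY = {"addiu": 1, "lw": 2, "mul": 6}

# Extra clocks when the operand is needed early, e.g. to form an address.
CONSUMER_DELAY = {"load/store address": 1}


def stall_clocks(producer, operand_use=None):
    """Clocks a consumer is held up when it immediately follows `producer`.

    `operand_use` names how the consumer uses the value; uses that are not
    listed in CONSUMER_DELAY add nothing.
    """
    return PRODUCER_DELAY[producer] + CONSUMER_DELAY.get(operand_use, 0)


if __name__ == "__main__":
    print(stall_clocks("addiu", "load/store address"))  # case [1]
    print(stall_clocks("lw"))                           # case [2]
    print(stall_clocks("mul"))                          # case [3]
    print(stall_clocks("lw", "load/store address"))     # pointer chasing
```

Run as a script, this prints 2, 2, 6 and 3, matching the figures in the text. For any instruction pair not covered by the examples, take the delays from the tables themselves.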
Delays caused by dependencies on DSPControl fields
Some DSP ASE instructions are dependent because they produce and consume values kept in fields of the DSPControl register. However, the most performance-critical of these dependencies are bypassed to make sure no delay will occur. Those are the dependencies between:
addsc → DSPControl[c] → addwc
cmp.x → DSPControl[ccond] → pick.x
wrdsp → DSPControl[pos,scount] → insv
But other dependencies passed in DSPControl may cause delays; in particular, the DSPControl[ouflag] bits set by various kinds of overflow are not ready for a succeeding rddsp instruction. The access is interlocked, and will lead to a delay of up to three clocks. We don’t expect that to be a problem (but if you know different, please get in touch with MIPS Technologies).
4.10.1 More complicated dependencies
There can be delays which depend on the dynamic allocation of resources inside the CPU. In general you can’t really figure out how much these matter by doing a static code analysis, and we earnestly advise you to get some kind of high-visibility cycle-accurate simulator or trace equipment.