Programming the MIPS32® 74K™ Core Family, Revision 02.14
4.11 Advice on tuning instruction sequences (particularly DSP)
DSP algorithm functions are often the subject of intense tuning. There is some specific and helpful advice (with
examples) included in the white paper published by MIPS Technologies.
But you need to know the basic latencies of instructions as executed by the 74K core (that is, how many cycles later
can a dependent instruction be issued). For these purposes there are four classes of instructions:
• A group of specially-simple ALU instructions run in one cycle. This includes bitwise logical instructions, mov
(an alias for addu with $0), shifts up to 8 positions down or up, test-and-set instructions, and sign-extend
instructions. See the list at the top of
• Simple DSP ASE operations (no multiply, no saturation) have 2-cycle latency, the same as most regular MIPS32
arithmetic.
• Non-multiply DSP instructions which feature saturation or rounding have 3-cycle latency.
• Special DSP multiply operations (or any other access to the multiply unit accumulators): these have timings like
standard multiply and multiply-accumulate instructions, so they’re in with the multiply operations under the next
heading.
• Instruction dependencies relating to different fields in the DSPControl register are tracked separately, and
efficiently, as if they were separate registers. But any rddsp or wrdsp instruction which reads/writes multiple
fields at once is dependent on multiple fields, and that can’t be tracked through the CB system. Such a rddsp is
not issued until all predecessors have graduated, and such a wrdsp must graduate before its successors can issue.
You can often avoid this by using the “masked” versions of these instructions to read or write only the field
you’re particularly interested in.
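The first three latency classes above can be captured in a small lookup, which is sometimes handy when estimating the length of a dependency chain by hand. The following is only a sketch: the names latency_class, issue_latency and chain_latency are hypothetical, the cycle counts are the ones listed above, and the estimate assumes a purely serial chain with no other stalls. Multiply-unit instructions are left out because their timings are covered under the next heading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the classes and cycle counts follow the list above. */
enum latency_class {
    LAT_SIMPLE_ALU,     /* bitwise logic, short shifts, sign-extend: 1 cycle */
    LAT_REGULAR,        /* simple DSP ASE ops, most MIPS32 arithmetic: 2 cycles */
    LAT_DSP_SAT_ROUND   /* non-multiply DSP with saturation/rounding: 3 cycles */
};

/* Cycles after which a dependent instruction can be issued. */
static int issue_latency(enum latency_class c)
{
    switch (c) {
    case LAT_SIMPLE_ALU:    return 1;
    case LAT_REGULAR:       return 2;
    case LAT_DSP_SAT_ROUND: return 3;
    }
    assert(!"unknown latency class");
    return 0;
}

/*
 * Cycles from issuing the first instruction of a chain until an
 * instruction depending on the last one can issue. Assumes every
 * instruction depends on its predecessor and nothing else stalls.
 */
static int chain_latency(const enum latency_class *chain, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += issue_latency(chain[i]);
    return total;
}
```

For instance, a shift feeding a plain add feeding a saturating DSP add would be estimated as 1 + 2 + 3 cycles under these assumptions.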
4.12 Multiply/divide unit and timings
As is traditional with MIPS CPUs, the integer multiplier is a semi-detached unit with its own pipeline. All MIPS32
CPUs implement:
• mult/multu: multiply two 32-bit numbers from GPRs (signed and unsigned versions) with a 64-bit result
delivered in the multiply unit’s accumulator. The accumulator was traditionally seen as pseudo-registers hi and lo,
readable only using the special instructions mfhi and mflo. Operations into the accumulator do not hold up the
main CPU and run independently, but mfhi/mflo are interlocked and delay execution as required until the result
is available.
• madd, maddu, msub, msubu: multiply/accumulate instructions collecting their result in the accumulator.
• mul/mulu: simple 3-operand multiply as a single instruction.
• div/divu: divide. The quotient goes into lo and the remainder into hi.
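The register-level effect of the instructions listed above can be sketched as a small C model. This is an illustration of the arithmetic only, not of the pipeline or its timing: struct acc and the model_* functions are hypothetical names, and the divide model simply refuses a zero divisor and the INT32_MIN / -1 case rather than guessing what the hardware leaves in hi and lo.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the hi/lo accumulator pair; not a real API. */
struct acc {
    uint32_t hi;
    uint32_t lo;
};

static void acc_set(struct acc *a, uint64_t v)
{
    a->hi = (uint32_t)(v >> 32);
    a->lo = (uint32_t)v;
}

static uint64_t acc_get(const struct acc *a)
{
    return ((uint64_t)a->hi << 32) | a->lo;
}

/* mult: signed 32x32 multiply, 64-bit product into hi/lo. */
static void model_mult(struct acc *a, int32_t x, int32_t y)
{
    acc_set(a, (uint64_t)((int64_t)x * (int64_t)y));
}

/* multu: unsigned 32x32 multiply, 64-bit product into hi/lo. */
static void model_multu(struct acc *a, uint32_t x, uint32_t y)
{
    acc_set(a, (uint64_t)x * y);
}

/* madd: add the signed product to the accumulator (wraps modulo 2^64). */
static void model_madd(struct acc *a, int32_t x, int32_t y)
{
    acc_set(a, acc_get(a) + (uint64_t)((int64_t)x * (int64_t)y));
}

/* div: quotient into lo, remainder into hi (C truncating division). */
static void model_div(struct acc *a, int32_t x, int32_t y)
{
    assert(y != 0);                        /* assumption: caller excludes this */
    assert(!(x == INT32_MIN && y == -1));  /* assumption: caller excludes this */
    a->lo = (uint32_t)(x / y);
    a->hi = (uint32_t)(x % y);
}
```

In this model, reading a->lo or a->hi corresponds to mflo or mfhi; on the real core those reads are the points where the interlock can delay execution.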
Many of the most powerful instructions in the MIPS DSP ASE are variants of multiply or multiply-accumulate
operations, and are described in Chapter 7, “The MIPS32® DSP ASE” on page 87. The DSP ASE also provides three
additional “accumulators” which behave like the hi/lo pair (the now four accumulators are called ac0-3). When we
talk about the “multiply/divide” group of instructions we include any instruction which reads or writes any
accumulator.
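As a rough picture of what four accumulators mean for software, the sketch below models ac0-3 as four independent 64-bit values, each of which can collect its own running sum. The names acc_file and model_madd_ac are hypothetical and the model ignores timing entirely; it only shows that accumulating into one accumulator leaves the others untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: four independent 64-bit accumulators, ac0..ac3. */
struct acc_file {
    uint64_t ac[4];
};

/* Signed multiply-accumulate into the chosen accumulator (wraps modulo 2^64). */
static void model_madd_ac(struct acc_file *f, unsigned which,
                          int32_t x, int32_t y)
{
    assert(which < 4);  /* only ac0..ac3 exist */
    f->ac[which] += (uint64_t)((int64_t)x * (int64_t)y);
}

/* Read an accumulator back as a signed 64-bit value. */
static int64_t model_read_ac(const struct acc_file *f, unsigned which)
{
    assert(which < 4);
    return (int64_t)f->ac[which];
}
```

Two running sums, for example, can be kept in ac0 and ac1 at the same time without either disturbing the other.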