Cycle Timings and Interlock Behavior
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
ID012310
Non-Confidential, Unrestricted Access
16.7 Multiplies
The multiplier is a three-cycle pipeline. Early result forwarding is possible only to the
internal accumulate path, so for a subsequent multiply accumulate the result is available
one cycle earlier than for all other uses of the result.
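As a rough illustration of the forwarding rule above, the following Python sketch computes how long a dependent instruction waits for a multiply result. The function name `effective_latency` and the flag `feeds_accumulate` are hypothetical, and the sketch assumes the one-cycle saving applies only when the consumer is a multiply accumulate that takes the result on the internal accumulate path.

```python
def effective_latency(result_latency: int, feeds_accumulate: bool) -> int:
    """Cycles before a dependent instruction can use a multiply result.

    result_latency   -- documented result latency of the producing multiply
    feeds_accumulate -- True when the consumer is a multiply accumulate that
                        takes the result on the internal accumulate path
                        (assumption made for this sketch)
    """
    if result_latency < 1:
        raise ValueError("result latency must be at least one cycle")
    # Early forwarding exists only to the accumulate path, so only that
    # use sees the result one cycle sooner.
    return result_latency - 1 if feeds_accumulate else result_latency


# Example: a multiply with a result latency of 4 cycles.
print(effective_latency(4, feeds_accumulate=True))   # 3
print(effective_latency(4, feeds_accumulate=False))  # 4
```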
Certain multiplies require:
• more than one cycle to execute
• more than one pipeline issue to produce a result.
Multiplies with 64-bit results require two cycles to write their results. Consequently, they
have two result latencies, with the low half of the result always available first. The
multiplicand and multiplier are required as Early Regs because both are needed at the start of MAC1.
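To make the two halves of a 64-bit result concrete, the sketch below models an unsigned 32 x 32 multiply that returns the low and high words separately, and parses a two-part latency entry written in the form "low/high". The helper names `umull` and `split_latency` are hypothetical and model only the arithmetic and the notation, not the pipeline itself.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def umull(rm: int, rs: int) -> Tuple[int, int]:
    """Unsigned 32 x 32 -> 64-bit multiply, returned as (low word, high word)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32


def split_latency(entry: str) -> Tuple[int, int]:
    """Parse a 'low/high' latency entry such as '4/5' into (low, high)."""
    low, high = (int(part) for part in entry.split("/"))
    return low, high


lo, hi = umull(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(lo), hex(hi))       # 0x1 0xfffffffe
print(split_latency("4/5"))   # (4, 5): low half first, high half one cycle later
```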
Table 16-10 lists the cycle timing behavior of example multiply instructions.
Table 16-10 Example multiply instruction cycle timing behavior
Example instruction  Cycles  Cycles if sets flags  Early Reg   Late Reg  Result latency
MUL(S)               2       5                     <Rm>, <Rs>  -         4
MLA(S)               2       5                     <Rm>, <Rs>  <Rn>      4
SMULL(S)             3       6                     <Rm>, <Rs>  -         4/5
UMULL(S)             3       6                     <Rm>, <Rs>  -         4/5
SMLAL(S)             3       6                     <Rm>, <Rs>  <RdLo>    4/5
UMLAL(S)             3       6                     <Rm>, <Rs>  <RdLo>    4/5
SMULxy               1       -                     <Rm>, <Rs>  -         3
SMLAxy               1       -                     <Rm>, <Rs>  -         3
SMULWy               1       -                     <Rm>, <Rs>  -         3
SMLAWy               1       -                     <Rm>, <Rs>  -         3
SMLALxy              2       -                     <Rm>, <Rs>  <RdHi>    3/4
SMUAD, SMUADX        1       -                     <Rm>, <Rs>  -         3
SMLAD, SMLADX        1       -                     <Rm>, <Rs>  -         3
SMUSD, SMUSDX        1       -                     <Rm>, <Rs>  -         3
SMLSD, SMLSDX        1       -                     <Rm>, <Rs>  -         3
SMMUL, SMMULR        2       -                     <Rm>, <Rs>  -         4
SMMLA, SMMLAR        2       -                     <Rm>, <Rs>  <Rn>      4
SMMLS, SMMLSR        2       -                     <Rm>, <Rs>  <Rn>      4
SMLALD, SMLALDX      2       -                     <Rm>, <Rs>  <RdHi>    3/4
SMLSLD, SMLSLDX      2       -                     <Rm>, <Rs>  <RdHi>    3/4
UMAAL                3       -                     <Rm>, <Rs>  <RdLo>    4/5
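
The rows of Table 16-10 can be captured in a small lookup structure, for example when estimating instruction timing in a script. The following Python sketch is illustrative only: `MulTiming`, `ROWS`, `TIMINGS`, and `issue_cycles` are hypothetical names and not part of any ARM tool. The values are transcribed from the table, with `None` standing for a "-" entry and a two-element latency tuple standing for a "low/high" entry.

```python
from typing import NamedTuple, Optional, Tuple


class MulTiming(NamedTuple):
    cycles: int
    cycles_flags: Optional[int]   # None where the table shows "-"
    early_regs: Tuple[str, ...]
    late_reg: Optional[str]       # None where the table shows "-"
    latency: Tuple[int, ...]      # (low, high) for two-part entries


# (mnemonics sharing one set of values, cycles, cycles if sets flags,
#  late reg, result latency)
ROWS = [
    ("MUL", 2, 5, None, (4,)),
    ("MLA", 2, 5, "Rn", (4,)),
    ("SMULL UMULL", 3, 6, None, (4, 5)),
    ("SMLAL UMLAL", 3, 6, "RdLo", (4, 5)),
    ("SMULxy SMLAxy SMULWy SMLAWy", 1, None, None, (3,)),
    ("SMLALxy", 2, None, "RdHi", (3, 4)),
    ("SMUAD SMUADX SMLAD SMLADX SMUSD SMUSDX SMLSD SMLSDX",
     1, None, None, (3,)),
    ("SMMUL SMMULR", 2, None, None, (4,)),
    ("SMMLA SMMLAR SMMLS SMMLSR", 2, None, "Rn", (4,)),
    ("SMLALD SMLALDX SMLSLD SMLSLDX", 2, None, "RdHi", (3, 4)),
    ("UMAAL", 3, None, "RdLo", (4, 5)),
]

# Every listed instruction needs Rm and Rs as Early Regs.
TIMINGS = {
    name: MulTiming(cycles, flags, ("Rm", "Rs"), late, latency)
    for names, cycles, flags, late, latency in ROWS
    for name in names.split()
}


def issue_cycles(mnemonic: str, sets_flags: bool = False) -> int:
    """Return the cycle count for a mnemonic, as listed in the table."""
    timing = TIMINGS[mnemonic]
    if not sets_flags:
        return timing.cycles
    if timing.cycles_flags is None:
        raise ValueError(f"the table lists no flag-setting timing for {mnemonic}")
    return timing.cycles_flags


print(issue_cycles("MUL"))                   # 2
print(issue_cycles("MUL", sets_flags=True))  # 5
print(TIMINGS["UMAAL"].latency)              # (4, 5)
```

Passing the base mnemonic together with an explicit `sets_flags` argument, rather than parsing an `S` suffix, avoids ambiguity with mnemonics that already begin or end with the letter S.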