Cycle Timings and Interlock Behavior
ARM DDI 0363E
Copyright © 2009 ARM Limited. All rights reserved.
14-32
ID013010
Non-Confidential, Unrestricted Access
14.21 Floating-point single-precision data processing instructions
This section describes the cycle timing behavior for all single-precision VFP CDP instructions. These include arithmetic instructions such as VMUL.F32; data-movement and immediate-move instructions such as VMOV.F32 <Sd>, #<imm>, VABS.F32, VNEG.F32, and VMOV <Sd>, <Sm>; and comparison and conversion instructions.
Table 14-26 shows the cycle timing behavior of the floating-point single-precision data processing instructions.
Table 14-26 Floating-point single-precision data processing instructions cycle timing behavior

Example instruction                     Cycles   Early Reg    Result latency (cycles)
VMLA.F32 <Sd>, <Sn>, <Sm> (a)           1 (b)    <Sn>, <Sm>   5 (c)
VADD.F32 <Sd>, <Sn>, <Sm> (d)           1        <Sn>, <Sm>   2
VDIV.F32 <Sd>, <Sn>, <Sm>               2        <Sn>, <Sm>   16
VSQRT.F32 <Sd>, <Sm>                    2        <Sm>         16
VMOV.F32 <Sd>, #<imm>                   1        -            1
VMOV.F32 <Sd>, <Sm> (e)                 1        -            1
VCMP.F32 <Sd>, <Sm> (f)                 1        <Sd>, <Sm>   -
VCMPE.F32 <Sd>, #0.0 (f)                1        <Sd>         -
VCVT.F32.U32 <Sd>, <Sm> (g)             1        <Sm>         2
VCVT.F32.U32 <Sd>, <Sd>, #<fbits> (h)   1        <Sd>         2
VCVTR.U32.F32 <Sd>, <Sm> (i)            1        <Sm>         2
VCVT.U32.F32 <Sd>, <Sd>, #<fbits> (j)   1        <Sd>         2
VCVT.F64.F32 <Dd>, <Sm>                 3        <Sm>         5

a. Also VMLS.F32, VNMLS.F32, and VNMLA.F32.
b. VMLA.F32 completes out of order, and can take an extra cycle (two in total) if an add instruction (VADD) or certain dual-issued instruction pairs are in the Iss stage when the instruction completes.
c. Except when the instruction that depends on the result <Sd> is another VMLA.F32 instruction and the dependent operand is the accumulate operand, <Sd>. In this case, the result latency is reduced to 3 cycles.
d. Also VSUB.F32, VMUL.F32, and VNMUL.F32.
e. Also VABS.F32 and VNEG.F32.
f. Also VCMPE.F32.
g. Also VCVT.F32.S32.
h. Also VCVT.F32.U16, VCVT.F32.S32, and VCVT.F32.S16.
i. Also VCVT.U32.F32, VCVTR.S32.F32, and VCVT.S32.F32.
j. Also VCVT.U16.F32, VCVT.S32.F32, and VCVT.S16.F32.
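As a rough illustration of how the Cycles and Result latency columns combine, the following Python sketch estimates the issue cycle of each instruction in a short sequence. It is a simplified model under stated assumptions, not a description of the actual pipeline: it assumes in-order, single-issue execution, that a dependent instruction can issue no earlier than the producing instruction's issue cycle plus its result latency, and that the next instruction cannot issue until the previous instruction's Cycles have elapsed. It ignores the Early Reg distinction, dual-issue, and the extra completion cycle that VMLA.F32 can take, and it models the reduced accumulate latency only for VMLA.F32 itself. The names TIMING, Insn, and schedule are hypothetical.

```python
from dataclasses import dataclass

# (issue cycles, result latency) for a few rows of Table 14-26.
# Hypothetical simplified model: in-order, single-issue; ignores Early Reg,
# dual-issue, and the extra cycle an out-of-order VMLA.F32 completion can cost.
TIMING = {
    "VMLA.F32": (1, 5),
    "VADD.F32": (1, 2),
    "VMUL.F32": (1, 2),
    "VDIV.F32": (2, 16),
    "VSQRT.F32": (2, 16),
    "VMOV.F32": (1, 1),
    "VCVT.F64.F32": (3, 5),
}

# A VMLA.F32 result consumed as the accumulate operand of another
# VMLA.F32 has a 3-cycle latency instead of 5.
VMLA_ACCUMULATE_LATENCY = 3


@dataclass(frozen=True)
class Insn:
    op: str            # mnemonic, used as a key into TIMING
    dest: str          # destination register, e.g. "s0"
    srcs: tuple = ()   # source registers other than the accumulator


def schedule(program):
    """Return the estimated issue cycle of each instruction in program."""
    producer = {}      # register -> (issue cycle, mnemonic) of its last writer
    next_free = 0      # earliest cycle the issue slot is free again
    issue_cycles = []
    for insn in program:
        operands = [(reg, False) for reg in insn.srcs]
        if insn.op == "VMLA.F32":
            # VMLA.F32 <Sd>, <Sn>, <Sm> also reads <Sd> as the accumulator.
            operands.append((insn.dest, True))
        t = next_free
        for reg, is_accumulate in operands:
            if reg not in producer:
                continue  # value assumed ready before the sequence starts
            prod_cycle, prod_op = producer[reg]
            latency = TIMING[prod_op][1]
            if is_accumulate and prod_op == "VMLA.F32":
                latency = VMLA_ACCUMULATE_LATENCY
            t = max(t, prod_cycle + latency)
        issue_cycles.append(t)
        next_free = t + TIMING[insn.op][0]
        producer[insn.dest] = (t, insn.op)
    return issue_cycles


if __name__ == "__main__":
    acc_chain = [Insn("VMLA.F32", "s0", ("s1", "s2")),
                 Insn("VMLA.F32", "s0", ("s3", "s4"))]
    mul_dep = [Insn("VMLA.F32", "s0", ("s1", "s2")),
               Insn("VMLA.F32", "s4", ("s0", "s3"))]
    print(schedule(acc_chain))  # [0, 3]
    print(schedule(mul_dep))    # [0, 5]
```

Under these assumptions, a second VMLA.F32 that accumulates into the same <Sd> can issue at cycle 3, whereas one that uses the first result as a multiplier operand waits until cycle 5. This is why accumulating into a single register is cheaper than chaining a result through the multiplier inputs.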