VFP Instruction Execution
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
ID012310
Non-Confidential, Unrestricted Access
21.11 Execution timing
Complex instruction dependencies and memory system interactions make it impossible to
describe briefly the exact cycle timing of all instructions in all circumstances. The timing that
Table 21-16 lists is accurate in most cases. For precise timing, you must use a cycle-accurate
model of your ARM11 processor.
In Table 21-16, throughput is defined as the cycle after issue in which another instruction can
begin execution. Instruction latency is the number of cycles after which the data is available for
another operation. Forwarding reduces the latency by one cycle for operations that depend on
floating-point data. Table 21-16 lists the throughput and latency for all VFP11 instructions.
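As a rough illustration of how the two definitions above interact, the following sketch estimates when the results of a short instruction sequence become available. It is a simplified model under stated assumptions, not a cycle-accurate one: instructions issue in order, an instruction waits until all of its source registers are available, and the saving from forwarding and all memory-system effects are ignored. The function name `estimate_cycles`, the tuple layout, and the register names are hypothetical and exist only for this example. The timing values are the single-precision entries from Table 21-16.

```python
# Simplified in-order timing sketch. Assumptions (not from the manual):
# one instruction issues at a time, sources must be fully available before
# issue, and forwarding and memory effects are not modelled.

# (throughput, latency) pairs copied from the single-precision columns
# of Table 21-16.
SP_TIMING = {
    "FCPY": (1, 4),
    "FADD": (1, 8),
    "FMUL": (1, 8),
    "FMAC": (1, 8),
    "FDIV": (15, 19),
}


def estimate_cycles(program, timing=SP_TIMING):
    """Return the cycle at which the last result becomes available.

    program is a list of (mnemonic, dest, sources) tuples, where dest is a
    register name and sources is a list of register names.
    """
    ready = {}       # register name -> cycle at which its value is available
    next_issue = 0   # earliest cycle at which the next instruction can begin
    finish = 0
    for mnemonic, dest, sources in program:
        throughput, latency = timing[mnemonic]
        # Wait for the pipeline slot and for every source operand.
        issue = max([next_issue] + [ready.get(src, 0) for src in sources])
        ready[dest] = issue + latency
        next_issue = issue + throughput
        finish = max(finish, ready[dest])
    return finish


# FADD consumes the FMUL result, so it waits for the full FMUL latency.
dependent = [
    ("FMUL", "s0", ["s1", "s2"]),
    ("FADD", "s3", ["s0", "s4"]),
]

# FADD is independent of FMUL, so it can begin one cycle later.
independent = [
    ("FMUL", "s0", ["s1", "s2"]),
    ("FADD", "s3", ["s4", "s5"]),
]

print(estimate_cycles(dependent))
print(estimate_cycles(independent))
```

Under this model the dependent pair is bounded by latency, while the independent pair is bounded by throughput. For precise figures a cycle-accurate model is still required.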
Table 21-16 Throughput and latency cycle counts for VFP11 instructions

                                            Single-precision              Double-precision
Instructions                                Throughput  Latency           Throughput  Latency
------------------------------------------  ----------  ----------------  ----------  ----------------
FABS, FNEG, FCVT, FCPY                      1           4                 1           4
FCMP, FCMPE, FCMPZ, FCMPEZ                  1           4                 1           4
FSITO, FUITO, FTOSI, FTOUI, FTOUIZ, FTOSIZ  1           8                 1           8
FADD, FSUB                                  1           8                 1           8
FMUL, FNMUL                                 1           8                 2           9
FMAC, FNMAC, FMSC, FNMSC                    1           8                 2           9
FDIV, FSQRT                                 15          19                29          33
FLD [a]                                     1           4                 1           4
FST [a]                                     1 [a]       System-dependent  1           System-dependent
FLDM [a]                                    X [b]       X [b] + 3         X [b]       X [b] + 3
FSTM [a]                                    X [b]       System-dependent  X [b]       System-dependent
FMSTAT                                      1           2                 -           -
FMSR/FMSRR [c]                              1           4                 -           -
FMDHR/FMDLR/FMDRR [c]                       -           -                 1           4
FMRS/FMRRS [c]                              1           2                 -           -
FMRDH/FMRDL/FMRRD [c]                       -           -                 1           2
FMXR [d]                                    1           4                 -           -
FMRX [d]                                    1           2                 -           -

a. The cycle count for a load instruction is based on load data that is cached and available to the ARM11 processor from the
cache. The cycle count for a store instruction is based on store data that is written to the cache and/or write buffer immediately.
When the data is not cached or the write buffer is unavailable, the number of cycles also depends on the memory subsystem.
b. The number of cycles represented by X is N/2 if N is even, or (N/2 + 1), with N/2 rounded down, if N is odd.
c. FMDRR and FMRRD transfer one double-precision value per transfer. FMSRR and FMRRS transfer two single-precision values
per transfer.
d. FMXR and FMRX are serializing instructions. The latency depends on the register transferred and the current activity in the
VFP11 coprocessor when the instruction is issued.
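
The rule in footnote b can be written out as a short calculation. The sketch below assumes that N is a non-negative integer count and that N/2 for odd N means integer division, which is the only reading that yields a whole number of cycles. The function names `transfer_cycles` and `fldm_latency` are hypothetical and are used only for this example.

```python
def transfer_cycles(n):
    """Return X from footnote b for a count of n.

    X is n/2 when n is even and (n/2 + 1), using integer division,
    when n is odd. This is equivalent to n/2 rounded up.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n % 2 == 0:
        return n // 2
    return n // 2 + 1


def fldm_latency(n):
    """Return the FLDM latency entry from Table 21-16, X + 3.

    Footnote a still applies: this assumes the load data is cached.
    """
    return transfer_cycles(n) + 3


for n in (1, 2, 3, 4, 5):
    print(n, transfer_cycles(n), fldm_latency(n))
```

FSTM uses the same X for throughput, but its latency is system-dependent and therefore cannot be computed this way.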