Introduction to the VFP coprocessor
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
ID012310
Non-Confidential, Unrestricted Access
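Figure 18-2 DS pipeline
[Figure: block diagram of the DS pipeline across the Issue, Execute 1, Execute 2, Execute 3, Execute 4, and Writeback stages.]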
DS pipeline instructions
The DS pipeline executes the following instructions:
FDIV    Divide.
FSQRT   Square root.
The VFP11 coprocessor executes divide and square root instructions for both single-precision
and double-precision operands, and supports all IEEE 754 standard rounding modes. The DS
unit uses a radix-4 algorithm shared by both operations, which provides a good balance between speed and chip area.
DS operations have a latency of 19 cycles for single-precision operations and 33 cycles for
double-precision operations. The throughput is 15 cycles for single-precision operations and 29
cycles for double-precision operations.
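The latency and throughput figures above can be combined into a rough cycle estimate for a run of independent DS operations. The following is a minimal sketch, not a description of the hardware: the function name ds_cycles and the enum constants are hypothetical, and it assumes that "throughput" means the number of cycles after which the next DS operation can start, and that no other stalls or dependencies intervene.

```c
/* Cycle counts quoted in the text for the DS pipeline. */
enum {
    SP_LATENCY = 19,    /* single-precision latency    */
    SP_THROUGHPUT = 15, /* single-precision throughput */
    DP_LATENCY = 33,    /* double-precision latency    */
    DP_THROUGHPUT = 29  /* double-precision throughput */
};

/*
 * Estimated cycles for n independent, back-to-back DS operations.
 * Assumption (not stated in the text): a new operation can start every
 * `throughput` cycles, and the result of the last one is available
 * `latency` cycles after it starts.
 */
static unsigned ds_cycles(unsigned n, unsigned latency, unsigned throughput)
{
    if (n == 0)
        return 0;
    return (n - 1) * throughput + latency;
}
```

Under these assumptions, ds_cycles(4, SP_LATENCY, SP_THROUGHPUT) evaluates to 64 and ds_cycles(4, DP_LATENCY, DP_THROUGHPUT) evaluates to 120, so four dependent-free double-precision divides take roughly twice as long as four single-precision ones.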
18.4.3 LS pipeline
The LS pipeline handles all of the instructions that involve data transfer to and from the ARM11
processor, including loads, stores, moves to coprocessor system registers, and moves from
coprocessor system registers. It remains synchronized with the ARM11 LS pipeline for the
duration of the instruction. Data written to the ARM11 processor is read from the VFP11
coprocessor register file in the Issue stage, transferred to the ARM11 processor in the next
cycle, and latched on the ARM11 data cache 1/data cache 2 cycle boundary.
The transfer is made on a dedicated 64-bit store data bus between the VFP11 coprocessor and
the ARM11 processor. Load data is written to the VFP11 coprocessor on a dedicated 64-bit load
bus between the ARM11 processor and all coprocessors. Data is received by the VFP11