Introduction to the VFP coprocessor
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
18-14
ID012310
Non-Confidential, Unrestricted Access
18.7 Parallel execution of instructions
The VFP11 coprocessor can execute several floating-point operations in parallel
while the ARM11 processor executes ARM instructions. Although a short vector
operation executes for a number of cycles in the VFP11 coprocessor, it appears to
the ARM11 processor as a single-cycle instruction and is retired in the ARM11
processor before it completes execution in the VFP11 coprocessor.
The three pipelines (FMAC, DS, and LS) are designed to operate independently of one
another after initial processing is complete. This makes it possible to issue a short
vector operation, issue a load or store multiple operation in the next cycle, and have
both executing at the same time, provided no data hazards exist between the two
instructions. With this mechanism, algorithms that can be double-buffered can be
written to hide much of the data transfer time to and from the VFP11 coprocessor
under the arithmetic operations, which significantly improves performance.
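The double-buffering pattern described above can be sketched in C. This is a minimal illustration under stated assumptions, not code from this manual: the function name scale_bias_double_buffered, the chunk size CHUNK, and the scale-and-bias kernel are all hypothetical. The sketch only shows the data-flow structure (load the next chunk into the idle buffer while computing on the current one). Whether the loads and arithmetic actually overlap on a VFP11 depends on the instructions the compiler or assembly programmer emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4 /* hypothetical chunk size: one short vector of four singles */

/* Hypothetical kernel: out[i] = in[i] * scale + bias, using two buffers. */
void scale_bias_double_buffered(const float *in, float *out, size_t n,
                                float scale, float bias)
{
    float buf[2][CHUNK];          /* ping-pong buffers (models two register banks) */
    size_t chunks = n / CHUNK;    /* number of full chunks */
    size_t cur = 0;               /* index of the buffer being computed on */

    assert(in != NULL && out != NULL);

    /* Prologue: fill the first buffer before any arithmetic starts. */
    if (chunks > 0)
        memcpy(buf[0], in, sizeof buf[0]);

    for (size_t c = 0; c < chunks; c++) {
        size_t nxt = cur ^ 1;     /* the idle buffer */

        /* Transfer step (models a load multiple): fetch the NEXT chunk into
           the idle buffer. It touches no data used by the arithmetic below,
           so there is no data hazard between the two steps. */
        if (c + 1 < chunks)
            memcpy(buf[nxt], in + (c + 1) * CHUNK, sizeof buf[nxt]);

        /* Arithmetic step (models a short vector operation) on the CURRENT
           buffer only. */
        for (size_t i = 0; i < CHUNK; i++)
            out[c * CHUNK + i] = buf[cur][i] * scale + bias;

        cur = nxt;                /* swap buffer roles for the next iteration */
    }

    /* Epilogue: handle elements that do not fill a whole chunk. */
    for (size_t i = chunks * CHUNK; i < n; i++)
        out[i] = in[i] * scale + bias;
}
```

The important property is that the transfer step and the arithmetic step in each iteration work on different buffers, which is the condition the text gives for both to execute at the same time.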
The separate DS pipeline enables both data transfer operations and CDP instructions
that do not use the DS pipeline to execute in parallel with a divide or square root
operation. The DS block has a dedicated write port to the register file, so no special
care is required when executing operations in parallel with divide or square root
instructions.
Parallel execution on page 21-20 describes parallel execution of instructions in more detail.
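As a rough illustration of running independent work alongside a divide, the following C sketch starts a division, performs a multiply-accumulate loop that does not depend on the quotient, and only then uses the quotient. The function scaled_dot and its parameters are hypothetical and are not taken from this manual. A compiler is free to reorder these statements, so this shows the dependency structure that permits overlap rather than guaranteeing any particular instruction schedule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: returns (num / den) * dot(a, b). */
float scaled_dot(const float *a, const float *b, size_t n,
                 float num, float den)
{
    assert(den != 0.0f);

    /* Long-latency divide, written first so it can be issued early. */
    float ratio = num / den;

    /* Multiply-accumulate work that neither reads nor writes ratio, so it
       has no data hazard with the divide. */
    float dot = 0.0f;
    for (size_t i = 0; i < n; i++)
        dot += a[i] * b[i];

    /* First use of the quotient, after the independent work is done. */
    return ratio * dot;
}
```

For example, with a = {1, 2, 3, 4}, b = {1, 1, 1, 1}, num = 6 and den = 2, the quotient is 3, the dot product is 10, and the function returns 30.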