VFP Instruction Execution
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
21-20
ID012310
Non-Confidential, Unrestricted Access
21.10 Parallel execution
The VFP11 coprocessor can execute instructions in each of its three pipelines independently of
the others, without blocking issue or writeback from any pipeline. Separate LS, FMAC, and
DS pipelines enable parallel operation of CDP and data transfer instructions. Scheduling
instructions so that several of them execute in the VFP11 pipelines at the same time can
significantly reduce program execution time.
A data transfer operation can begin execution if:
• no data hazards exist with any currently executing operations
• the LS pipeline is not currently stalled by the ARM11 processor or busy with a data
  transfer multiple.
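The two conditions above can be read as a single predicate. The following is a minimal sketch, not a description of the hardware logic: the function name and its three boolean parameters are hypothetical and simply name the conditions listed above.

```python
def can_start_data_transfer(data_hazard, ls_stalled_by_arm11, ls_busy_with_transfer_multiple):
    """Return True when a data transfer operation may begin execution.

    All three arguments are hypothetical flags describing the current
    pipeline state; they are not real VFP11 signals or registers.
    """
    # The LS pipeline is usable only if it is neither stalled nor occupied
    # by a data transfer multiple.
    ls_free = not (ls_stalled_by_arm11 or ls_busy_with_transfer_multiple)
    return (not data_hazard) and ls_free
```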
A CDP can be issued to the FMAC pipeline if:
• no data hazards exist with any currently executing operations
• the FMAC pipeline is available, that is, no short vector CDP is executing and no
  double-precision multiply is in the first cycle of the multiply operation
• no short vector operation with unissued iterations is currently executing in either the
  FMAC or DS pipeline.
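As an illustration, the FMAC issue conditions can be modeled as a boolean check. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and each flag stands for one of the conditions listed above rather than for an actual hardware signal.

```python
def can_issue_cdp_to_fmac(data_hazard,
                          short_vector_cdp_in_fmac,
                          dp_multiply_in_first_cycle,
                          unissued_short_vector_in_fmac_or_ds):
    """Return True when a CDP may be issued to the FMAC pipeline.

    Every argument is a hypothetical flag describing the pipeline state.
    """
    # "Available" means no short vector CDP is executing and no
    # double-precision multiply is in its first multiply cycle.
    fmac_available = not (short_vector_cdp_in_fmac or dp_multiply_in_first_cycle)
    return (not data_hazard
            and fmac_available
            and not unissued_short_vector_in_fmac_or_ds)
```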
A divide or square root instruction can be issued to the DS pipeline if:
• no data hazards exist with any currently executing operations
• the DS pipeline is available, that is, no current divide or square root is executing in the DS
  pipeline E1 stage
• no short vector operation with unissued iterations is executing in the FMAC pipeline.
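The DS issue conditions differ from the FMAC ones in that only the DS E1 stage and unissued short vector iterations in the FMAC pipeline can block issue. A minimal sketch of that check follows; the function and parameter names are hypothetical flags, not hardware signals.

```python
def can_issue_to_ds(data_hazard, ds_e1_occupied, unissued_short_vector_in_fmac):
    """Return True when a divide or square root may be issued to the DS pipeline.

    ds_e1_occupied: a divide or square root is executing in the DS E1 stage.
    unissued_short_vector_in_fmac: a short vector operation in the FMAC
    pipeline still has iterations to issue.
    """
    ds_available = not ds_e1_occupied
    return (not data_hazard) and ds_available and (not unissued_short_vector_in_fmac)
```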
Example 21-13 on page 21-21 shows a case of the VFP11 coprocessor executing instructions in
parallel in each of the three pipelines:
• a load multiple in the LS pipeline
• a divide in the DS pipeline
• a short vector add in the FMAC pipeline.
In this example, the LEN field contains b011, selecting a vector length of four iterations, and the
STRIDE field contains b00, for a vector stride of one.
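The field values quoted above can be decoded as in the sketch below. It assumes the VFP architecture's FPSCR layout, with LEN in bits [18:16] and STRIDE in bits [21:20], where the vector length is LEN + 1 and STRIDE values b00 and b11 select strides of one and two. The function name is hypothetical.

```python
def decode_len_stride(fpscr):
    """Return (vector_length, vector_stride) decoded from an FPSCR value.

    Assumes LEN occupies bits [18:16] and STRIDE occupies bits [21:20].
    """
    len_field = (fpscr >> 16) & 0b111
    stride_field = (fpscr >> 20) & 0b11
    if stride_field == 0b00:
        stride = 1
    elif stride_field == 0b11:
        stride = 2
    else:
        # b01 and b10 are treated as invalid encodings in this sketch.
        raise ValueError("unsupported STRIDE encoding")
    return len_field + 1, stride


# LEN = b011 and STRIDE = b00, as in the example described above.
length, stride = decode_len_stride(0b011 << 16)
```

With these field values the sketch yields a length of four and a stride of one, matching the text.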
Example 21-13 Parallel execution in all three pipelines
FLDM   [R4], {S4-S13}
FDIVS  S0, S1, S2
FADDS  S16, S20, S24
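One way to see why these three instructions have no data hazards with each other is to list the registers each one touches. The sketch below assumes the VFP short vector rules: single-precision registers are grouped in banks of eight, vector operands step through their bank with wraparound, and an operation whose destination is in bank 0 (here FDIVS with S0) is scalar. The helper name is hypothetical, and registers are written as plain integers, so 16 means S16.

```python
def short_vector_regs(first, length, stride=1, bank_size=8):
    """Registers used by one short vector operand, wrapping within its bank."""
    bank_base = (first // bank_size) * bank_size
    offset = first - bank_base
    return [bank_base + (offset + i * stride) % bank_size for i in range(length)]


# FLDM [R4], {S4-S13} writes S4 to S13.
fldm_regs = set(range(4, 14))

# FDIVS S0, S1, S2 is scalar because its destination is in bank 0.
fdivs_regs = {0, 1, 2}

# FADDS S16, S20, S24 with a vector length of four and a stride of one.
fadds_regs = (set(short_vector_regs(16, 4))
              | set(short_vector_regs(20, 4))
              | set(short_vector_regs(24, 4)))

# The three register sets do not overlap, so no register-based data hazard
# prevents the instructions from executing in parallel.
no_overlap = (fldm_regs.isdisjoint(fdivs_regs)
              and fldm_regs.isdisjoint(fadds_regs)
              and fdivs_regs.isdisjoint(fadds_regs))
```

Under these assumptions the add iterates over S16-S19, S20-S23, and S24-S27, none of which are used by the load or the divide.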