VFP Instruction Execution
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
ID012310
Non-Confidential, Unrestricted Access
21.6 Operation of the scoreboards
The VFP11 processor detects all hazard conditions that exist between issued and executing
instructions. It uses two scoreboards to ensure that all source and destination registers for an
instruction contain valid data and are available for reading or writing:
• The destination scoreboard contains a lock for each destination register for the current
  operation.
• The source scoreboard contains a lock for each source register for the current operation.
In the Decode stage of the VFP11 pipeline, the VFP11 coprocessor determines the source and
destination registers that are involved in an operation and generates a lock mask for them. In a
short vector operation, the lock mask includes the registers involved in every iteration of the
operation. In the Issue stage, the VFP11 coprocessor checks and updates the source and
destination scoreboards. If it detects a hazard between the instruction in the Issue stage and a
prior instruction, the scoreboards are not updated, and the instruction stalls in the Issue stage.
A VFP11 instruction can begin execution only when its source and destination registers are free
of locks. A short vector operation can begin only when the registers for all its iterations are free
of locks. When a short vector instruction proceeds in the pipeline beyond the Issue stage, all the
registers involved in the operation are locked.
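The Decode and Issue behavior described above can be sketched as a small software model. This is only an illustration and not the VFP11 hardware logic. The names lock_mask, Scoreboards, and try_issue are hypothetical. The model assumes one bit per single-precision register, assumes that vector iterations wrap around inside an eight-register bank, and assumes a specific hazard rule (a source must not be locked as a destination, and a destination must not be locked in either scoreboard). The text above states only that all registers must be free of locks.

```python
def lock_mask(first_reg, length, stride=1, bank_size=8):
    """Build a bitmask covering the registers touched by every iteration.

    Assumption: registers are numbered 0..31 and a vector wraps around
    inside its bank of `bank_size` registers.
    """
    bank_base = first_reg - (first_reg % bank_size)
    mask = 0
    for i in range(length):
        offset = (first_reg - bank_base + i * stride) % bank_size
        mask |= 1 << (bank_base + offset)
    return mask


class Scoreboards:
    """Toy model of the source and destination scoreboards."""

    def __init__(self):
        self.source = 0  # bit set = register locked as a source
        self.dest = 0    # bit set = register locked as a destination

    def try_issue(self, src_mask, dst_mask):
        """Return True if the instruction issues, False if it stalls.

        Assumed hazard rule (not spelled out in the text):
        - a source register must not be locked as a destination
        - a destination register must not be locked in either scoreboard
        """
        hazard = (src_mask & self.dest) or (dst_mask & (self.dest | self.source))
        if hazard:
            # On a hazard the scoreboards are left untouched and the
            # instruction stays in the Issue stage.
            return False
        # All registers for all iterations are locked at once.
        self.source |= src_mask
        self.dest |= dst_mask
        return True
```

In this model, a four-iteration vector operation that writes registers 8 to 11 locks all four destinations when it issues, so a following instruction that reads register 8 stalls until the lock is cleared.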
The source scoreboard clears a source register lock in the first Execute 1 stage of the pipeline or
in the first Execute 1 stage of the iteration. In store multiple instructions, the source scoreboard
clears source register locks in the Execute stage where the instruction writes the store data to the
ARM11 processor.
The destination scoreboard clears the destination register lock in the cycle before the result data
is written back to the register file or is available for forwarding. This is Execute 7 in the FMAC
pipeline and Execute 4 in the DS pipeline. In a load operation, the destination scoreboard normally clears the
destination register lock in the Memory 2 stage. If the load is delayed, the destination scoreboard
clears the destination register lock in the same cycle as the writeback to the register file.
21.6.1 Scoreboard operation when an instruction bounces
When a bounce occurs in full-compliance mode, support code is called to complete the
operation and to deliver the result and the exception status to the user trap handler. The source
scoreboard ensures that all source registers for the operation are preserved for the support code.
In a short vector operation, this includes the source registers for the bounced iteration and for
any iterations remaining after the bounced iteration. The preserved source registers include the
destination register for a multiply and accumulate instruction.
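As an illustration of the rule above, the following sketch lists the registers that stay preserved for the support code when a given iteration bounces. The function name preserved_registers and the per-iteration (destination, sources) representation are hypothetical and chosen only for this example. Iterations are numbered from 1.

```python
def preserved_registers(iterations, bounced_iteration, is_multiply_accumulate=False):
    """Return the registers preserved for support code after a bounce.

    iterations: list of (dest, sources) tuples, one per vector iteration.
    bounced_iteration: 1-based index of the iteration that bounced.
    """
    preserved = []
    # The bounced iteration and every iteration after it are preserved.
    for dest, sources in iterations[bounced_iteration - 1:]:
        preserved.extend(sources)
        if is_multiply_accumulate:
            # The accumulate destination is also an input to the operation.
            preserved.append(dest)
    return preserved
```

For a three-iteration operation that bounces on iteration 2, only the registers of iterations 2 and 3 are returned. The sources of iteration 1 are no longer needed.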
Because RunFast mode guarantees that no bouncing is possible, source registers do not have to
be preserved after they are used by the instruction. For all scalar operations and nonmultiple
store operations, no source registers are locked in RunFast mode. In short vector operations, the
length of the vector determines the source registers that are locked. When the vector length
exceeds four single-precision iterations, the source scoreboard locks the source registers for
iterations 5 and above. When the vector length exceeds two double-precision iterations, the
source scoreboard locks the source registers for iterations 3 and above.
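The RunFast locking thresholds above can be expressed as a short helper. This is a sketch of the stated rule only. The name runfast_locked_iterations and the "single"/"double" precision labels are hypothetical, and iterations are numbered from 1 as in the text.

```python
# Number of leading iterations whose source registers are not locked
# in RunFast mode, taken from the rule stated in the text.
RUNFAST_UNLOCKED_ITERATIONS = {"single": 4, "double": 2}


def runfast_locked_iterations(precision, vector_length):
    """Return the 1-based iterations whose source registers are locked."""
    unlocked = RUNFAST_UNLOCKED_ITERATIONS[precision]
    return list(range(unlocked + 1, vector_length + 1))
```

For example, a single-precision vector of length 8 locks the source registers of iterations 5 to 8, while a single-precision vector of length 4 or a scalar operation locks none.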