Software Pipelining
6-45
Optimizing Assembly Code via Linear Assembly
6.5.3.3 Removing Extraneous Instructions
The code in Example 6–26 and Example 6–27 executes extra iterations of
some of the instructions in the loop. The following operations occur in parallel
on the last cycle of the loop in Example 6–26:
- Iteration 50 of the ADD instructions
- Iteration 52 of the MPY and MPYH instructions
- Iteration 57 of the LDW instructions
The following operations occur in parallel on the last cycle of the loop in
Example 6–27:
- Iteration 50 of the ADDSP instructions
- Iteration 54 of the MPYSP instructions
- Iteration 59 of the LDDW instructions
In most cases, extra iterations are not a problem; however, when extraneous
LDWs and LDDWs access unmapped memory, the results can be unpredictable.
If the extraneous instructions present a potential problem, remove the
extraneous load and multiply instructions by adding an epilog like that included
in the second part of Example 6–28 on page 6-47 and Example 6–29 on
page 6-48.
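The iteration numbers listed above follow from how far each instruction runs ahead of the final add in the pipelined loop. The Python sketch below models that skew. It assumes one loop iteration is issued per cycle; the function name and the lead tables are illustrative, with lead values derived from the last-cycle iteration numbers quoted in the text (for example, 57 – 50 = 7 for LDW).

```python
def extraneous_iterations(trip_count, lead):
    """For each instruction, list the iterations it runs beyond trip_count.

    lead[name] is how many iterations that instruction runs ahead of the
    final add stage (an assumption derived from the text's last-cycle numbers).
    """
    extra = {}
    for name, ahead in lead.items():
        last = trip_count + ahead  # iteration in flight on the last loop cycle
        extra[name] = list(range(trip_count + 1, last + 1))
    return extra


# Leads implied by the last-cycle iteration numbers (50/52/57 and 50/54/59).
FIXED_POINT_LEAD = {"ADD": 0, "MPY": 2, "LDW": 7}
FLOATING_POINT_LEAD = {"ADDSP": 0, "MPYSP": 4, "LDDW": 9}

fixed = extraneous_iterations(50, FIXED_POINT_LEAD)
floating = extraneous_iterations(50, FLOATING_POINT_LEAD)
print(fixed["LDW"])      # iterations 51 through 57
print(floating["LDDW"])  # iterations 51 through 59
```

In this model the adds have no extraneous iterations, while the loads and multiplies run past iteration 50; the loads are the ones that may touch unmapped memory.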
Fixed-Point Example
To eliminate the LDWs for iterations 51 through 57 of the fixed-point dot product,
run the loop seven fewer times. This brings the loop counter to 43 (50 – 7),
which means you still must execute seven more cycles of ADD instructions
and five more cycles of MPY instructions after the loop ends. These five pairs
of MPYs and seven pairs of ADDs are now outside the loop, so the LDWs, MPYs,
and ADDs each execute exactly 50 times. (The shaded areas of Example 6–28
indicate the changes in this code.)
Executing the dot product code in Example 6–28 with no extraneous LDWs
still requires a total of 58 cycles (7 + 43 + 7 + 1), but the code size is now
larger.
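The counts above can be checked with a small model. The Python sketch below is not the code of Example 6–28; it only reproduces the arithmetic. It assumes one loop iteration per cycle and lead values (ADD 0, MPY 2, LDW 7) derived from the last-cycle iteration numbers given earlier. The function name `epilog_plan` and its return layout are hypothetical.

```python
def epilog_plan(trip_count, lead):
    """Return (loop_count, epilog, executions) for a loop trimmed so that
    the instruction with the largest lead executes exactly trip_count times."""
    max_lead = max(lead.values())
    loop_count = trip_count - max_lead  # e.g. 50 - 7 = 43
    # Cycles each trailing instruction must still run after the loop exits.
    epilog = {n: max_lead - a for n, a in lead.items() if a < max_lead}
    # Model assumption: by loop exit an instruction has run
    # loop_count + its lead times; the epilog supplies the remainder.
    executions = {n: loop_count + a + epilog.get(n, 0) for n, a in lead.items()}
    return loop_count, epilog, executions


loop_count, epilog, executions = epilog_plan(50, {"ADD": 0, "MPY": 2, "LDW": 7})
print(loop_count)   # 43
print(epilog)       # {'ADD': 7, 'MPY': 5}
print(executions)   # {'ADD': 50, 'MPY': 50, 'LDW': 50}

# Cycle total as summed in the text (7 + 43 + 7 + 1). Reading the first 7
# as prolog cycles and the second as epilog cycles is an assumption here.
prolog_cycles = 7
total_cycles = prolog_cycles + loop_count + max(epilog.values()) + 1
print(total_cycles)  # 58
```

With leads of 0, 4, and 9 for ADDSP, MPYSP, and LDDW, the same function gives a loop count of 41 and epilog counts of 9 and 5, which matches the floating-point case.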
Floating-Point Example
To eliminate the LDDWs for iterations 51 through 59 of the floating-point dot
product, run the loop nine fewer times. This brings the loop counter to 41 (50 – 9),
which means you still must execute nine more cycles of ADDSP instructions
and five more cycles of MPYSP instructions after the loop ends. Five pairs of MPYSPs and nine
pairs of ADDSPs are now outside the loop. The LDDWs, MPYSPs, and