Software Pipelining
Optimizing Assembly Code via Linear Assembly
6.5.1.2 Floating-Point Example
The floating-point code in Example 6–20 needs ten cycles for each iteration
of the loop, so the iteration interval is ten.
Table 6–6 shows a modulo iteration interval scheduling table for the floating-point dot product loop before software pipelining (Example 6–20). Each row represents a functional unit. There is a column for each cycle in the loop, showing the instruction that is executing on a particular cycle:
- LDDWs on the .D units are issued on cycles 0, 10, 20, 30, etc.
- MPYSPs on the .M units are issued on cycles 5, 15, 25, 35, etc.
- ADDSPs on the .L units are issued on cycles 9, 19, 29, 39, etc.
- SUB on the .S1 unit is issued on cycles 3, 13, 23, 33, etc.
- B on the .S2 unit is issued on cycles 4, 14, 24, 34, etc.
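The issue cycles in the list above follow one rule: an instruction first issued on cycle c is issued again every ten cycles, so it always falls in table column c mod 10. The following is a minimal Python sketch of that arithmetic, not TI tooling. The names `ITERATION_INTERVAL`, `FIRST_ISSUE`, `issue_cycles`, and `modulo_column` are illustrative assumptions; the cycle numbers are taken from the list.

```python
# Illustrative sketch only: names below are hypothetical, not part of any TI tool.
ITERATION_INTERVAL = 10  # cycles per loop iteration, as stated in the text

# First-issue cycle of each instruction, taken from the list above.
FIRST_ISSUE = {
    "LDDW": 0,
    "SUB": 3,
    "B": 4,
    "MPYSP": 5,
    "ADDSP": 9,
}


def issue_cycles(instruction, iterations):
    """Return the cycles on which `instruction` issues over `iterations` loop iterations."""
    start = FIRST_ISSUE[instruction]
    return [start + n * ITERATION_INTERVAL for n in range(iterations)]


def modulo_column(cycle):
    """Map an absolute cycle number to its column in the modulo scheduling table."""
    return cycle % ITERATION_INTERVAL


if __name__ == "__main__":
    for name in FIRST_ISSUE:
        cycles = issue_cycles(name, 4)
        print(f"{name:6s} issues on {cycles}, table column {modulo_column(cycles[0])}")
```

Because every issue of a given instruction maps to the same column, one column per cycle of the iteration interval is enough to describe the whole loop.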
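Table 6–6. Modulo Iteration Interval Scheduling Table for Floating-Point Dot Product (Before Software Pipelining)

| Unit / Cycle | 0, 10, ... | 1, 11, ... | 2, 12, ... | 3, 13, ... | 4, 14, ... | 5, 15, ... | 6, 16, ... | 7, 17, ... | 8, 18, ... | 9, 19, ... |
|--------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|
| .D1          | LDDW       |            |            |            |            |            |            |            |            |            |
| .D2          | LDDW       |            |            |            |            |            |            |            |            |            |
| .M1          |            |            |            |            |            | MPYSP      |            |            |            |            |
| .M2          |            |            |            |            |            | MPYSP      |            |            |            |            |
| .L1          |            |            |            |            |            |            |            |            |            | ADDSP      |
| .L2          |            |            |            |            |            |            |            |            |            | ADDSP      |
| .S1          |            |            |            | SUB        |            |            |            |            |            |            |
| .S2          |            |            |            |            | B          |            |            |            |            |            |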
In this example, each unit is used only once every ten cycles.
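To put a number on that observation, the sketch below records the schedule from Table 6–6 as a Python dictionary and counts the occupied slots. It is an illustration under stated assumptions: `SCHEDULE`, `utilization`, and `slots_used` are hypothetical names, and the unit and cycle assignments are the ones given in the text.

```python
# Illustrative sketch only: a hand-entered copy of the schedule in Table 6-6.
ITERATION_INTERVAL = 10  # cycles per loop iteration

# unit -> {column in the modulo table: instruction issued there}
SCHEDULE = {
    ".D1": {0: "LDDW"},
    ".D2": {0: "LDDW"},
    ".M1": {5: "MPYSP"},
    ".M2": {5: "MPYSP"},
    ".L1": {9: "ADDSP"},
    ".L2": {9: "ADDSP"},
    ".S1": {3: "SUB"},
    ".S2": {4: "B"},
}


def utilization(unit):
    """Fraction of the iteration interval in which `unit` issues an instruction."""
    return len(SCHEDULE[unit]) / ITERATION_INTERVAL


def slots_used():
    """Return (occupied issue slots, total issue slots) for one loop iteration."""
    used = sum(len(columns) for columns in SCHEDULE.values())
    total = len(SCHEDULE) * ITERATION_INTERVAL
    return used, total


if __name__ == "__main__":
    used, total = slots_used()
    print(f"{used} of {total} issue slots are occupied")
    for unit in SCHEDULE:
        print(f"{unit}: {utilization(unit):.0%} utilized")
```

Each of the eight units occupies one slot in ten, so only 8 of the 80 available issue slots per iteration do useful work in the schedule before software pipelining.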