Software Pipelining
Optimizing Assembly Code via Linear Assembly
Table 6–7. Modulo Iteration Interval Table for Fixed-Point Dot Product (After Software Pipelining)

              |<------------------------ Loop Prolog ------------------------->|
Unit / Cycle  0     1       2        3         4          5           6            7, 8, 9...
.D1           LDW   * LDW   ** LDW   *** LDW   **** LDW   ***** LDW   ****** LDW   ******* LDW
.D2           LDW   * LDW   ** LDW   *** LDW   **** LDW   ***** LDW   ****** LDW   ******* LDW
.M1                                                       MPY         * MPY        ** MPY
.M2                                                       MPYH        * MPYH       ** MPYH
.L1                                                                                ADD
.L2                                                                                ADD
.S1                 SUB     * SUB    ** SUB    *** SUB    **** SUB    ***** SUB    ****** SUB
.S2                         B        * B       ** B       *** B       **** B       ***** B

Note:
The asterisks indicate the iteration of the loop. The rightmost column (cycles 7, 8, 9..., shaded in the original table) is the single-cycle loop.
The rightmost column in Table 6–7 is a single-cycle loop that contains the
entire loop. Cycles 0–6 are loop setup code, or loop prolog.
Asterisks define which iteration of the loop the instruction is executing each
cycle. For example, the rightmost column shows that on any given cycle inside
the loop:
- The ADD instructions are adding data for iteration n.
- The MPY instructions are multiplying data for iteration n + 2 (**).
- The LDW instructions are loading data for iteration n + 7 (*******).
- The SUB instruction is executing for iteration n + 6 (******).
- The B instruction is executing for iteration n + 5 (*****).
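
The iteration offsets listed above can be derived from the cycle at which each instruction first issues in Table 6–7. The following Python sketch is an illustration only, not part of the assembly flow: the names ISSUE_CYCLE, iteration_in_flight, and kernel_offsets are hypothetical, and the sketch assumes an iteration interval of 1, so that iteration i starts on cycle i.

```python
# Cycle (relative to the start of one iteration) at which each instruction
# issues, read from the first unstarred entry in each row of Table 6-7.
ISSUE_CYCLE = {"LDW": 0, "SUB": 1, "B": 2, "MPY": 5, "MPYH": 5, "ADD": 7}


def iteration_in_flight(instr, cycle):
    """Index of the iteration that `instr` is working on at `cycle`.

    Assumes an iteration interval of 1: iteration i starts on cycle i.
    """
    return cycle - ISSUE_CYCLE[instr]


def kernel_offsets(cycle):
    """Offset of each instruction's iteration from n, the iteration in ADD."""
    n = iteration_in_flight("ADD", cycle)
    return {instr: iteration_in_flight(instr, cycle) - n for instr in ISSUE_CYCLE}


if __name__ == "__main__":
    # Cycle 7 is the first cycle of the single-cycle loop.
    for instr, k in kernel_offsets(cycle=7).items():
        print(f"{instr:5s} n + {k}  {'*' * k}")
```

Because every instruction advances by one iteration per cycle, the offsets are the same on cycle 7 and on every later cycle, which is why a single column can represent the whole loop.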
In this case, multiple iterations of the loop execute in parallel in a software
pipeline that is eight iterations deep, with iterations n through n + 7 executing
in parallel. Fixed-point software pipelines are rarely deeper than the one
created by this single-cycle loop. As loop sizes grow, the number of iterations
that can execute in parallel tends to decrease.
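
The relationship between loop size and pipeline depth can be sketched numerically. The helper below is a hypothetical illustration: it assumes the depth equals the length of one iteration's schedule divided by the iteration interval, rounded up, and it holds the 8-cycle schedule length from Table 6–7 fixed while the iteration interval grows, which is a simplification.

```python
import math


def pipeline_depth(iteration_length, ii):
    """Number of iterations overlapped in the steady-state loop.

    iteration_length: cycles from the first to the last instruction of one
                      iteration, inclusive (8 for Table 6-7: cycles 0-7).
    ii:               iteration interval, the number of cycles in the loop.
    """
    return math.ceil(iteration_length / ii)


if __name__ == "__main__":
    for ii in (1, 2, 4):
        print(f"ii = {ii}: {pipeline_depth(8, ii)} iterations in parallel")
```

Under these assumptions, the single-cycle loop overlaps eight iterations, while a two-cycle loop with the same schedule length would overlap only four.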