Modulo Scheduling of Multicycle Loops
Optimizing Assembly Code via Linear Assembly
Table 6–15 shows the following additions:
- B LOOP (.S1, cycle 6)
- SUB cntr (.L1, cycle 5)
- ADD ci+1 (.L2, cycle 10)
- STH ci (cycle 9)
- STH ci+1 (cycle 11)
To avoid resource conflicts and live-too-long problems, Table 6–15 also
includes the following additional changes:
- LDW bi_i+1 (.D2) moved from cycle 0 to cycle 2.
- AND bi (.L2) moved from cycle 6 to cycle 7.
- SHR pi+1_scaled (.S2) moved from cycle 7 to cycle 9.
- MPYHL pi+1 moved from cycle 5 to cycle 6.
- SHR bi+1 moved from cycle 6 to cycle 8.
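The resource-conflict rule behind these moves can be sketched in a few lines of Python. This is a minimal illustration, not the method used to build Table 6–15: it assumes an iteration interval of 2 cycles (the interval is not stated in this passage), and the helper name find_conflicts is hypothetical. Two instructions conflict when they use the same functional unit in cycles that are equal modulo the iteration interval. Only instructions whose unit is named in the lists above are included.

```python
from collections import defaultdict

II = 2  # ASSUMPTION: iteration interval in cycles; not stated in this passage


def find_conflicts(schedule, ii):
    """Return (unit, slot, names) for every unit used twice in one modulo slot."""
    slots = defaultdict(list)
    for name, unit, cycle in schedule:
        # Instructions from overlapping iterations collide when they share
        # a unit and their cycles are congruent modulo the iteration interval.
        slots[(unit, cycle % ii)].append(name)
    return [
        (unit, slot, names)
        for (unit, slot), names in sorted(slots.items())
        if len(names) > 1
    ]


# Placements taken from the lists above (unit and final cycle).
placed = [
    ("B LOOP", ".S1", 6),
    ("SUB cntr", ".L1", 5),
    ("ADD ci+1", ".L2", 10),
    ("LDW bi_i+1", ".D2", 2),
    ("AND bi", ".L2", 7),
    ("SHR pi+1_scaled", ".S2", 9),
]

# Same placements, but with AND bi left at its earlier cycle 6.
before_move = [
    (name, unit, 6 if name == "AND bi" else cycle)
    for name, unit, cycle in placed
]

print(find_conflicts(placed, II))       # no conflicts
print(find_conflicts(before_move, II))  # ADD ci+1 and AND bi collide on .L2
```

Under the assumed 2-cycle interval, leaving AND bi at cycle 6 would put it in the same .L2 slot as ADD ci+1 at cycle 10, which is consistent with the move to cycle 7 listed above.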
From the table, you can see that this loop is pipelined six iterations deep,
because iterations n and n + 5 execute in parallel.
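The depth figure can be checked with a short calculation. The sketch below assumes an iteration interval of 2 cycles (not stated in this passage) and takes cycle 11, the STH ci+1 slot, as the last cycle of a single iteration; the function names are hypothetical.

```python
def pipeline_depth(last_cycle, ii):
    """Number of iterations in flight once the pipeline is full."""
    schedule_length = last_cycle + 1        # cycles 0..last_cycle inclusive
    return -(-schedule_length // ii)        # ceiling division


def iterations_in_flight(t, last_cycle, ii, count):
    """Iterations k (0 <= k < count) that are active at absolute cycle t."""
    # Iteration k starts at cycle k * ii and ends at k * ii + last_cycle.
    return [k for k in range(count) if k * ii <= t <= k * ii + last_cycle]


II = 2           # ASSUMPTION: iteration interval in cycles
LAST_CYCLE = 11  # STH ci+1, the latest cycle listed above

print(pipeline_depth(LAST_CYCLE, II))
print(iterations_in_flight(10, LAST_CYCLE, II, count=20))
```

With these values the single-iteration schedule spans 12 cycles, so a new iteration starting every 2 cycles gives a depth of 6. At cycle 10, iterations 0 through 5 are all active, which matches iterations n and n + 5 executing in parallel.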