Modulo Scheduling of Multicycle Loops
6.6.8 Final Assembly
Example 6–40 shows the final assembly code for the weighted vector sum.
The following optimizations are included:
- While iteration n of the STH ci+1 instruction is executing, iteration n + 1 of STH ci is executing. If the loop ran its full 50 times, STH ci would execute a nonexistent iteration 51 while STH ci+1 executed iteration 50. To prevent this, execute the loop only 49 times and schedule the final executions of ADD ci+1 and STH ci+1 after exiting the loop.
- The mask for the AND instruction is created with MVK and MVKH in parallel with the loop prolog.
- The pointer to the odd elements in array c is also set up in parallel with the loop prolog.
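The effect of the first and third optimizations on the order of stores can be modeled in C. This is a sketch under stated assumptions: it assumes the weighted vector sum kernel c[i] = ((m * a[i]) >> 15) + b[i] over 100 elements, unrolled by two so that each iteration produces one even element (ci) and one odd element (ci+1). The names N, wsum, w_vec_ref, and w_vec_skewed are hypothetical. The model reproduces only the one-iteration skew between the two stores, the 49-pass loop, the separate odd-element pointer, and the single store left for after the loop; it does not model the software pipeline, register allocation, or AND masking of Example 6–40.

```c
#include <assert.h>

#define N 100 /* assumed: 100 elements = 50 iterations unrolled by 2 */

/* Assumed kernel of the weighted vector sum. */
static short wsum(short a, short b, short m)
{
    return (short)(((m * a) >> 15) + b);
}

/* Reference loop: one element per pass, no skew between stores. */
void w_vec_ref(const short *a, const short *b, short *c, short m)
{
    for (int i = 0; i < N; i++)
        c[i] = wsum(a[i], b[i], m);
}

/* Skewed model: in kernel pass n, the "STH ci" store handles
 * iteration n+1 while the "STH ci+1" store handles iteration n. */
void w_vec_skewed(const short *a, const short *b, short *c, short m)
{
    short *c_even = c;     /* store pointer for ci (even elements) */
    short *c_odd  = c + 1; /* odd-element pointer, set up before the loop */

    /* Prolog (modeled): ci for iteration 1, i.e. element 0. */
    *c_even = wsum(a[0], b[0], m);
    c_even += 2;

    /* Kernel: 49 passes instead of 50, so ci never reaches iteration 51. */
    for (int n = 1; n <= N / 2 - 1; n++) {
        *c_even = wsum(a[2 * n], b[2 * n], m);        /* ci, iteration n+1 */
        c_even += 2;
        *c_odd = wsum(a[2 * n - 1], b[2 * n - 1], m); /* ci+1, iteration n */
        c_odd += 2;
    }

    /* After the loop: final ADD ci+1 / STH ci+1 for iteration 50. */
    assert(c_odd == c + N - 1);
    *c_odd = wsum(a[N - 1], b[N - 1], m);
}
```

Run on the same inputs, both functions should fill c identically, and the skewed version never writes c[N], which is the store a 50th kernel pass of STH ci would have made.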
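An MVK/MVKH pair is needed because the .S-unit form of each instruction carries only a 16-bit constant: MVK writes a sign-extended 16-bit value into the whole register, and MVKH then overwrites bits 31..16 while leaving bits 15..0 alone. Neither instruction depends on data loaded inside the loop, which is why the pair can be issued alongside the prolog. The C functions below are hypothetical models of those semantics, not TI intrinsics, and the mask value used in Example 6–40 is not reproduced here; the test values are arbitrary.

```c
#include <stdint.h>

/* Model of MVK: sign-extend a 16-bit constant into all 32 bits. */
static uint32_t mvk(uint16_t cst16)
{
    /* Portable sign extension: maps 0x8000..0xFFFF to -32768..-1. */
    return (uint32_t)((int32_t)(cst16 ^ 0x8000u) - 0x8000);
}

/* Model of MVKH: copy bits 31..16 of cst into bits 31..16 of reg,
 * leaving bits 15..0 of reg unchanged. */
static uint32_t mvkh(uint32_t reg, uint32_t cst)
{
    return (cst & 0xFFFF0000u) | (reg & 0x0000FFFFu);
}

/* Build a 32-bit constant the way an MVK/MVKH pair does. */
uint32_t make_const(uint32_t value)
{
    uint32_t reg = mvk((uint16_t)(value & 0xFFFFu)); /* low half, sign-extended */
    return mvkh(reg, value);                         /* fix up the high half */
}
```

The case value = 0x0000FFFF shows why MVKH is still required when the upper half is zero: MVK alone would leave 0xFFFFFFFF in the register.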