6.6 Modulo Scheduling of Multicycle Loops
Section 6.5 demonstrated the modulo-scheduling technique for the dot
product code. In that single-cycle loop, no two instructions used the
same resource. Multicycle loops can introduce resource conflicts that
constrain the modulo schedule. This section describes techniques for
handling those conflicts.
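One way to see how resource conflicts constrain a modulo schedule is through a resource-based lower bound on the iteration interval, that is, the number of cycles between the starts of successive iterations. If a loop body issues more operations of one kind than there are units able to execute them, a new iteration cannot start every cycle. The following is a minimal sketch of that bound, not the procedure from this guide; the function name `resource_min_ii` and the grouping of operations into unit classes are assumptions made for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper: resource-based lower bound on the iteration interval.
 * ops[k]   = operations per loop iteration that need a unit of class k
 * units[k] = number of units of class k available each cycle
 * The bound is the largest ceil(ops[k] / units[k]) over all classes,
 * and never less than 1 cycle.
 */
int resource_min_ii(const int ops[], const int units[], int n_classes)
{
    int bound = 1;
    for (int k = 0; k < n_classes; k++) {
        assert(ops[k] >= 0 && units[k] > 0);
        int need = (ops[k] + units[k] - 1) / units[k];  /* integer ceiling */
        if (need > bound) {
            bound = need;
        }
    }
    return bound;
}
```

For example, three memory accesses competing for two load/store units give a bound of 2 cycles, whereas two accesses on the same two units give a bound of 1.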
6.6.1 Weighted Vector Sum C Code
Example 6–34 shows the C code for a weighted vector sum.
Example 6–34. Weighted Vector Sum C Code
void w_vec(short a[],short b[],short c[],short m)
{
    int i;

    for (i=0; i<100; i++) {
        c[i] = ((m * a[i]) >> 15) + b[i];
    }
}
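To make the arithmetic in Example 6–34 concrete, the sketch below runs the same function on one known input. It assumes that the weight m is meant as a Q15 fraction (so 16384 stands for 0.5), which the shift by 15 suggests but the text does not state; the helper `w_vec_demo` and the chosen input values are hypothetical.

```c
#include <assert.h>

/* Weighted vector sum, as in Example 6-34. */
void w_vec(short a[], short b[], short c[], short m)
{
    int i;

    for (i = 0; i < 100; i++) {
        c[i] = ((m * a[i]) >> 15) + b[i];
    }
}

/* Hypothetical self-checking demo: returns c[0] for one known input. */
int w_vec_demo(void)
{
    short a[100] = {0};
    short b[100] = {0};
    short c[100];

    a[0] = 1000;
    b[0] = 10;
    w_vec(a, b, c, 16384);   /* m = 16384, i.e. 0.5 if read as Q15 */

    /* (16384 * 1000) >> 15 = 500, and 500 + 10 = 510 */
    assert(c[0] == 510);
    return c[0];
}
```

Only non-negative values are used here, because the result of right-shifting a negative signed value is implementation-defined in C.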
6.6.2 Translating C Code to Linear Assembly
Example 6–35 shows the linear assembly that executes the weighted vector
sum in Example 6–34. Functional units are not yet assigned in this linear
assembly; the dependency graph will help with those decisions. Before
turning to the dependency graph, however, the code can be optimized further.
Example 6–35. Linear Assembly for Weighted Vector Sum Inner Loop
        LDH     *aptr++,ai          ; ai
        LDH     *bptr++,bi          ; bi
        MPY     m,ai,pi             ; m * ai
        SHR     pi,15,pi_scaled     ; (m * ai) >> 15
        ADD     pi_scaled,bi,ci     ; ci = ((m * ai) >> 15) + bi
        STH     ci,*cptr++          ; store ci
[cntr]  SUB     cntr,1,cntr         ; decrement loop counter
[cntr]  B       LOOP                ; branch to loop
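The correspondence between Example 6–35 and the C source can be checked with a C model that performs one statement per assembly instruction. This is a rough functional sketch only: it ignores functional units, delay slots, and pipelining, and the function name `w_vec_stepwise` and its parameter list are assumptions. Local variable names follow the registers in the listing.

```c
#include <assert.h>

/*
 * Hypothetical one-statement-per-instruction model of Example 6-35.
 * Assumes a 32-bit int, an arithmetic right shift for signed values,
 * and cntr >= 1 on entry (the body runs before the counter is tested).
 */
void w_vec_stepwise(const short *aptr, const short *bptr, short *cptr,
                    short m, int cntr)
{
    assert(cntr >= 1);
    do {
        short ai = *aptr++;             /* LDH  *aptr++,ai         */
        short bi = *bptr++;             /* LDH  *bptr++,bi         */
        int pi = m * ai;                /* MPY  m,ai,pi            */
        int pi_scaled = pi >> 15;       /* SHR  pi,15,pi_scaled    */
        int ci = pi_scaled + bi;        /* ADD  pi_scaled,bi,ci    */
        *cptr++ = (short)ci;            /* STH  ci,*cptr++         */
        if (cntr) {
            cntr = cntr - 1;            /* [cntr] SUB cntr,1,cntr  */
        }
    } while (cntr);                     /* [cntr] B   LOOP         */
}
```

Called with cntr set to 100, the model executes the body 100 times, matching the trip count of the loop in Example 6–34.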