6.14 Outer Loop Conditionally Executed With Inner Loop
Software pipelining the outer loop reduced the outer loop overhead in the
previous example from 16 cycles to 8 cycles. Executing the outer loop
conditionally and in parallel with the inner loop eliminates the overhead
entirely.
6.14.1 Unrolled FIR Filter C Code
Example 6–72 shows the same unrolled FIR filter C code that was used in the
previous example.
Example 6–72. Unrolled FIR Filter C Code
void fir(short x[], short h[], short y[])
{
    int i, j, sum0, sum1;
    short x0,x1,x2,x3,h0,h1,h2,h3;

    for (j = 0; j < 100; j+=2) {
        sum0 = 0;
        sum1 = 0;
        x0 = x[j];
        for (i = 0; i < 32; i+=4){
            x1 = x[j+i+1];
            h0 = h[i];
            sum0 += x0 * h0;
            sum1 += x1 * h0;
            x2 = x[j+i+2];
            h1 = h[i+1];
            sum0 += x1 * h1;
            sum1 += x2 * h1;
            x3 = x[j+i+3];
            h2 = h[i+2];
            sum0 += x2 * h2;
            sum1 += x3 * h2;
            x0 = x[j+i+4];
            h3 = h[i+3];
            sum0 += x3 * h3;
            sum1 += x0 * h3;
        }
        y[j] = sum0 >> 15;
        y[j+1] = sum1 >> 15;
    }
}
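
The idea stated at the start of this section, running the outer loop work conditionally inside the inner loop, can be pictured at the C level with a rough sketch. The sketch is an illustration only and is not the code this section builds. The names fir_nested, fir_flat, NUM_OUT, NUM_TAPS, and NUM_IN are hypothetical, and NUM_IN is an assumed input length chosen so that every read (the listing reads up to x[130]) stays in bounds. fir_nested repeats the unrolled filter; fir_flat runs a single loop of 50 * 8 = 400 passes and performs the outer loop work only on the pass that finishes an output pair.

```c
#include <assert.h>

#define NUM_OUT  100                   /* outputs produced, as in the listing */
#define NUM_TAPS 32                    /* filter taps, as in the listing      */
#define NUM_IN   (NUM_OUT + NUM_TAPS)  /* assumed input length (>= 131)       */

/* Nested-loop form: same arithmetic as the unrolled FIR listing. */
void fir_nested(const short x[], const short h[], short y[])
{
    int i, j, sum0, sum1;
    short x0, x1, x2, x3, h0, h1, h2, h3;

    for (j = 0; j < NUM_OUT; j += 2) {
        sum0 = 0;
        sum1 = 0;
        x0 = x[j];
        for (i = 0; i < NUM_TAPS; i += 4) {
            x1 = x[j+i+1];  h0 = h[i];
            sum0 += x0 * h0;  sum1 += x1 * h0;
            x2 = x[j+i+2];  h1 = h[i+1];
            sum0 += x1 * h1;  sum1 += x2 * h1;
            x3 = x[j+i+3];  h2 = h[i+2];
            sum0 += x2 * h2;  sum1 += x3 * h2;
            x0 = x[j+i+4];  h3 = h[i+3];
            sum0 += x3 * h3;  sum1 += x0 * h3;
        }
        y[j]   = sum0 >> 15;
        y[j+1] = sum1 >> 15;
    }
}

/* Flattened form: one loop; outer loop work runs under a condition. */
void fir_flat(const short x[], const short h[], short y[])
{
    int i = 0, j = 0, sum0 = 0, sum1 = 0, n;
    short x0 = x[0], x1, x2, x3, h0, h1, h2, h3;

    for (n = 0; n < (NUM_OUT / 2) * (NUM_TAPS / 4); n++) {
        /* Inner loop body, unchanged. */
        x1 = x[j+i+1];  h0 = h[i];
        sum0 += x0 * h0;  sum1 += x1 * h0;
        x2 = x[j+i+2];  h1 = h[i+1];
        sum0 += x1 * h1;  sum1 += x2 * h1;
        x3 = x[j+i+3];  h2 = h[i+2];
        sum0 += x2 * h2;  sum1 += x3 * h2;
        x0 = x[j+i+4];  h3 = h[i+3];
        sum0 += x3 * h3;  sum1 += x0 * h3;
        i += 4;

        /* Outer loop work, executed only when an output pair is complete. */
        if (i == NUM_TAPS) {
            y[j]   = sum0 >> 15;
            y[j+1] = sum1 >> 15;
            j += 2;
            i = 0;
            sum0 = 0;
            sum1 = 0;
            x0 = x[j];  /* on the last pass j == NUM_OUT, still below NUM_IN */
        }
    }
    assert(j == NUM_OUT && i == 0);  /* every output pair was written */
}
```

Both functions compute the same y[] for the same inputs. The C if statement models only the control structure; it says nothing about cycle counts, which depend on how the conditional operations are scheduled in parallel with the inner loop on the target.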