Refining C/C++ Code
Software pipelining is performed by the compiler only on inner loops;
therefore, you can increase performance by creating larger inner loops. One
method for creating large inner loops is to completely unroll inner loops that
execute for a small number of cycles.
In Example 3–29, the compiler pipelines the inner loop with a kernel size of one
cycle; therefore, the inner loop completes a result every cycle. However, the
overhead of filling and draining the software pipeline can be significant, and
other outer-loop code is not software pipelined.
Example 3–29. FIR_Type2— Original Form
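void fir2(const short input[restrict], const short coefs[restrict],
          short out[restrict])
{
    int i, j;
    int sum;
    for (i = 0; i < 40; i++)          /* one output sample per pass */
    {
        sum = 0;                      /* clear the accumulator for each output */
        for (j = 0; j < 16; j++)      /* 16-tap inner loop: the only loop pipelined */
            sum += coefs[j] * input[i + 15 - j];
        out[i] = (sum >> 15);
    }
}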
For loops with a simple loop structure, the compiler uses a heuristic to
determine if it should unroll the loop. Because unrolling can increase code size, in
some cases the compiler does not unroll the loop. If you have identified this
loop as being critical to your application, then unroll the inner loop in C code,
as in Example 3–30.
In general, unrolling may be a good idea if you have an uneven partition or if
your loop carried dependency bound is greater than the partition bound. (Refer
to section 6.7, Loop Carry Paths, and section 3.2 in the TMS320C6000
Optimizing C/C++ Compiler User’s Guide.) These bounds can be obtained by
using the -mw option and looking at the comment block before the loop.
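When an inner loop is unrolled by hand, a mistyped index is easy to miss. The following is a minimal host-side sketch, not code from this guide, that compares a rolled FIR loop against a fully unrolled one on arbitrary test data. The names fir2_rolled, fir2_unrolled, and fir2_count_mismatches are hypothetical, the test data is made up, and the restrict qualifiers are written in pointer form so that the sketch builds with any C99 compiler. The rolled version clears the accumulator for every output sample.

```c
#include <stddef.h>

#define NUM_OUT  40   /* outputs per call, matching Example 3-29 */
#define NUM_TAPS 16   /* filter taps, matching Example 3-29 */

/* Rolled reference: the 16-tap loop is the inner loop. */
void fir2_rolled(const short *restrict input, const short *restrict coefs,
                 short *restrict out)
{
    int i, j;
    for (i = 0; i < NUM_OUT; i++) {
        int sum = 0;                       /* accumulator cleared per output */
        for (j = 0; j < NUM_TAPS; j++)
            sum += coefs[j] * input[i + 15 - j];
        out[i] = (short)(sum >> 15);
    }
}

/* Hand-unrolled variant: the 16 taps are written out, so the i loop
   becomes the innermost (and therefore pipelinable) loop. */
void fir2_unrolled(const short *restrict input, const short *restrict coefs,
                   short *restrict out)
{
    int i;
    for (i = 0; i < NUM_OUT; i++) {
        int sum = coefs[0] * input[i + 15];
        sum += coefs[1]  * input[i + 14];
        sum += coefs[2]  * input[i + 13];
        sum += coefs[3]  * input[i + 12];
        sum += coefs[4]  * input[i + 11];
        sum += coefs[5]  * input[i + 10];
        sum += coefs[6]  * input[i + 9];
        sum += coefs[7]  * input[i + 8];
        sum += coefs[8]  * input[i + 7];
        sum += coefs[9]  * input[i + 6];
        sum += coefs[10] * input[i + 5];
        sum += coefs[11] * input[i + 4];
        sum += coefs[12] * input[i + 3];
        sum += coefs[13] * input[i + 2];
        sum += coefs[14] * input[i + 1];
        sum += coefs[15] * input[i];
        out[i] = (short)(sum >> 15);
    }
}

/* Runs both versions on the same made-up data and returns the number
   of output samples on which they disagree (0 means they match). */
int fir2_count_mismatches(void)
{
    short input[NUM_OUT + NUM_TAPS - 1];   /* index i + 15 - j spans 0..54 */
    short coefs[NUM_TAPS];
    short ref[NUM_OUT], unr[NUM_OUT];
    int i, bad = 0;

    for (i = 0; i < NUM_OUT + NUM_TAPS - 1; i++)
        input[i] = (short)((i * 37) % 201 - 100);   /* values in -100..100 */
    for (i = 0; i < NUM_TAPS; i++)
        coefs[i] = (short)(1000 * (i + 1));         /* values in 1000..16000 */

    fir2_rolled(input, coefs, ref);
    fir2_unrolled(input, coefs, unr);

    for (i = 0; i < NUM_OUT; i++)
        if (ref[i] != unr[i])
            bad++;
    return bad;
}
```

Calling fir2_count_mismatches() from a small test program and checking that it returns 0 gives some confidence that the unrolled indices are right. It says nothing about cycle counts, which still have to be read from the compiler's feedback for the target build.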