Interruptible Code Generation
Example 7–5. Dot Product With MUST_ITERATE Pragma Guaranteeing Trip Count Range and Factor of 2
int dot_prod(short *a, short *b, int n)
{
    int i, sum = 0;

    #pragma MUST_ITERATE (20,50,2);
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
By enabling unrolling, performance has doubled from one result per 6-cycle
kernel to two results per 6-cycle kernel. By allowing the compiler to maximize
unrolling when using an interrupt threshold of one, you can get most of the
performance back. Example 7–6 shows a dot product loop whose trip count is a
multiple of 4 between 16 and 48.
Example 7–6. Dot Product With MUST_ITERATE Pragma Guaranteeing Trip Count Range and Factor of 4
int dot_prod(short *a, short *b, int n)
{
    int i, sum = 0;

    #pragma MUST_ITERATE (16,48,4);
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
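The pragma arguments in Example 7–6 are a guarantee made by the programmer, and the compiler relies on it without checking. If a caller passes a trip count outside the stated range or multiple, the generated code may not behave correctly. The following is a minimal sketch of a debug-build check for that guarantee. The names trip_count_ok and dot_prod_checked are hypothetical and do not come from this guide or from any TI library.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when n satisfies a MUST_ITERATE-style
 * contract (min <= n <= max and n is a multiple of `multiple`), else 0. */
int trip_count_ok(int n, int min, int max, int multiple)
{
    return n >= min && n <= max && multiple > 0 && (n % multiple) == 0;
}

/* Hypothetical wrapper around the loop of Example 7-6. The assert is
 * compiled out when NDEBUG is defined, so release builds pay nothing. */
int dot_prod_checked(const short *a, const short *b, int n)
{
    int i, sum = 0;

    assert(trip_count_ok(n, 16, 48, 4));

    /* With the TI compiler, the MUST_ITERATE pragma from Example 7-6
     * would be placed here, immediately before the loop. */
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

A check like this is most useful during development, when callers are still being written and a violated guarantee would otherwise show up only as wrong results.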
The compiler knows that the trip count is a multiple of four, so it unrolls
the loop such that four iterations (four results) are computed during each
6-cycle loop kernel. This is a fourfold improvement over the first attempt at
building the code with an interrupt threshold of one. The one drawback of
unrolling is that code size increases, so this type of optimization should be
applied only to key loops.
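To make the effect of unrolling by four concrete, the sketch below writes the transformation out by hand at the source level. It is only a conceptual illustration, assuming the trip count is a multiple of four. The code the compiler actually generates is software pipelined and will not look like this. The function names dot_prod_ref and dot_prod_unrolled4 are hypothetical.

```c
#include <assert.h>

/* Reference loop: one multiply-accumulate per iteration. */
int dot_prod_ref(const short *a, const short *b, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}

/* Hand-unrolled by four: each pass through the loop body produces four
 * results. Valid only when n is a multiple of 4, which is the property
 * the pragma in Example 7-6 guarantees to the compiler. */
int dot_prod_unrolled4(const short *a, const short *b, int n)
{
    int i, sum = 0;

    assert(n % 4 == 0);

    for (i = 0; i < n; i += 4) {
        sum += a[i]     * b[i];
        sum += a[i + 1] * b[i + 1];
        sum += a[i + 2] * b[i + 2];
        sum += a[i + 3] * b[i + 3];
    }

    return sum;
}
```

The unrolled body is roughly four times the size of the original, which is the code-size cost described above.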