Refining C/C++ Code
3.4.2.4 Using _nassert(), Word Accesses, and the MUST_ITERATE pragma
The compiler can automatically perform packed data optimizations for some,
but not all, loops. If you either use global arrays or use the _nassert()
intrinsic to provide alignment information about your pointers, the compiler
can transform your code to use word accesses and the 'C6000 intrinsics.
Example 3–18 shows how the compiler can automatically do this optimization.
Example 3–18. Using the Compiler to Generate a Dot Product With Word Accesses
int dotprod1(const short *restrict a, const short *restrict b, unsigned int N)
{
    int i, sum = 0;

    /* a and b are aligned to a word boundary */
    _nassert(((int)(a) & 0x3) == 0);
    _nassert(((int)(b) & 0x3) == 0);

    #pragma MUST_ITERATE (40, 40);
    for (i = 0; i < N; i++)
        sum += a[i] * b[i];

    return sum;
}
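The expression passed to _nassert() is a statement the compiler takes on trust; it does not generate a run-time check, so the caller remains responsible for the alignment. As a minimal sketch, the same test can be written as an ordinary C function and checked on a host or in a debug build. The helper name is_word_aligned is hypothetical and is not part of the compiler's library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration, not a compiler
   or run-time library function): returns 1 when p lies on a 4-byte
   (word) boundary, 0 otherwise. It evaluates the same condition that
   Example 3-18 passes to _nassert(). */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 0x3u) == 0;
}
```

A debug build could, for instance, call assert(is_word_aligned(a)) before invoking dotprod1(), so that a misaligned buffer is caught before the compiler's assumption is violated.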
Compile Example 3–18 with the following options: -o -k. Open the assembly
file and look at the loop kernel. The results are exactly the same as those
produced by Example 3–11. The two _nassert() intrinsics in Example 3–18 tell
the compiler that the arrays pointed to by a and b are aligned to a word
boundary, so it is safe for the compiler to use an LDW instruction to load
two short values. The MUST_ITERATE pragma tells the compiler that the loop
executes exactly 40 times, an even count, so every pass can consume a full
pair of values. The compiler generates the _mpy() and _mpyh() intrinsics
internally, as well as the two sums that were used in Example 3–11 (shown
again below).
int dotprod(const short *restrict a, const short *restrict b,
            unsigned int N)
{
    int i, sum1 = 0, sum2 = 0;
    const int *restrict i_a = (const int *)a;
    const int *restrict i_b = (const int *)b;

    for (i = 0; i < (N >> 1); i++) {
        sum1 = sum1 + _mpy (i_a[i], i_b[i]);
        sum2 = sum2 + _mpyh (i_a[i], i_b[i]);
    }

    return sum1 + sum2;
}
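To see why the word-access form returns the same result as the element-by-element loop, the two can be compared on a host machine. The following is only a sketch under stated assumptions: emu_mpy() and emu_mpyh() are hypothetical portable stand-ins for the _mpy() and _mpyh() intrinsics (signed multiply of the lower and upper 16-bit halves of a word), and dotprod_scalar() and dotprod_words() are illustrative names that do not appear in the examples above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for _mpy(): signed product of the lower
   16 bits of each word. The narrowing casts assume the usual
   two's-complement wraparound. */
static int emu_mpy(uint32_t x, uint32_t y)
{
    return (int)(int16_t)(x & 0xFFFFu) * (int)(int16_t)(y & 0xFFFFu);
}

/* Hypothetical stand-in for _mpyh(): signed product of the upper
   16 bits of each word. */
static int emu_mpyh(uint32_t x, uint32_t y)
{
    return (int)(int16_t)(x >> 16) * (int)(int16_t)(y >> 16);
}

/* Reference version: one 16-bit element per iteration. */
static int dotprod_scalar(const int16_t *a, const int16_t *b, unsigned int n)
{
    int sum = 0;
    unsigned int i;

    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Word-access version: one 32-bit load per array per iteration,
   two products, two partial sums. memcpy() stands in for the word
   load so the sketch stays portable on the host. */
static int dotprod_words(const int16_t *a, const int16_t *b, unsigned int n)
{
    int sum1 = 0, sum2 = 0;
    unsigned int i;

    for (i = 0; i < (n >> 1); i++) {
        uint32_t wa, wb;

        memcpy(&wa, &a[2 * i], sizeof wa);
        memcpy(&wb, &b[2 * i], sizeof wb);
        sum1 += emu_mpy(wa, wb);   /* lower halves */
        sum2 += emu_mpyh(wa, wb);  /* upper halves */
    }
    return sum1 + sum2;
}
```

Both words are split the same way, so each element of a is multiplied by the matching element of b regardless of byte order. Because the loop runs N >> 1 times, the word version matches the scalar version only when the element count is even; an odd final element would be dropped.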