Refining C/C++ Code
Example 3–20. Using _nassert() Intrinsic to Generate Word Accesses for FIR Filter
void fir (const short x[restrict], const short h[restrict], short y[restrict],
          int n, int m, int s)
{
    int i, j;
    long y0;
    long round = 1L << (s - 1);

    /* Tell the compiler that x, h, and y are word (4-byte) aligned. */
    _nassert(((int)x & 0x3) == 0);
    _nassert(((int)h & 0x3) == 0);
    _nassert(((int)y & 0x3) == 0);

    for (j = 0; j < m; j++)
    {
        y0 = round;

        /* The inner loop always executes exactly 40 times. */
        #pragma MUST_ITERATE (40, 40)
        for (i = 0; i < n; i++)
            y0 += x[i + j] * h[i];

        y[j] = (int)(y0 >> s);
    }
}
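The _nassert() calls in Example 3–20 generate no code; they are a promise to the compiler, so the caller has to guarantee that x, h, and y really are word aligned. The following is a minimal host-side sketch of one way to do that, assuming a C11 compiler. The helper name is_word_aligned and the buffer names and sizes are hypothetical, and uintptr_t replaces the (int) cast so the test also works where pointers are wider than int.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumption: when not building with the TI tools, turn the promise into a
   runtime check so that a misaligned pointer is caught on the host. */
#ifndef __TI_COMPILER_VERSION__
#define _nassert(expr) assert(expr)
#endif

/* Hypothetical helper: the same test that Example 3-20 hands to _nassert(). */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 0x3u) == 0;
}

/* C11 alignas places each buffer on a 4-byte boundary. With the TI tools,
   the DATA_ALIGN pragma is the usual way to request the same thing.
   The sizes below are illustrative only. */
static alignas(4) short x_buf[48];
static alignas(4) short h_buf[40];
static alignas(4) short y_buf[8];
```

Passing x_buf, h_buf, and y_buf to fir() satisfies all three assertions. Passing &x_buf[1] would not, because that address is only halfword aligned.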
As the compiler output in Example 3–21 shows, the code generated for
Example 3–20 is less efficient than the code produced in Example 3–13, but it
is more efficient than the code in Example 3–12.
Example 3–21. Compiler Output From Example 3–20
L3:        ; PIPED LOOP KERNEL

   [!B0]   ADD     .L1     A9,A7:A6,A7:A6    ; |21|
||         MPY     .M2X    A3,B3,B2          ; |21|
||         MPYHL   .M1X    B3,A0,A0          ; |21|
|| [ A1]   B       .S2     L3                ; @|21|
||         LDH     .D2T2   *++B9(8),B3       ; @@|21|
||         LDH     .D1T1   *+A8(4),A3        ; @@|21|

   [!B0]   ADD     .L2     B3,B5:B4,B5:B4    ; |21|
||         MPY     .M1X    A0,B1,A9          ; @|21|
||         LDW     .D2T2   *+B8(4),B3        ; @@|21|
||         LDH     .D1T1   *+A8(6),A0        ; @@|21|

   [ B0]   SUB     .S2     B0,1,B0           ;
|| [!B0]   ADD     .L2     B2,B7:B6,B7:B6    ; |21|
|| [!B0]   ADD     .L1     A0,A5:A4,A5:A4    ; |21|
||         MPYHL   .M2     B1,B3,B3          ; @|21|
|| [ A1]   SUB     .S1     A1,1,A1           ; @@|21|
||         LDW     .D2T2   *++B8(8),B1       ; @@@|21|
||         LDH     .D1T1   *++A8(8),A0       ; @@@|21|
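
The kernel in Example 3–21 mixes LDW word loads with LDH halfword loads, and MPYHL with MPY. The following is a rough host-side C model of why a word load is useful here, not a description of the compiler's actual transformation. One 32-bit load brings in two adjacent 16-bit samples. MPY multiplies the signed low halves of its operands, and MPYHL multiplies the signed high half of its first operand by the signed low half of its second. All function names are hypothetical, and the packing assumes a little-endian layout in which the lower-indexed sample sits in the low half of the word.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v without relying on
   implementation-defined narrowing conversions. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Hypothetical helper: the word a little-endian 32-bit load would return
   for two adjacent shorts (lo = element k, hi = element k + 1). */
static uint32_t pack2(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Model of MPY: signed low half of a times signed low half of b. */
static int32_t mpy_model(uint32_t a, uint32_t b)
{
    return sext16(a) * sext16(b);
}

/* Model of MPYHL: signed high half of a times signed low half of b. */
static int32_t mpyhl_model(uint32_t a, uint32_t b)
{
    return sext16(a >> 16) * sext16(b);
}

/* Two filter taps from a single word of input samples: xw holds x[k] and
   x[k + 1]; h0 and h1 are coefficients loaded as separate halfwords. */
static int32_t dot2_model(uint32_t xw, int16_t h0, int16_t h1)
{
    return mpy_model(xw, (uint16_t)h0) + mpyhl_model(xw, (uint16_t)h1);
}
```

In this model, dot2_model(pack2(x0, x1), h0, h1) computes x0*h0 + x1*h1 from a single word of input data, which is the benefit the alignment assertions make available to the compiler.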