Lesson 3: Packed Data Optimization of Memory Bandwidth
Example 2–11. lesson3_c.c
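/* _nassert generates no code; it only passes an assertion to the optimizer */
#define WORD_ALIGNED(x) (_nassert(((int)(x) & 0x3) == 0))

void lesson3_c(short * restrict xptr, short * restrict yptr, short *zptr,
               short *w_sum, int N)
{
    int i, w_vec1, w_vec2;
    short w1, w2;

    /* Promise the compiler that both input arrays start on a 4-byte boundary */
    WORD_ALIGNED(xptr);
    WORD_ALIGNED(yptr);

    /* Load the two weights */
    w1 = zptr[0];
    w2 = zptr[1];

    /* Trip count is at least 20, no stated maximum, and a multiple of 2 */
    #pragma MUST_ITERATE(20, , 2);
    for (i = 0; i < N; i++)
    {
        w_vec1 = xptr[i] * w1;
        w_vec2 = yptr[i] * w2;
        w_sum[i] = (w_vec1 + w_vec2) >> 15;
    }
}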
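By asserting that the xptr and yptr addresses, ANDed with 0x3, equal zero, the
code tells the compiler that both pointers are word aligned (their two least
significant bits are zero). The compiler can therefore use LDW, which loads a
32-bit word holding two adjacent shorts, and apply packed data optimization to
these memory accesses.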
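To make the packed data idea concrete, here is a host-side C sketch of what one LDW per pair of shorts buys. It is an illustration under stated assumptions, not the code the TI compiler generates. The names wsum_scalar, wsum_packed, host_is_little_endian, and sext16 are hypothetical. A 4-byte memcpy stands in for a single LDW (host compilers usually lower it to one 32-bit load), and the asserts model the preconditions that WORD_ALIGNED and MUST_ITERATE(20, , 2) give the C6000 compiler: word-aligned inputs and an even trip count, so each iteration can fetch x[i] and x[i+1] together and produce two outputs.

```c
/* Hypothetical host-side model of packed data access; not TI compiler output. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: same arithmetic as the loop in lesson3_c. */
static void wsum_scalar(const short *x, const short *y, short w1, short w2,
                        short *out, int n)
{
    for (int i = 0; i < n; i++) {
        int v1 = x[i] * w1;
        int v2 = y[i] * w2;
        out[i] = (short)((v1 + v2) >> 15);
    }
}

/* Returns nonzero if the host stores the low-order byte first.
   The C6000 can run in either endian mode, so the model checks. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Sign-extend the low 16 bits of v without implementation-defined casts. */
static int sext16(uint32_t v)
{
    int s = (int)(v & 0xFFFFu);
    return (s >= 0x8000) ? s - 0x10000 : s;
}

/* Packed model: one 32-bit load per input brings in two adjacent shorts,
   so each iteration produces two outputs. */
static void wsum_packed(const short *x, const short *y, short w1, short w2,
                        short *out, int n)
{
    /* Preconditions mirrored from lesson3_c. On the C6000, LDW needs a
       word-aligned address; host memcpy does not, so these only model it. */
    assert(((uintptr_t)x & 0x3u) == 0);
    assert(((uintptr_t)y & 0x3u) == 0);
    assert(n % 2 == 0);

    const int le = host_is_little_endian();
    for (int i = 0; i < n; i += 2) {
        uint32_t xw, yw;
        memcpy(&xw, &x[i], sizeof xw);   /* one load: x[i] and x[i+1] */
        memcpy(&yw, &y[i], sizeof yw);   /* one load: y[i] and y[i+1] */

        /* Element i is in the low half on little-endian hosts. */
        int x0 = le ? sext16(xw) : sext16(xw >> 16);
        int x1 = le ? sext16(xw >> 16) : sext16(xw);
        int y0 = le ? sext16(yw) : sext16(yw >> 16);
        int y1 = le ? sext16(yw >> 16) : sext16(yw);

        out[i]     = (short)((x0 * w1 + y0 * w2) >> 15);
        out[i + 1] = (short)((x1 * w1 + y1 * w2) >> 15);
    }
}
```

On the same word-aligned, even-length inputs, wsum_packed and wsum_scalar should produce identical results. If x or y started at an odd halfword address, the first alignment assert would fire, mirroring the situation in which the C6000 compiler, lacking an alignment guarantee, cannot safely use LDW.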
Open lesson3_c.asm