Packed-Data Processing on the ’C64x
Example 8–13. Vectorized form of the Vector Complex Multiply
void vec_cx_mpy(const short *restrict a, const short *restrict b,
                short *restrict c, int len, int shift)
{
    int i;
    unsigned a3_a2, a1_a0;      /* Packed 16-bit values     */
    unsigned b3_b2, b1_b0;      /* Packed 16-bit values     */
    int      c3, c2, c1, c0;    /* Separate 32-bit results  */
    unsigned c3_c2, c1_c0;      /* Packed 16-bit values     */

    for (i = 0; i < len; i += 4)
    {
        /* Load two complex numbers from the a[] array.               */
        /* The complex values loaded are represented as 'a3 + a2 * j' */
        /* and 'a1 + a0 * j'. That is, the real components are a3     */
        /* and a1, and the imaginary components are a2 and a0.        */
        a3_a2 = _hi(*(const double *) &a[i]);
        a1_a0 = _lo(*(const double *) &a[i]);

        /* Load two complex numbers from the b[] array.               */
        b3_b2 = _hi(*(const double *) &b[i]);
        b1_b0 = _lo(*(const double *) &b[i]);

        /* Perform the complex multiplies using _dotp2/_dotpn2.       */
        c3 = _dotpn2(b3_b2, a3_a2);                     /* Real       */
        c2 = _dotp2 (b3_b2, _packlh2(a3_a2, a3_a2));    /* Imaginary  */
        c1 = _dotpn2(b1_b0, a1_a0);                     /* Real       */
        c0 = _dotp2 (b1_b0, _packlh2(a1_a0, a1_a0));    /* Imaginary  */

        /* Pack the 16-bit results from the upper halves of the       */
        /* 32-bit results into 32-bit words.                          */
        c3_c2 = _packh2(c3, c2);
        c1_c0 = _packh2(c1, c0);

        /* Store the results.                                         */
        *(double *) &c[i] = _itod(c3_c2, c1_c0);
    }
}
As with the earlier examples, this kernel now takes full advantage of the
packed-data processing features that the ’C64x provides. The more general
optimizations described in Chapter 6 can be applied to improve this code
further.
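
The arithmetic in Example 8–13 can be checked on a host machine without a ’C64x compiler. The following is a minimal sketch, not part of the original example: every name in it (hi16, lo16, pack2, the model_ functions, and model_self_check) is a hypothetical host-side stand-in for the corresponding intrinsic, written from the halfword behavior that the example relies on. It multiplies one packed complex value at a time and keeps the upper 16 bits of each 32-bit result, as the kernel does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side models of the intrinsics used in Example 8-13. */
/* They are illustrative stand-ins, not TI-supplied code.                */

/* Signed value of the upper and lower 16-bit halves of a packed word.   */
static int32_t hi16(uint32_t x)
{
    int32_t v = (int32_t)(x >> 16);
    return v >= 0x8000 ? v - 0x10000 : v;
}

static int32_t lo16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Build a packed word with 're' in the upper half, 'im' in the lower.   */
static uint32_t pack2(int32_t re, int32_t im)
{
    return ((uint32_t)re << 16) | ((uint32_t)im & 0xFFFFu);
}

/* Model of _dotp2: hi*hi + lo*lo. The corner case in which all four     */
/* halfwords are -32768 would overflow and is not modeled here.          */
static int32_t model_dotp2(uint32_t x, uint32_t y)
{
    return hi16(x) * hi16(y) + lo16(x) * lo16(y);
}

/* Model of _dotpn2: hi*hi - lo*lo.                                      */
static int32_t model_dotpn2(uint32_t x, uint32_t y)
{
    return hi16(x) * hi16(y) - lo16(x) * lo16(y);
}

/* Model of _packlh2: lower half of x over upper half of y.              */
static uint32_t model_packlh2(uint32_t x, uint32_t y)
{
    return (x << 16) | (y >> 16);
}

/* Model of _packh2: upper half of x over upper half of y.               */
static uint32_t model_packh2(uint32_t x, uint32_t y)
{
    return (x & 0xFFFF0000u) | (y >> 16);
}

/* One complex multiply, following the same steps as the kernel.         */
/* Inputs and result are packed as real (upper) and imaginary (lower).   */
static uint32_t model_cx_mpy(uint32_t a, uint32_t b)
{
    int32_t re = model_dotpn2(b, a);                    /* Real          */
    int32_t im = model_dotp2(b, model_packlh2(a, a));   /* Imaginary     */
    return model_packh2((uint32_t)re, (uint32_t)im);
}

/* Spot checks with hand-computed values.                                */
static void model_self_check(void)
{
    uint32_t a = pack2(16384, 16384);
    uint32_t c;

    /* (16384 + 16384j) * (16384 - 8192j), upper 16 bits of each part.   */
    c = model_cx_mpy(a, pack2(16384, -8192));
    assert(hi16(c) == 6144 && lo16(c) == 2048);

    /* (16384 + 16384j) * (-16384 + 0j): both parts are negative.        */
    c = model_cx_mpy(a, pack2(-16384, 0));
    assert(hi16(c) == -4096 && lo16(c) == -4096);
}
```

Calling model_self_check() from a host test program exercises both cases. Swapping the two halves of one operand with the _packlh2 model is what lets a single dot product form the cross terms of the imaginary part, while the real part needs only the subtracting dot product on the unmodified operands.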