Packed-Data Processing on the ’C64x
’C64x Programming Considerations
Example 8–5. Vectorization: Using LDDW and STDW in Vector Sum
void vec_sum(const short *restrict a, const short *restrict b,
             short *restrict c, int len)
{
    int i;
    unsigned a_hi, a_lo;
    unsigned b_hi, b_lo;
    unsigned c_hi, c_lo;

    for (i = 0; i < len; i += 4)
    {
        a_hi = _hi(*(const double *) &a[i]);
        a_lo = _lo(*(const double *) &a[i]);
        b_hi = _hi(*(const double *) &b[i]);
        b_lo = _lo(*(const double *) &b[i]);

        /* ...somehow, the ADD occurs here,
           with results in c_hi, c_lo... */

        *(double *) &c[i] = _itod(c_hi, c_lo);
    }
}
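The example above leaves the addition step open. As a rough, host-portable sketch of what the complete loop could look like, the code below replaces the double-word accesses with memcpy into a 64-bit integer and performs the add on each 32-bit half with a hypothetical helper, add2_sketch, which adds the two 16-bit lanes independently. The names add2_sketch and vec_sum_sketch are illustrative only and do not come from the text. The sketch assumes that short is 16 bits wide and that len is a multiple of 4. On the 'C64x itself, a packed-add intrinsic such as _add2 would likely take the place of the helper, but that is an assumption here and not something the example states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a packed 16-bit add: the two 16-bit halves
   are added independently, so no carry crosses from the lower lane
   into the upper lane. Each lane wraps at 16 bits. */
static uint32_t add2_sketch(uint32_t x, uint32_t y)
{
    uint32_t lo = (x + y) & 0x0000FFFFu;          /* lower lane only   */
    uint32_t hi = ((x >> 16) + (y >> 16)) << 16;  /* upper lane only   */
    return hi | lo;
}

/* Portable sketch of the vector sum, four 16-bit elements per pass.
   Assumes 16-bit short and len being a multiple of 4. */
void vec_sum_sketch(const short *restrict a, const short *restrict b,
                    short *restrict c, int len)
{
    int i;

    assert(len % 4 == 0);  /* the loop consumes four elements per pass */

    for (i = 0; i < len; i += 4)
    {
        uint64_t a_dw, b_dw, c_dw;
        uint32_t c_hi, c_lo;

        /* memcpy stands in for the 64-bit load (LDDW in the example). */
        memcpy(&a_dw, &a[i], sizeof a_dw);
        memcpy(&b_dw, &b[i], sizeof b_dw);

        /* Split into halves (the role of _hi and _lo) and add lane-wise. */
        c_hi = add2_sketch((uint32_t)(a_dw >> 32), (uint32_t)(b_dw >> 32));
        c_lo = add2_sketch((uint32_t)a_dw, (uint32_t)b_dw);

        /* Recombine the halves (the role of _itod) and store 64 bits
           (STDW in the example). */
        c_dw = ((uint64_t)c_hi << 32) | c_lo;
        memcpy(&c[i], &c_dw, sizeof c_dw);
    }
}
```

Using memcpy sidesteps the alignment requirement that the pointer casts in the example imply: the casts only work when a, b, and c are aligned to a double-word boundary, whereas the sketch makes no such demand of the host.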
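Figure 8–11. Array Access in Vector Sum by LDDW
[Figure: a[0] through a[7] are consecutive 16-bit elements in memory. A single LDDW loads the 64 bits holding a[0] through a[3] into a 64-bit register pair. The _lo() intrinsic returns the 32-bit half containing a[1] and a[0] (a_lo); the _hi() intrinsic returns the 32-bit half containing a[3] and a[2] (a_hi).]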
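To make the lane layout in the figure concrete, the following sketch models the double-word load and the two half-word extractions in plain C. It assumes little-endian ordering, which is what the register-pair ordering in the figure suggests. The helper names load4_le, hi_half, and lo_half are hypothetical stand-ins, not TI intrinsics; the load is assembled with shifts so that the result does not depend on the byte order of the machine running the sketch.

```c
#include <stdint.h>

/* Hypothetical helper: models a little-endian 64-bit load of four
   consecutive 16-bit elements. p[0] lands in bits 0..15 and p[3]
   in bits 48..63, regardless of the host's own byte order. */
static uint64_t load4_le(const int16_t *p)
{
    return  (uint64_t)(uint16_t)p[0]
         | ((uint64_t)(uint16_t)p[1] << 16)
         | ((uint64_t)(uint16_t)p[2] << 32)
         | ((uint64_t)(uint16_t)p[3] << 48);
}

/* Hypothetical stand-ins for the roles of _hi() and _lo():
   the upper and lower 32 bits of the 64-bit value. */
static uint32_t hi_half(uint64_t dw) { return (uint32_t)(dw >> 32); }
static uint32_t lo_half(uint64_t dw) { return (uint32_t)dw; }
```

With a[0..3] = 0x1111, 0x2222, 0x3333, 0x4444, lo_half yields 0x22221111 (a[1] above a[0]) and hi_half yields 0x44443333 (a[3] above a[2]), matching the a_lo and a_hi halves in the figure.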