Refining C/C++ Code
3.4.2 Using Word Access for Short Data
The ’C6000 has instructions with corresponding intrinsics, such as _add2( ),
_mpyhl( ), and _mpylh( ), that operate on 16-bit data stored in the high and
low parts of a 32-bit register. When operating on a stream of short data, you
can use word (int) accesses to read two short values at a time, and then use
’C6x intrinsics to operate on the data. For example, rewriting the vecsum( )
function to use word accesses (as in Example 3–8) doubles the performance of
the loop, because each iteration now processes two elements instead of one.
See section 6.4, Loading Two Data Values with LDW, on page 6-19 for more
information. This type of optimization is called packed data processing.
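To make the packed-data idea concrete, the following is a minimal host-side sketch of the arithmetic that _add2( ) performs: two independent 16-bit additions inside one 32-bit word, with no carry from the low half into the high half. The names pack2 and add2_emul are hypothetical helpers written for illustration only; they are not part of the ’C6000 tools, and on the target you would use the intrinsic itself.

```c
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit values into one 32-bit word
   (hi in bits 31..16, lo in bits 15..0). */
static uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Hypothetical portable model of _add2( ): the high halves and the low
   halves are added separately, and a carry out of the low half is
   discarded instead of propagating into the high half. */
static uint32_t add2_emul(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;          /* low-half sum, carry dropped */
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;  /* high-half sum, wraps at 16 bits */
    return hi | lo;
}
```

Under this model, adding the packed pairs (1, -1) and (2, 1) gives (3, 0): the low halves wrap to zero without disturbing the high halves, which is what allows one word operation to stand in for two short additions.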
Example 3–8. Vector Sum With restrict Keywords, MUST_ITERATE Pragma, Word Reads
void vecsum4(short *restrict sum, const short *restrict in1,
             const short *restrict in2, unsigned int N)
{
    int i;
    /* View each short array as an array of words (two shorts per int). */
    const int *restrict i_in1 = (const int *)in1;
    const int *restrict i_in2 = (const int *)in2;
    int *restrict i_sum = (int *)sum;

    #pragma MUST_ITERATE (10);
    for (i = 0; i < (N/2); i++)
        i_sum[i] = _add2(i_in1[i], i_in2[i]);   /* two 16-bit adds per word */
}
Note:
The MUST_ITERATE pragma tells the compiler that the following loop will
iterate at least the specified number of times (10 in Example 3–8).
This transformation assumes that the pointers sum, in1, and in2 can be cast
to int *, which means that they must point to word-aligned data. By default, the
compiler aligns all short arrays on doubleword boundaries; however, a call like
the following creates an illegal memory access, because &a[1] lies one halfword
past the start of the array and is therefore not word aligned:
    short a[51], b[50], c[50];
    vecsum4(&a[1], b, c, 50);
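One way to guard against a call like this is to test pointer alignment before taking the word-access path. The sketch below is an illustration under stated assumptions, not something taken from the ’C6000 tools: is_word_aligned and can_use_word_access are hypothetical helpers, and the check assumes that converting a pointer to uintptr_t yields its byte address, which holds on common flat-memory targets but is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if p sits on a 4-byte (word) boundary.
   Assumes the uintptr_t conversion yields the byte address. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 0x3u) == 0;
}

/* Hypothetical guard: all three buffers must be word aligned before a
   word-access routine such as vecsum4( ) may be used on them. */
static int can_use_word_access(const short *sum, const short *in1,
                               const short *in2)
{
    return is_word_aligned(sum) && is_word_aligned(in1) &&
           is_word_aligned(in2);
}
```

With arrays aligned as described above and 2-byte shorts, this guard would reject &a[1] and accept &a[2]; a caller could then fall back to a halfword loop for the rejected case.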
On the ’C64x, nonaligned accesses to memory are allowed in C through the
_mem4 and _memd8 intrinsics.
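For readers experimenting off-target, the effect of a nonaligned word read can be modeled in portable C with memcpy. This is a host-side analogue only: load_word_unaligned is a hypothetical name, it does not reproduce the interface of the _mem4 intrinsic, and on the ’C64x you would use the intrinsic itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side model of a nonaligned 32-bit read: copies four
   bytes starting at any address into a word, so the source pointer does
   not need to be word aligned. Byte order follows the host. */
static uint32_t load_word_unaligned(const void *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}
```

Because the copy is done bytewise, the same routine works for &a[1]-style addresses that would be illegal for a plain int * dereference.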