Packed-Data Processing on the ’C64x
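Figure 8–12. Array Access in Vector Sum by STDW
[Figure: c_lo holds c[1]:c[0] and c_hi holds c[3]:c[2], each a 32-bit word containing two 16-bit elements. The _itod() intrinsic joins c_hi and c_lo into one 64-bit value, which STDW writes to c[0] through c[3] in a single store; the array continues with c[4] through c[7].]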
This code now reads and writes data efficiently, moving four 16-bit elements with each 64-bit access. The next step is to find a fast way to add them. The _add2() intrinsic does exactly that: it adds the corresponding 16-bit elements packed in two 32-bit words and produces two packed 16-bit sums. This is the vector addition needed here, as Figure 8–13 illustrates.
Figure 8–13. Vector Addition
[Figure: a_lo holds a[1]:a[0] and b_lo holds b[1]:b[0]. The statement c_lo = _add2(b_lo, a_lo); produces c_lo, holding c[1] = b[1] + a[1] in its upper half and c[0] = b[0] + a[0] in its lower half.]
Using _add2() to perform the additions completes the code, as shown in Example 8–6.