Packed-Data Processing on the ’C64x
’C64x Programming Considerations
Figure 8–10. Graphical Representation of a Single Iteration of Vector Complex Multiply.
[Figure 8–10 diagram: Input A and Input B each contribute array element 2n+1 (real component) and array element 2n (imaginary component). Four multiplies feed one subtract and one add, which produce array element 2n+1 (real component) and array element 2n (imaginary component) of Output c.]
The following sections revisit these basic kernels and illustrate how
single-instruction, multiple-data (SIMD) optimizations apply to each of them.
8.2.6 Vectorizing With Packed Data Processing
The most basic packed data optimization is to use wide memory accesses (word
and double-word loads and stores) to access narrow data such as byte or
half-word data. This is a simple form of vectorization, as described above,
applied only to the array accesses.
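
As an illustration of the idea, the following is a minimal, portable C sketch rather than ’C64x-specific code. It sums an array of 16-bit elements while fetching them two at a time through 32-bit word accesses. The names load_word and sum_halfwords_wide are hypothetical and introduced only for this example. The sketch assumes an even element count and a compiler that converts out-of-range values to int16_t by wrapping, as common compilers do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch one 32-bit word that holds two packed
 * 16-bit elements. memcpy keeps the access legal in portable C; a
 * compiler is free to turn it into a single word-wide load. */
static uint32_t load_word(const int16_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Sum n 16-bit elements using word-wide accesses.
 * Assumption: n is even, so every access covers two valid elements. */
int32_t sum_halfwords_wide(const int16_t *a, size_t n)
{
    int32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i += 2) {
        uint32_t w  = load_word(&a[i]);      /* one wide access, two elements */
        int16_t  h0 = (int16_t)(w & 0xFFFFu); /* lower half-word */
        int16_t  h1 = (int16_t)(w >> 16);     /* upper half-word */

        /* Which element lands in which half depends on endianness,
         * but the sum is the same either way. */
        sum += h0 + h1;
    }
    return sum;
}
```

Only the memory accesses are widened here. The arithmetic still handles each half-word separately, which is why this counts as the simplest form of the optimization.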
Widening memory accesses generally serves as a starting point for other vector
and packed data operations, because wide memory accesses tend to impose a
packed data flow on the rest of the code around them. This type of optimization
is said to work from the outside in, as loads and