Packed-Data Processing on the ’C64x
8.2.7.2 Combining Operations in the Vector Complex Multiply Kernel
The Vector Complex Multiply kernel originally shown in Example 8–4 can be
optimized with a technique similar to the one used for the Dot Product kernel
in Section 8.2.4.1. First, the loads and stores are vectorized so that each
memory access brings in more data. Next, groups of operations are combined
into single intrinsics to make full use of the machine.
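The combining step works because each half of a complex product is a two-term dot product of 16-bit values. As a hedged sketch, assume each complex sample is packed into one 32-bit word with the real part in the upper halfword and the imaginary part in the lower halfword, and assume the 'C64x packed dot products DOTP2 and DOTPN2 behave as written below (verify against the instruction set reference). The packing order and the primed operand B' are assumptions made for illustration:

```latex
\begin{aligned}
X &= (x_h : x_l), \qquad Y = (y_h : y_l) \\
\mathrm{dotp2}(X, Y)  &= x_h y_h + x_l y_l \\
\mathrm{dotpn2}(X, Y) &= x_h y_h - x_l y_l \\[4pt]
A &= (a_r : a_i), \quad B = (b_r : b_i), \quad B' = (b_i : b_r) \\
c_r &= a_r b_r - a_i b_i = \mathrm{dotpn2}(A, B) \\
c_i &= a_r b_i + a_i b_r = \mathrm{dotp2}(A, B')
\end{aligned}
```

Under these assumptions, four multiplies plus one add and one subtract collapse into two packed operations and one halfword swap of B. A different packing order would change which operand needs its halves swapped.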
Example 8–12 illustrates the vectorization step. For details on how the loads
and stores are vectorized, consult the earlier examples, such as the Vector
Sum. The complex multiplication itself has not yet been optimized at all.
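The vectorization step can be sketched in portable C as follows. This is not Example 8–12; it is an illustration under stated assumptions. The packing convention (real part in the upper 16 bits, imaginary part in the lower 16 bits), the helper names pack_cx, cx_re, cx_im and vec_cx_mpy_vec, and the shift scaling parameter are all hypothetical. Each complex sample moves with a single 32-bit load or store, while the multiply is still done with four separate products, matching the state of the kernel after vectorization only.

```c
#include <stdint.h>

/* ASSUMPTION (hypothetical layout): one complex sample per 32-bit word,
 * real part in the upper 16 bits, imaginary part in the lower 16 bits. */
uint32_t pack_cx(int16_t re, int16_t im)
{
    return ((uint32_t)(uint16_t)re << 16) | (uint32_t)(uint16_t)im;
}

/* Extract the signed halves of a packed word (assumes two's complement). */
int16_t cx_re(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
int16_t cx_im(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Hypothetical vectorized kernel: c[k] = (a[k] * b[k]) >> shift for n samples.
 * Loads and stores are vectorized (one 32-bit access per complex value),
 * but the complex multiply is still four separate 16x16 products.
 * Assumes the inputs are not all -32768, the only case in which the
 * 32-bit sum for the imaginary part overflows. */
void vec_cx_mpy_vec(const uint32_t *a, const uint32_t *b,
                    uint32_t *c, int n, int shift)
{
    for (int k = 0; k < n; k++) {
        uint32_t aw = a[k];          /* one load fetches real and imaginary */
        uint32_t bw = b[k];

        int32_t ar = cx_re(aw), ai = cx_im(aw);
        int32_t br = cx_re(bw), bi = cx_im(bw);

        /* Not yet combined: four multiplies, one subtract, one add. */
        int32_t cr = (ar * br - ai * bi) >> shift;
        int32_t ci = (ar * bi + ai * br) >> shift;

        c[k] = pack_cx((int16_t)cr, (int16_t)ci);  /* one store writes both */
    }
}
```

For example, packing a = 3 + 4i and b = 1 + 2i and calling the kernel with a shift of 0 yields a word whose halves are -5 and 10. The later combining step would replace the four separate products with packed dot-product intrinsics.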