Packed-Data Processing on the ’C64x
Example 8–12 still performs the complex multiply as a series of discrete steps
once the individual elements are loaded. The next optimization step is to
combine some of the multiplies and adds/subtracts into _dotp2 and _dotpn2
intrinsics, in a manner similar to the Dot Product example presented earlier.
The real component of each result is the difference between the product of
the real components of both inputs and the product of the imaginary
components of both inputs. Because the real and imaginary components of each
input array are laid out the same way, the _dotpn2 intrinsic can be used to
calculate the real component of the output. Figure 8–19 shows how this flow
would work.
Figure 8–19. The _dotpn2 Intrinsic Performing Real Portion of Complex Multiply.
[Figure: the 32-bit registers a and b each hold a 16-bit real component and a
16-bit imaginary component. The matching halves are multiplied to give the
32-bit products a_real * b_real and a_imaginary * b_imaginary, which are then
subtracted to give the 32-bit result
c = a_real * b_real – a_imag * b_imag, computed as c = _dotpn2(b, a).]
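The flow in Figure 8–19 can be checked off-target with a small host-side model.
The sketch below is not the compiler intrinsic itself: dotpn2_model,
pack_complex, and complex_mul_real are hypothetical helper names, and the
layout (real component in the upper 16 bits, imaginary component in the lower
16 bits) is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical host-side model of the _dotpn2 operation:
 * product of the upper halves minus product of the lower halves.
 * The int16_t casts rely on the usual two's-complement wraparound. */
static int32_t dotpn2_model(uint32_t src1, uint32_t src2)
{
    int16_t hi1 = (int16_t)(src1 >> 16);
    int16_t lo1 = (int16_t)(src1 & 0xFFFFu);
    int16_t hi2 = (int16_t)(src2 >> 16);
    int16_t lo2 = (int16_t)(src2 & 0xFFFFu);

    return (int32_t)hi1 * hi2 - (int32_t)lo1 * lo2;
}

/* Assumed packing: real in the upper 16 bits, imaginary in the lower 16. */
static uint32_t pack_complex(int16_t re, int16_t im)
{
    return ((uint32_t)(uint16_t)re << 16) | (uint16_t)im;
}

/* Real component of a * b: a_real * b_real - a_imag * b_imag. */
static int32_t complex_mul_real(uint32_t a, uint32_t b)
{
    return dotpn2_model(b, a);
}
```

For example, with a = 3 + 4i and b = 5 + 6i,
complex_mul_real(pack_complex(3, 4), pack_complex(5, 6)) evaluates
3 * 5 - 4 * 6. On the target, the model call would be replaced by the
_dotpn2 intrinsic itself.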
The calculation of the result's imaginary component presents a different
problem. As with the real component, the result is calculated from two
products, which in this case are added together. A problem arises, though,
because it is necessary to multiply the real component of one input by the
imaginary component of the other input, and vice versa. None of the ’C64x
intrinsics provides that operation directly, given the way the data is
currently packed.
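The mismatch can be seen by comparing a host-side model of a same-half dot
product with a scalar reference for the imaginary component. This is only an
illustrative sketch: upper16, lower16, dotp2_model, and complex_mul_imag_ref
are hypothetical names, the layout (real in the upper 16 bits, imaginary in
the lower 16 bits) is assumed, and the single overflow case in which all four
halves equal -32768 is not modeled.

```c
#include <stdint.h>

/* Assumed layout: real in the upper 16 bits, imaginary in the lower 16. */
static int16_t upper16(uint32_t x) { return (int16_t)(x >> 16); }
static int16_t lower16(uint32_t x) { return (int16_t)(x & 0xFFFFu); }

/* Hypothetical host-side model of the _dotp2 operation:
 * sum of the products of the matching halves. */
static int32_t dotp2_model(uint32_t src1, uint32_t src2)
{
    return (int32_t)upper16(src1) * upper16(src2)
         + (int32_t)lower16(src1) * lower16(src2);
}

/* Scalar reference for the imaginary component of a * b:
 * a_real * b_imag + a_imag * b_real (the halves are crossed). */
static int32_t complex_mul_imag_ref(uint32_t a, uint32_t b)
{
    return (int32_t)upper16(a) * lower16(b)
         + (int32_t)lower16(a) * upper16(b);
}
```

With a = 3 + 4i and b = 5 + 6i packed as 0x00030004 and 0x00050006, the
same-half model computes 3 * 5 + 4 * 6, whereas the imaginary component
requires 3 * 6 + 4 * 5. The two agree only if the halves of one operand are
exchanged first.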