Packed-Data Processing on the ’C64x
8-27
’C64x Programming Considerations
Figure 8–17. Fine Tuning Vector Multiply (shift < 16)
Signed 32 bit product
Right shifts
Signed 32 bit product
c[1]
c[0]
_pack2
Original data flow
Modified data flow
Right shifts
c[1]
c[0]
16-bit result
16-bit result
Truncated bits
Signed 32 bit product
Signed 32 bit product
_pack2
16 bits
16 bits
Discarded
sign bits
Discarded
sign bits
16 bits
16 bits
16-bit result
16-bit result
Whether or not the 16-bit shift version is used, consider the vector multiply to
be fully optimized from a packed data processing standpoint. It can be further
optimized using the more general techniques such as loop-unrolling and soft-
ware pipelining that are discussed in Chapter 6.