Packed-Data Processing on the ’C64x
’C64x Programming Considerations
This code works, but it is heavily bottlenecked on shifts. One way to eliminate
this bottleneck is to use the packed 16-bit shift intrinsic, _shr2(). This can be
done without losing precision, under the following conditions:
- If the shift amount is known to be greater than or equal to 16, use
  _packh2() instead of _pack2() before the shift. If the shift amount is exactly
  16, eliminate the shift. The _packh2() intrinsic effectively performs part of
  the shift, shifting right by 16, so that the job can be finished with a
  _shr2() intrinsic that shifts by the remaining amount (the shift amount
  minus 16). Figure 8–17 illustrates how this works.
- If the shift amount is less than 16, only use the _shr2() intrinsic if the 32-bit
  products can be safely truncated to 16 bits first without losing significant
  digits. In this case, use the _pack2() intrinsic, but note that the bits of
  each product above bit 15 are lost. This is safe only if those bits are
  redundant sign bits, that is, copies of bit 15.
  Figure 8–17 illustrates this case.
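
The two conditions above can be checked on a host machine with a minimal sketch in portable C. The model_pack2(), model_packh2(), and model_shr2() functions are hypothetical stand-ins that mimic the documented behavior of the _pack2(), _packh2(), and _shr2() intrinsics; they are not the intrinsics themselves. The helper names and the shift_then_pack() reference are assumptions made for illustration only.

```c
#include <stdint.h>

/* Portable arithmetic right shift of a signed value, 0 <= s <= 31. */
static int32_t asr32(int32_t x, unsigned s) {
    return x >= 0 ? (x >> s) : ~((~x) >> s);
}

/* Sign-extend the low 16 bits of v to 32 bits. */
static int32_t sext16(uint32_t v) {
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Hypothetical model of _pack2(): low halves of a and b, a in the upper half. */
static uint32_t model_pack2(int32_t a, int32_t b) {
    return (((uint32_t)a & 0xFFFFu) << 16) | ((uint32_t)b & 0xFFFFu);
}

/* Hypothetical model of _packh2(): high halves of a and b, a in the upper half. */
static uint32_t model_packh2(int32_t a, int32_t b) {
    return ((uint32_t)a & 0xFFFF0000u) | ((uint32_t)b >> 16);
}

/* Hypothetical model of _shr2(): arithmetic right shift of each signed 16-bit half. */
static uint32_t model_shr2(uint32_t x, unsigned s) {
    int32_t hi = asr32(sext16(x >> 16), s);
    int32_t lo = asr32(sext16(x), s);
    return model_pack2(hi, lo);
}

/* Reference form: shift each 32-bit product, then pack the low halves. */
static uint32_t shift_then_pack(int32_t a, int32_t b, unsigned s) {
    return model_pack2(asr32(a, s), asr32(b, s));
}

/* Packed form for 16 <= s <= 31: pack high halves, then shift by s - 16.
   When s is exactly 16 the shift disappears entirely. */
static uint32_t pack_then_shift_ge16(int32_t a, int32_t b, unsigned s) {
    uint32_t p = model_packh2(a, b);
    return s == 16 ? p : model_shr2(p, s - 16);
}

/* Packed form for s < 16: pack low halves, then shift by s.
   Matches the reference only when bits above bit 15 are redundant sign bits. */
static uint32_t pack_then_shift_lt16(int32_t a, int32_t b, unsigned s) {
    return model_shr2(model_pack2(a, b), s);
}
```

With products such as 1000 and -1000, which fit in 16 bits, pack_then_shift_lt16() agrees with shift_then_pack(). With a product such as 0x00012345, whose bits above bit 15 are significant, the two forms differ, which is the precision loss the second condition warns about.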