Packed-Data Processing on the ’C64x
8.2.7 Combining Multiple Operations in a Single Instruction
The Dot Product and Vector Complex Multiply examples presented earlier
are both kernels that benefit from macro operations, that is, instructions
that perform more than one simple operation.
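To make the idea concrete, the following is a minimal host-side sketch in portable C, not ’C64x code. It contrasts a plain scalar dot product with a version built on a two-way multiply-and-add step of the kind a macro operation provides. The names dot_scalar, mul2_add, and dot_paired are hypothetical, and the sketch assumes the inputs are small enough that all sums fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar form: one multiply and one separate add per element. */
int32_t dot_scalar(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* Hypothetical model of a macro operation: two 16x16 multiplies
 * and the add of their products expressed as a single step.
 * Assumes the result fits in 32 bits. */
static int32_t mul2_add(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Paired form: each loop iteration consumes two elements per input,
 * so the add between the two products is no longer a separate step. */
int32_t dot_paired(const int16_t *a, const int16_t *b, int n)
{
    assert(n % 2 == 0);   /* this sketch handles even lengths only */
    int32_t sum = 0;
    for (int i = 0; i < n; i += 2)
        sum += mul2_add(a[i], a[i + 1], b[i], b[i + 1]);
    return sum;
}
```

Both functions return the same value for the same inputs; the paired form simply does half as many loop iterations, each built around one combined step.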
The ’C64x provides a number of instructions that combine common operations.
These instructions reduce the overall instruction count in the code, thereby
reducing code size and increasing code density. They also tend to simplify
programming. Some of the more commonly used macro operations are listed in
Table 8–5.
Table 8–5. Intrinsics Which Combine Multiple Operations in one Instruction
Intrinsic            Instruction        Operations combined
_dotp2               DOTP2              Performs two 16x16 multiplies and adds the products together.
_dotpn2              DOTPN2             Performs two 16x16 multiplies and subtracts the second product from the first.
_dotprsu2            DOTPRSU2           Performs two 16x16 multiplies, adds the products together, and shifts/rounds the sum.
_dotpnrsu2           DOTPNRSU2          Performs two 16x16 multiplies, subtracts the second product from the first, and shifts/rounds the difference.
_dotpu4, _dotpsu4    DOTPU4, DOTPSU4    Performs four 8x8 multiplies and adds the products together.
_max2, _min2         MAX2, MIN2         Compares two pairs of numbers and selects the larger/smaller in each pair.
_maxu4, _minu4       MAXU4, MINU4       Compares four pairs of numbers and selects the larger/smaller in each pair.
_avg2                AVG2               Performs two 16-bit additions, followed by a right shift by 1 with round.
_avgu4               AVGU4              Performs four 8-bit additions, followed by a right shift by 1 with round.
_subabs4             SUBABS4            Finds the absolute value of the difference between four pairs of 8-bit numbers.
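As an illustration of how much work each table entry folds into one instruction, here is a host-side sketch in portable C that models three of the operations lane by lane. It is not ’C64x code and does not use the intrinsics themselves. The function names (model_max2, model_avgu4, model_subabs4, check_models) are hypothetical, and the sketch assumes signed 16-bit lanes for the max operation and unsigned 8-bit lanes for the average and absolute-difference operations.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 16-bit lane k of a packed word (0 = low half, 1 = high half). */
static int16_t lane16(uint32_t w, int k)
{
    uint16_t u = (uint16_t)(w >> (16 * k));
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Unsigned 8-bit lane k of a packed word (0 = lowest byte). */
static uint8_t lane8(uint32_t w, int k)
{
    return (uint8_t)(w >> (8 * k));
}

/* Model of a two-lane max: larger signed 16-bit value in each pair. */
uint32_t model_max2(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int k = 0; k < 2; k++) {
        int16_t x = lane16(a, k), y = lane16(b, k);
        int16_t m = (x > y) ? x : y;
        r |= (uint32_t)(uint16_t)m << (16 * k);
    }
    return r;
}

/* Model of a four-lane average: add, round, shift right by 1. */
uint32_t model_avgu4(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int k = 0; k < 4; k++) {
        uint32_t s = (uint32_t)lane8(a, k) + lane8(b, k) + 1u; /* +1 rounds */
        r |= (s >> 1) << (8 * k);
    }
    return r;
}

/* Model of a four-lane absolute difference of unsigned bytes. */
uint32_t model_subabs4(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int k = 0; k < 4; k++) {
        int d = (int)lane8(a, k) - (int)lane8(b, k);
        if (d < 0)
            d = -d;
        r |= (uint32_t)d << (8 * k);
    }
    return r;
}

/* Worked examples for the models above. */
void check_models(void)
{
    /* lanes (hi, lo): (-1, 5) vs (2, -7) gives (2, 5) */
    assert(model_max2(0xFFFF0005u, 0x0002FFF9u) == 0x00020005u);
    /* bytes (1, 255, 16, 0) vs (2, 255, 32, 1) gives (2, 255, 24, 1) */
    assert(model_avgu4(0x01FF1000u, 0x02FF2001u) == 0x02FF1801u);
    /* bytes (10, 255, 0, 5) vs (15, 0, 255, 3) gives (5, 255, 255, 2) */
    assert(model_subabs4(0x0AFF0005u, 0x0F00FF03u) == 0x05FFFF02u);
}
```

Each model needs a loop with extraction, arithmetic, and repacking per lane, which is the work the corresponding single instruction absorbs.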
As you can see, these macro operations can replace a number of separate
instructions rather easily. For instance, each _dotp2 eliminates an add, and each