Optimizing for SIMD Integer Applications
Note that the output is a packed doubleword. If needed, a pack
instruction can be used to convert the result to 16-bit words (thereby
matching the format of the input).
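The passage does not say which pack instruction is meant. The sketch below assumes signed saturation, which is the behavior of PACKSSDW. It models the 128-bit form in plain Python to show how four doubleword results from each of two registers become eight 16-bit words. The helper names are hypothetical, and this is an illustrative model rather than Intel's code.

```python
# Hypothetical scalar model of the 128-bit PACKSSDW (signed saturation).
INT16_MIN, INT16_MAX = -0x8000, 0x7FFF

def saturate_s16(v):
    """Clamp a signed integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))

def packssdw_model(a, b):
    """Pack two sets of four signed doublewords into eight signed words.

    Every doubleword is saturated to a word. The words from `a` fill the
    low half of the result and the words from `b` fill the high half.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each source must hold four doublewords")
    return [saturate_s16(v) for v in a] + [saturate_s16(v) for v in b]

print(packssdw_model([1, -2, 40000, -40000], [32767, -32768, 0, 100]))
# [1, -2, 32767, -32768, 32767, -32768, 0, 100]
```

Out-of-range doublewords such as 40000 clamp to the nearest representable word instead of wrapping. That is usually the desired behavior when narrowing arithmetic results.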
Packed 32*32 Multiply
The PMULUDQ instruction performs an unsigned multiply of the low
doublewords of corresponding 64-bit chunks of the two sources; the full
64-bit product of each multiplication is returned to the destination
register. The instruction is provided in both a 64-bit (MMX register) and
a 128-bit (XMM register) version; the latter performs two independent
operations, on the low and high halves of a 128-bit register.
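The following sketch models the 128-bit PMULUDQ in Python. It shows that the high doubleword of each 64-bit lane is ignored and that no product bits are lost. A register is represented as a list of two unsigned 64-bit lanes, and the function name is hypothetical.

```python
# Hypothetical scalar model of 128-bit PMULUDQ; a register is modeled as a
# list of two unsigned 64-bit lanes.
MASK32 = 0xFFFF_FFFF

def pmuludq_model(src1, src2):
    """Multiply the low unsigned doublewords of each pair of 64-bit lanes.

    The high doubleword of every lane is ignored. The product of two
    32-bit values is at most (2**32 - 1)**2 < 2**64, so the full result
    always fits in the 64-bit destination lane without truncation.
    """
    return [(x & MASK32) * (y & MASK32) for x, y in zip(src1, src2)]

result = pmuludq_model([0xDEAD_BEEF_FFFF_FFFF, 0x1234_5678_0000_0003],
                       [0x0000_0001_FFFF_FFFF, 0x0000_0009_0000_0005])
print([hex(r) for r in result])
# ['0xfffffffe00000001', '0xf']
```

The high doublewords (0xDEADBEEF, 0x12345678, and so on) have no effect on the result. If a full 32x32 multiply of all four doublewords is needed, the odd doublewords must first be shifted into the low positions.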
Packed 64-bit Add/Subtract
The PADDQ/PSUBQ instructions add/subtract the quadword operands within
each 64-bit chunk of the two sources; the 64-bit result of each
computation is written to the destination register. Like the integer
ADD/SUB instructions, PADDQ/PSUBQ can operate on either unsigned or
signed (two’s complement notation) integer operands. When an individual
result is too large to be represented in 64 bits, the low 64 bits of the
result are written to the destination operand, so the result wraps
around. These instructions are provided in both a 64-bit and a 128-bit
version; the latter performs two independent operations, on the low and
high halves of a 128-bit register.
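The sketch below makes the wraparound concrete with a scalar Python model of PADDQ/PSUBQ. Each 64-bit lane is stored as an unsigned integer, and results are masked to 64 bits. The helper `as_signed64` reinterprets the same bits as two's complement, which shows that a single instruction serves both signed and unsigned operands. All names are hypothetical.

```python
# Hypothetical scalar model of PADDQ/PSUBQ; each 64-bit lane is held as
# an unsigned Python int in the range [0, 2**64).
MASK64 = (1 << 64) - 1

def paddq_model(a, b):
    """Per-lane add; any carry out of bit 63 is discarded (wraps around)."""
    return [(x + y) & MASK64 for x, y in zip(a, b)]

def psubq_model(a, b):
    """Per-lane subtract; any borrow out of bit 63 is discarded (wraps around)."""
    return [(x - y) & MASK64 for x, y in zip(a, b)]

def as_signed64(v):
    """Reinterpret a 64-bit lane as a two's complement signed value."""
    return v - (1 << 64) if v >> 63 else v

print(paddq_model([MASK64, 5], [1, 7]))               # [0, 12]
print(as_signed64(psubq_model([0, 10], [1, 3])[0]))   # -1
print(as_signed64(paddq_model([2**63 - 1], [1])[0]))  # -9223372036854775808
```

Lane 0 of the first add is an unsigned wrap (2**64 - 1 + 1 becomes 0). The last line is a signed wrap: the largest positive 64-bit value plus one becomes the most negative one. Neither case raises a flag, so overflow detection, if required, is up to the program.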
128-bit Shifts
The PSLLDQ/PSRLDQ instructions shift the first operand (an XMM register)
to the left/right by the number of bytes specified by the immediate
operand. The vacated low-order (PSLLDQ) or high-order (PSRLDQ) bytes are
cleared (set to zero). If the value specified by the immediate operand is
greater than 15, the destination is set to all zeros.
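The following scalar Python model shows the byte-wise semantics, including the rule that counts above 15 clear the register. It assumes the usual x86 little-endian layout, in which byte 0 of the 16-byte register is the least significant byte, so a left shift moves bytes toward higher indices. The function names are hypothetical.

```python
# Hypothetical scalar model of PSLLDQ/PSRLDQ. A 128-bit XMM register is
# modeled as 16 bytes, byte 0 being the least significant (little-endian).

def _check(src, imm8):
    if len(src) != 16 or not 0 <= imm8 <= 255:
        raise ValueError("expected 16 bytes and an 8-bit immediate")

def pslldq_model(src, imm8):
    """Shift left by imm8 bytes: bytes move toward higher significance and
    the vacated low-order bytes are zero. Counts above 15 clear everything."""
    _check(src, imm8)
    n = min(imm8, 16)
    return bytes(n) + src[:16 - n]

def psrldq_model(src, imm8):
    """Shift right by imm8 bytes: bytes move toward lower significance and
    the vacated high-order bytes are zero. Counts above 15 clear everything."""
    _check(src, imm8)
    n = min(imm8, 16)
    return src[n:] + bytes(n)

reg = bytes(range(16))  # bytes 0x00..0x0F, least significant byte first
# .hex() prints in memory order (low byte first), not register notation.
print(pslldq_model(reg, 3).hex())  # 000000000102030405060708090a0b0c
print(psrldq_model(reg, 3).hex())  # 030405060708090a0b0c0d0e0f000000
print(pslldq_model(reg, 20) == bytes(16))  # True
```

Viewed as a single 128-bit integer, PSLLDQ by n bytes is a left shift by 8n bits truncated to 128 bits. This is why the instruction is handy for moving data between lanes.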
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006