3DNow!™ and MMX™ Intra-Operand Swapping
AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
Example:
PXOR MM2, MM2 ; 0 | 0
MOVD MM0, [ab] ; 0 0 | b a
MOVD MM1, [cd] ; 0 0 | d c
PUNPCKLWD MM0, MM2 ; 0 b | 0 a
PUNPCKLWD MM1, MM2 ; 0 d | 0 c
PMADDWD MM0, MM1 ; b*d | a*c
3DNow!™ and MMX™ Intra-Operand Swapping
AMD Athlon™ Specific Code
If the halves of an MMX register need to be swapped, use the
PSWAPD instruction, a new AMD Athlon 3DNow! DSP extension.
Use this instruction only in AMD Athlon specific code.
“PSWAPD MMreg1, MMreg2” performs the following operation:
mmreg1[63:32] = mmreg2[31:0]
mmreg1[31:0] = mmreg2[63:32]
See the AMD Extensions to the 3DNow! and MMX Instruction Set
Manual, order #22466, for more usage information.
Blended Code
Otherwise, for blended code, which needs to run well on
AMD-K6 and AMD Athlon family processors, the following code
is recommended:
Example 1 (Preferred, faster):
;MM1 = SWAP (MM0), MM0 destroyed
MOVQ      MM1, MM0   ;make a copy
PUNPCKLDQ MM0, MM0   ;duplicate lower half
PUNPCKHDQ MM1, MM0   ;combine upper halves
Example 2 (Preferred, fast):
;MM1 = SWAP (MM0), MM0 preserved
MOVQ      MM1, MM0   ;make a copy
PUNPCKHDQ MM1, MM1   ;duplicate upper half
PUNPCKLDQ MM1, MM0   ;combine lower halves
Both examples swap the register halves, but use the first
example when the original contents of MM0 do not need to be
preserved. The first example is faster because the MOVQ and
PUNPCKLDQ instructions both read only MM0 and can therefore
execute in parallel. In the second example, each instruction
depends on the MM1 result of the one before it, so the sequence
takes longer to execute.