22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization
The following code fragment uses the 3DNow! PAVGUSB
instruction to perform averaging between the source
macroblock and destination macroblock:
Example 2 (Preferred):
    MOV     EAX, DWORD PTR Src_MB
    MOV     EDI, DWORD PTR Dst_MB
    MOV     EDX, DWORD PTR SrcStride
    MOV     EBX, DWORD PTR DstStride
    MOV     ECX, 16
L1: MOVQ    MM0, [EAX]          ;MM0=QWORD1
    MOVQ    MM1, [EAX+8]        ;MM1=QWORD2
    PAVGUSB MM0, [EDI]          ;(QWORD1+QWORD3)/2 with adjustment
    PAVGUSB MM1, [EDI+8]        ;(QWORD2+QWORD4)/2 with adjustment
    ADD     EAX, EDX
    MOVQ    [EDI], MM0
    MOVQ    [EDI+8], MM1
    ADD     EDI, EBX
    LOOP    L1
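As a cross-check on what the loop above computes, the following is a minimal scalar sketch in C, not part of the original listing. It assumes that the "adjustment" in the comments is the round-up of PAVGUSB, which yields (a + b + 1) >> 1 for each unsigned byte. The names avg_round_up and average_macroblock are hypothetical, and the strides are assumed to be byte counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one PAVGUSB byte lane: unsigned average, rounded up. */
static uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Hypothetical helper: averages a 16x16 source macroblock into the
   destination macroblock in place. Strides are in bytes, mirroring
   SrcStride and DstStride in the assembly listing. */
static void average_macroblock(const uint8_t *src, size_t src_stride,
                               uint8_t *dst, size_t dst_stride)
{
    assert(src != NULL && dst != NULL);
    for (int row = 0; row < 16; ++row) {        /* MOV ECX, 16 ... LOOP L1 */
        for (int col = 0; col < 16; ++col) {    /* two quadwords per row */
            dst[col] = avg_round_up(src[col], dst[col]);
        }
        src += src_stride;                      /* ADD EAX, EDX */
        dst += dst_stride;                      /* ADD EDI, EBX */
    }
}
```

A scalar model like this may be useful as a reference when validating the output of the 3DNow! loop; it makes no attempt to match its performance.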
Stream of Packed Unsigned Bytes
The following code is an example of how to process a stream of
packed unsigned bytes (like RGBA information) with faster
3DNow! instructions.
Example:
outside loop:
    PXOR      MM0, MM0
inside loop:
    MOVD      MM1, [VAR]        ;         0        | v[3],v[2],v[1],v[0]
    PUNPCKLBW MM1, MM0          ;0,v[3],0,v[2]     | 0,v[1],0,v[0]
    MOVQ      MM2, MM1          ;0,v[3],0,v[2]     | 0,v[1],0,v[0]
    PUNPCKLWD MM1, MM0          ; 0,0,0,v[1]       | 0,0,0,v[0]
    PUNPCKHWD MM2, MM0          ; 0,0,0,v[3]       | 0,0,0,v[2]
    PI2FD     MM1, MM1          ; float(v[1])      | float(v[0])
    PI2FD     MM2, MM2          ; float(v[3])      | float(v[2])
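The unpack sequence above zero-extends each byte to a 32-bit integer and then converts it to single precision. A minimal scalar sketch of the same data flow in C follows. It is an illustration only, and the function name bytes_to_floats and its parameters are hypothetical rather than taken from the original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts quad_count groups of four packed
   unsigned bytes (for example RGBA) into single-precision floats. */
static void bytes_to_floats(const uint8_t *in, float *out, size_t quad_count)
{
    assert(in != NULL && out != NULL);
    for (size_t q = 0; q < quad_count; ++q) {
        for (int i = 0; i < 4; ++i) {
            /* PUNPCKLBW, then PUNPCKLWD/PUNPCKHWD against a zeroed
               register: zero-extend the byte to 32 bits. */
            uint32_t widened = in[4 * q + i];
            /* PI2FD: convert a signed 32-bit integer to float. The
               value is at most 255, so the conversion is exact. */
            out[4 * q + i] = (float)(int32_t)widened;
        }
    }
}
```

In the assembly version, v[0] and v[1] end up in MM1 and v[2] and v[3] in MM2; the sketch simply writes all four results to consecutive array elements.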