AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
Use 3DNow!™ Instructions for Fast Square Root and Reciprocal Square Root
3DNow! instructions can be used to compute a very fast, highly
accurate square root and reciprocal square root.
Optimized 15-Bit Precision Square Root
This square root operation can be executed in only 7 cycles,
assuming a program hides the latency of the first MOVD
instruction within previous code. The reciprocal square root
operation requires four fewer cycles than the square root
operation.
Example:
    MOVD      MM0, [MEM]   ; 0 | a
    PFRSQRT   MM1, MM0     ; 1/sqrt(a) | 1/sqrt(a) (approximate)
    PUNPCKLDQ MM0, MM0     ; a | a (MMX instr.)
    PFMUL     MM0, MM1     ; sqrt(a) | sqrt(a)
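The register comments in the example use a "high half | low half" notation. The C fragment below is a rough sketch of that data flow, not AMD code: the type `pair2f` and the helpers `movd_load`, `broadcast_low`, `mul_lanes`, and `sqrt_pair` are hypothetical names, and the PFRSQRT estimate is passed in as the parameter `rsqrt_est` rather than computed, because the hardware lookup is not modeled here.

```c
#include <assert.h>

/* Hypothetical model of a 64-bit MMX register holding two floats. */
typedef struct { float hi, lo; } pair2f;

/* MOVD: load one float into the low half, zero the high half (0 | a). */
static pair2f movd_load(float a)
{
    return (pair2f){ .hi = 0.0f, .lo = a };
}

/* PUNPCKLDQ r, r: copy the low half into the high half (a | a). */
static pair2f broadcast_low(pair2f r)
{
    return (pair2f){ .hi = r.lo, .lo = r.lo };
}

/* PFMUL: multiply the two registers half by half. */
static pair2f mul_lanes(pair2f x, pair2f y)
{
    return (pair2f){ .hi = x.hi * y.hi, .lo = x.lo * y.lo };
}

/* sqrt(a) = a * (1/sqrt(a)); rsqrt_est stands in for the PFRSQRT result. */
static pair2f sqrt_pair(float a, float rsqrt_est)
{
    assert(a > 0.0f);
    pair2f mm0 = movd_load(a);                              /* 0 | a             */
    pair2f mm1 = { .hi = rsqrt_est, .lo = rsqrt_est };      /* estimate in both  */
    mm0 = broadcast_low(mm0);                               /* a | a             */
    return mul_lanes(mm0, mm1);                             /* sqrt(a) | sqrt(a) */
}
```

In this sketch, `sqrt_pair(4.0f, 0.5f)` yields 2.0 in both halves. Because the result is a plain product, any relative error in the estimate carries over unchanged to the square root.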
Optimized 24-Bit Precision Square Root
This square root operation can be executed in only 19 cycles,
assuming a program hides the latency of the first MOVD
instruction within previous code. The reciprocal square root
operation requires four fewer cycles than the square root
operation.
Example:
    MOVD      MM0, [MEM]   ; 0 | a
    PFRSQRT   MM1, MM0     ; 1/sqrt(a) | 1/sqrt(a) (approx.)
    MOVQ      MM2, MM1     ; X_0 = 1/sqrt(a) (approx.)
    PFMUL     MM1, MM1     ; X_0 * X_0 | X_0 * X_0 (step 1)
    PUNPCKLDQ MM0, MM0     ; a | a (MMX instr.)
    PFRSQIT1  MM1, MM0     ; (intermediate) (step 2)
    PFRCPIT2  MM1, MM2     ; 1/sqrt(a) | 1/sqrt(a) (step 3)
    PFMUL     MM0, MM1     ; sqrt(a) | sqrt(a)
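Steps 1 through 3 above refine the initial estimate X_0 with one Newton-Raphson iteration. As a minimal scalar sketch, assuming the textbook recurrence X_1 = X_0 * (3 - a * X_0 * X_0) / 2, the arithmetic can be written in C as follows. The function names `rsqrt_refine` and `sqrt_refined` are hypothetical, the mapping of each line to an instruction is approximate, and the hardware's internal rounding and intermediate formats are not reproduced.

```c
#include <assert.h>

/* One Newton-Raphson refinement of x0, an estimate of 1/sqrt(a).
   Assumed recurrence: x1 = x0 * (3 - a * x0 * x0) / 2. */
static float rsqrt_refine(float a, float x0)
{
    assert(a > 0.0f && x0 > 0.0f);
    float x0_sq  = x0 * x0;                      /* step 1: PFMUL               */
    float factor = 0.5f * (3.0f - a * x0_sq);    /* step 2: role of PFRSQIT1    */
    return x0 * factor;                          /* step 3: role of PFRCPIT2    */
}

/* Final multiply: sqrt(a) = a * (refined 1/sqrt(a)). */
static float sqrt_refined(float a, float x0)
{
    return a * rsqrt_refine(a, x0);
}
```

Each iteration of this recurrence roughly squares the relative error of the estimate, which is why a single pass is enough to go from a coarse estimate to near single precision. For instance, with a = 4 and an estimate of 0.50048828125 (about 0.1% too high), `rsqrt_refine` returns a value within 1e-6 of the exact 0.5.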