
22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization
Newton-Raphson Reciprocal Square Root
The general Newton-Raphson reciprocal square root recurrence
is:
Z_{i+1} = 1/2 • Z_i • (3 - b • Z_i^2)
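As an illustration of the recurrence, here is a minimal C sketch. The function names rsqrt_step and rsqrt_iterate are hypothetical, and double precision is used only to make the convergence easy to observe; it is not how the processor evaluates the iteration.

```c
#include <assert.h>

/* One step of the recurrence Z_{i+1} = 1/2 * Z_i * (3 - b * Z_i^2). */
static double rsqrt_step(double b, double z)
{
    return 0.5 * z * (3.0 - b * z * z);
}

/* Apply `steps` iterations starting from the initial approximation z0.
   (Hypothetical helper; z0 must already be reasonably close to 1/sqrt(b).) */
static double rsqrt_iterate(double b, double z0, int steps)
{
    assert(b > 0.0);
    double z = z0;
    for (int i = 0; i < steps; ++i)
        z = rsqrt_step(b, z);
    return z;
}
```

For b = 4 and a starting guess of 0.4, one step gives about 0.472 and further steps approach 0.5. The error shrinks roughly quadratically, which is why a good initial approximation reduces the number of iterations needed.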
To reduce the number of iterations, the initial approximation
is read from a table. The 3DNow! reciprocal square root
approximation is accurate to at least 15 bits. Accordingly, to
obtain a single-precision 24-bit reciprocal square root of an
input operand b, one Newton-Raphson iteration is required,
using the following sequence of 3DNow! instructions:
X_0 = PFRSQRT(b)
X_1 = PFMUL(X_0, X_0)
X_2 = PFRSQIT1(b, X_1)
X_3 = PFRCPIT2(X_2, X_0)
X_4 = PFMUL(b, X_3)
The 24-bit final reciprocal square root value is X_3. In the
AMD Athlon processor 3DNow! implementation, the estimate
contains the correct round-to-nearest value for approximately
87% of all arguments. The remaining arguments differ from the
correct round-to-nearest value by one unit-in-the-last-place. The
square root (X_4) is formed in the last step by multiplying by the
input operand b.
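The data flow of the five-step sequence can be mirrored in scalar C. This is only a sketch under stated assumptions: the initial estimate x0 is passed in as a stand-in for the table-based PFRSQRT result, the arithmetic split between the PFRSQIT1 and PFRCPIT2 steps is assumed for illustration (only their combined effect, one Newton-Raphson iteration, is taken from the text), and the names RsqrtResult and refine_rsqrt are hypothetical. It does not reproduce the rounding behavior of the hardware.

```c
#include <assert.h>

/* Results of the sequence; type and field names are illustrative. */
typedef struct {
    float rsqrt; /* X_3: refined 1/sqrt(b)  */
    float root;  /* X_4: sqrt(b) = b * X_3  */
} RsqrtResult;

/* x0 plays the role of X_0, an estimate of 1/sqrt(b) good to ~15 bits. */
static RsqrtResult refine_rsqrt(float b, float x0)
{
    assert(b > 0.0f);
    float x1 = x0 * x0;                /* X_1 = PFMUL(X_0, X_0)              */
    float x2 = 0.5f * (3.0f - b * x1); /* X_2: assumed role of PFRSQIT1      */
    float x3 = x2 * x0;                /* X_3: assumed role of PFRCPIT2      */
    float x4 = b * x3;                 /* X_4 = PFMUL(b, X_3)                */
    RsqrtResult r = { x3, x4 };
    return r;
}
```

Calling refine_rsqrt(4.0f, 0.50001f), where the estimate is off by roughly two parts in 10^5, returns a reciprocal square root within single-precision rounding of 0.5 and a square root within rounding of 2.0.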
Use MMX™ PMADDWD Instruction to Perform Two 32-Bit Multiplies in Parallel
The MMX PMADDWD instruction can be used to perform two
signed 16x16→32-bit multiplies in parallel, with much higher
performance than can be achieved using the IMUL instruction.
The PMADDWD instruction is designed to perform four
16x16→32-bit signed multiplies and accumulate the results
pairwise. Forcing one of the two products in each pair to zero
leaves just two independent multiplies. The following example
shows how to multiply 16-bit signed numbers a, b, c, d into
signed 32-bit products a×c and b×d:
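The zero-padding idea can be modeled in portable C. This is a sketch of the technique, not the guide's MMX assembly listing: pmaddwd_model is a hypothetical software model of PMADDWD on one 64-bit operand pair, mul2x16 is a hypothetical wrapper, and the placement of the zero words in the odd lanes is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of PMADDWD on 64-bit operands: four signed 16x16->32
   products, summed pairwise into two signed 32-bit results. */
static void pmaddwd_model(const int16_t x[4], const int16_t y[4],
                          int32_t out[2])
{
    for (int i = 0; i < 2; ++i) {
        int64_t even = (int64_t)x[2 * i]     * y[2 * i];
        int64_t odd  = (int64_t)x[2 * i + 1] * y[2 * i + 1];
        out[i] = (int32_t)(even + odd); /* fits whenever one product is 0 */
    }
}

/* Compute a*c and b*d with a single modeled PMADDWD by zeroing the
   odd word of every pair, so each pairwise sum has one live product. */
static void mul2x16(int16_t a, int16_t b, int16_t c, int16_t d,
                    int32_t *ac, int32_t *bd)
{
    assert(ac && bd);
    const int16_t x[4] = { a, 0, b, 0 };
    const int16_t y[4] = { c, 0, d, 0 };
    int32_t out[2];
    pmaddwd_model(x, y, out);
    *ac = out[0]; /* a*c + 0*0 */
    *bd = out[1]; /* b*d + 0*0 */
}
```

With a = -3, b = 7, c = 1000, d = -2000, the two results are -3000 and -14000. Because one product in each pair is always zero, the pairwise sum cannot overflow 32 bits, even for the extreme input -32768 × -32768.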