viii
Contents
AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
Fast Conversion of Signed Words to Floating-Point . . . . . . . . . . . . 113
Use MMX PXOR to Negate 3DNow! Data . . . . . . . . . . . . . . . . . . . . 113
Use MMX PCMP Instead of 3DNow! PFCMP. . . . . . . . . . . . . . . . . . 114
Use MMX Instructions for Block Copies and Block Fills . . . . . . . . 115
Use MMX PXOR to Clear All Bits in an MMX Register . . . . . . . . . 118
Use MMX PCMPEQD to Set All Bits in an MMX Register . . . . . . . 119
Use MMX PAND to Find Absolute Value in 3DNow! Code . . . . . . 119
Optimized Matrix Multiplication. . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Efficient 3D-Clipping Code Computation Using 3DNow! Instructions . . . . . . . . 122
Use 3DNow! PAVGUSB for MPEG-2 Motion Compensation . . . . . 123
Stream of Packed Unsigned Bytes . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Complex Number Arithmetic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Chapter 11 General x86 Optimization Guidelines . . . . . . . . . . . . . . . . . 127
Short Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Register Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Stack Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Appendix A AMD Athlon™ Processor Microarchitecture . . . . . . . . . . . . . 129
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
AMD Athlon Processor Microarchitecture . . . . . . . . . . . . . . . . . . . . 130
  Superscalar Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  Instruction Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
  Predecode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
  Branch Prediction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
  Early Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
  Instruction Control Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
  Data Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
  Integer Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135