22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization
Example 1 (Avoid):
ADD EBX, ECX                  ;inst 1
MOV EAX, DWORD PTR [10h]      ;inst 2 (fast address calc.)
MOV ECX, DWORD PTR [EAX+EBX]  ;inst 3 (slow address calc.)
MOV EDX, DWORD PTR [24h]      ;this load is stalled from accessing
                              ; the data cache due to the long
                              ; latency of generating the address
                              ; for inst 3
Example 2 (Preferred):
ADD EBX, ECX                  ;inst 1
MOV EAX, DWORD PTR [10h]      ;inst 2
MOV EDX, DWORD PTR [24h]      ;place load above inst 3
                              ; to avoid address
                              ; generation interlock stall
MOV ECX, DWORD PTR [EAX+EBX]  ;inst 3
Use MOVZX and MOVSX
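Use the MOVZX and MOVSX instructions to zero-extend and
sign-extend byte-size and word-size operands to doubleword
length. Typical code for zero extension clears the full register
and then writes only its low byte. When the zero-extended value
is later used as a doubleword, reading EAX creates a superset
dependency on both the XOR that cleared EAX and the MOV that
wrote AL, as in the following code: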
Example 1 (Avoid):
XOR   EAX, EAX
MOV   AL, [MEM]
Example 2 (Preferred):
MOVZX EAX, BYTE PTR [MEM]
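The same extensions can be expressed in C. The sketch below is an illustration, not code from this guide. The function names are hypothetical. x86 compilers commonly emit a single MOVZX or MOVSX for each of these casts, but the exact instructions depend on the compiler and its flags.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating byte- and word-to-doubleword extension.
   Each cast is a single extension. x86 compilers commonly emit one MOVZX or
   MOVSX for it instead of clearing the register and then writing only its
   low part. The exact output depends on the compiler and its flags. */

/* Zero-extend a byte: typically MOVZX r32, BYTE PTR [mem]. */
uint32_t load_u8_zext(const uint8_t *mem) {
    return (uint32_t)*mem;
}

/* Sign-extend a byte: typically MOVSX r32, BYTE PTR [mem]. */
int32_t load_s8_sext(const int8_t *mem) {
    return (int32_t)*mem;
}

/* Zero-extend a word: typically MOVZX r32, WORD PTR [mem]. */
uint32_t load_u16_zext(const uint16_t *mem) {
    return (uint32_t)*mem;
}

/* Sign-extend a word: typically MOVSX r32, WORD PTR [mem]. */
int32_t load_s16_sext(const int16_t *mem) {
    return (int32_t)*mem;
}
```

For example, the byte 0xF0 zero-extends to 240, while the same bit pattern read as a signed byte sign-extends to -16.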
Minimize Pointer Arithmetic in Loops
Minimize pointer arithmetic in loops, especially when the loop
body is small. In a small body, the instructions that update
pointers make up a significant share of the total work. Instead,
use the complex addressing modes (base + index*scale +
displacement) so that the loop counter itself indexes into the
memory arrays. Complex addressing modes have no negative
impact on execution speed, and the reduced number of
instructions preserves decode bandwidth.
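The difference can be sketched in C. This is an illustration, not code from this guide. The function names are hypothetical, and an optimizing compiler may rewrite either loop form. The comments therefore describe how each form maps onto x86 instructions when it is written directly in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: c[k] = a[k] + b[k] for k in [0, n). */

/* Pointer-arithmetic form. Written literally in assembly, each iteration
   advances three pointers (a, b, c) in addition to the loop test. A small
   loop body therefore spends much of its instruction count on this
   bookkeeping. */
void add_arrays_ptr(int32_t *c, const int32_t *a, const int32_t *b, size_t n) {
    const int32_t *end = a + n;
    while (a < end) {
        *c++ = *a++ + *b++;
    }
}

/* Indexed form. A single counter i addresses all three arrays. In
   assembly, each access can use a scaled-index mode such as [ESI+ECX*4]
   (the register choice here is arbitrary), so only the counter is
   updated per iteration. */
void add_arrays_idx(int32_t *c, const int32_t *a, const int32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result. The indexed form is the one that lends itself to the complex addressing modes recommended above.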