Use the 3DNow!™ PREFETCH and PREFETCHW Instructions
47
22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization
PREFETCH/W versus PREFETCHNTA/T0/T1/T2
The PREFETCHNTA/T0/T1/T2 instructions in the MMX
extensions are processor implementation dependent. To
maintain compatibility with the 25 million AMD-K6®-2 and
AMD-K6-III processors already sold, use the 3DNow!
PREFETCH/W instructions instead of the various prefetch
flavors in the new MMX extensions.
PREFETCHW Usage
Code that intends to modify the cache line brought in through
prefetching should use the PREFETCHW instruction. While
PREFETCHW works the same as a PREFETCH on the
AMD-K6-2 and AMD-K6-III processors, PREFETCHW gives a
hint to the AMD Athlon processor of an intent to modify the
cache line. The AMD Athlon processor will mark the cache line
being brought in by PREFETCHW as Modified. Using
PREFETCHW can save an additional 15-25 cycles compared to
a PREFETCH and the subsequent cache state change caused by
a write to the prefetched cache line.
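
As a rough C-level illustration of the write-intent hint described above, the sketch below assumes a GCC- or Clang-compatible compiler and its `__builtin_prefetch` builtin, whose second argument (1) requests a prefetch for writing. The function name `scale_in_place`, the prefetch distance `PREFETCH_AHEAD`, and the 64-byte cache-line size are assumptions made for this example, not values from this section.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed prefetch distance in elements; tune per workload. */
#define PREFETCH_AHEAD 64

/* Assumed: 8 doubles per cache line (64-byte lines). */
#define DOUBLES_PER_LINE 8

/* Scale every element in place. Each cache line is modified after it is
 * read, so the prefetch carries write intent (second argument = 1). */
void scale_in_place(double *data, size_t n, double factor)
{
    assert(data != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Roughly one prefetch per cache line, kept inside the array. */
        if ((i % DOUBLES_PER_LINE) == 0 && i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 1, 3);
        }
        data[i] *= factor;
    }
}
```

Whether the compiler emits PREFETCHW for this hint depends on the target options; on targets without it, the hint may become an ordinary prefetch or be dropped.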
Multiple Prefetches
Programmers can initiate multiple outstanding prefetches on
the AMD Athlon processor. While the AMD-K6-2 and
AMD-K6-III processors can have only one outstanding prefetch,
the AMD Athlon processor can have up to six outstanding
prefetches. When all six buffers are filled by various memory
read requests, the processor will simply ignore any new
prefetch requests until a buffer frees up. Multiple prefetch
requests are essentially handled in order, so data that is
needed first should be prefetched first.
The example below shows how to initiate multiple prefetches
when traversing more than one array.
Example (Multiple Prefetches):
.CODE
.K3D
; original C code
;
; #define LARGE_NUM 65536
;
; double array_a[LARGE_NUM];
; double array_b[LARGE_NUM];
; double array_c[LARGE_NUM];
; int i;
;
; for (i = 0; i < LARGE_NUM; i++) {
;     array_a[i] = array_b[i] * array_c[i];
; }
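
The same loop can be sketched in C with one prefetch per array, giving three outstanding prefetch streams, which is within the six the text says the AMD Athlon processor can track. This is a minimal sketch, not the guide's assembly listing: it assumes a GCC- or Clang-compatible `__builtin_prefetch`, and the function name `multiply_arrays`, the distance `PREFETCH_AHEAD`, and the 64-byte line size are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed prefetch distance in elements; not a value from the guide. */
#define PREFETCH_AHEAD 64

/* Assumed: 8 doubles per cache line (64-byte lines). */
#define DOUBLES_PER_LINE 8

/* Compute a[i] = b[i] * c[i] while prefetching all three arrays ahead. */
void multiply_arrays(double *a, const double *b, const double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((i % DOUBLES_PER_LINE) == 0 && i + PREFETCH_AHEAD < n) {
            /* Sources are loaded before the store, so request them first
             * (read prefetch, second argument = 0). */
            __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0, 3);
            __builtin_prefetch(&c[i + PREFETCH_AHEAD], 0, 3);
            /* The destination line will be written: write intent (1). */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 1, 3);
        }
        a[i] = b[i] * c[i];
    }
}
```

With the arrays from the C code above, the call would be `multiply_arrays(array_a, array_b, array_c, LARGE_NUM);`. The source arrays are prefetched before the destination because prefetch requests are handled essentially in order.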