Use the 3DNow!™ PREFETCH and PREFETCHW Instructions
22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization
PREFETCH/W versus PREFETCHNTA/T0/T1/T2
The PREFETCHNTA/T0/T1/T2 instructions in the MMX extensions are processor-implementation dependent. To maintain compatibility with the 25 million AMD-K6®-2 and AMD-K6-III processors already sold, use the 3DNow! PREFETCH/W instructions instead of the various prefetch flavors in the new MMX extensions.
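For C code built with GCC or Clang, one way to state only read or write intent is the compiler builtin `__builtin_prefetch(addr, rw)`, which leaves its locality argument (the part that corresponds to the NTA/T0/T1/T2 choice) at the default. The sketch below is an illustration, not part of this guide: the helper `sum_doubles`, the 64-byte cache line, and the prefetch distance `PREFETCH_AHEAD` are assumptions, and which instruction the compiler actually emits depends on the target flags, so the generated assembly should be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, i.e. 8 doubles per line. */
#define DOUBLES_PER_LINE 8
/* Hypothetical prefetch distance in elements (8 lines ahead); tune per system. */
#define PREFETCH_AHEAD (8 * DOUBLES_PER_LINE)

/* Sum a read-only array. __builtin_prefetch(addr, 0) requests a read
   prefetch and leaves the locality argument at its default instead of
   selecting an NTA/T0/T1/T2 level explicitly. */
double sum_doubles(const double *x, size_t n)
{
    double total = 0.0;
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* One prefetch every 64 bytes, never past the end of the array. */
        if (i % DOUBLES_PER_LINE == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&x[i + PREFETCH_AHEAD], 0);
        total += x[i];
    }
    return total;
}
```

Because successive prefetch addresses are exactly 64 bytes apart, each one falls in a different line whether or not the array is line-aligned.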
PREFETCHW Usage
Code that intends to modify the cache line brought in through prefetching should use the PREFETCHW instruction. While PREFETCHW works the same as PREFETCH on the AMD-K6-2 and AMD-K6-III processors, on the AMD Athlon processor it gives a hint of an intent to modify the cache line, and the processor marks the line brought in by PREFETCHW as Modified. Using PREFETCHW can save an additional 15-25 cycles compared to issuing a PREFETCH and then incurring the cache-state change caused by a later write to the prefetched line.
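A minimal C sketch of the same idea for code that modifies the data it prefetches, assuming GCC or Clang: passing `rw = 1` to `__builtin_prefetch` signals write intent, which a compiler allowed to use PREFETCHW (for example with `-m3dnow` or `-mprfchw`) may emit as PREFETCHW. The function `scale_in_place`, the 64-byte line size, and `PREFETCH_AHEAD` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DOUBLES_PER_LINE 8                    /* assumes 64-byte cache lines */
#define PREFETCH_AHEAD (8 * DOUBLES_PER_LINE) /* hypothetical distance */

/* Multiply every element of a[] by k in place. Each line is read and then
   written, so the prefetch uses rw = 1 (write intent) rather than a plain
   read prefetch followed by a later state change on the store. */
void scale_in_place(double *a, size_t n, double k)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i % DOUBLES_PER_LINE == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 1);
        a[i] *= k;
    }
}
```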
Multiple Prefetches
Programmers can initiate multiple outstanding prefetches on the AMD Athlon processor. While the AMD-K6-2 and AMD-K6-III processors can have only one outstanding prefetch, the AMD Athlon processor can have up to six outstanding prefetches. When all six buffers are filled by various memory read requests, the processor simply ignores any new prefetch requests until a buffer frees up. Multiple prefetch requests are handled essentially in order, so the data that is needed first should be prefetched first.
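As a hedged C illustration of these two points (not this guide's own listing), the loop below prefetches three arrays at once and issues the read prefetches for the source arrays, whose data the multiply needs first, before the write prefetch for the destination. `mul_arrays`, the 64-byte line size, and `PREFETCH_AHEAD` are assumptions; the distance would need tuning so that outstanding requests do not routinely exceed the six available buffers, since extra prefetches are simply ignored.

```c
#include <assert.h>
#include <stddef.h>

#define DOUBLES_PER_LINE 8                    /* assumes 64-byte cache lines */
#define PREFETCH_AHEAD (8 * DOUBLES_PER_LINE) /* hypothetical distance */

/* a[i] = b[i] * c[i], issuing three prefetches (one per array) every
   64 bytes. Sources are prefetched first because the multiply reads
   them before the result is stored to a[]. */
void mul_arrays(double *a, const double *b, const double *c, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; i++) {
        if (i % DOUBLES_PER_LINE == 0 && i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0); /* read  */
            __builtin_prefetch(&c[i + PREFETCH_AHEAD], 0); /* read  */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 1); /* write */
        }
        a[i] = b[i] * c[i];
    }
}
```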
The example below shows how to initiate multiple prefetches
when traversing more than one array.
Example (Multiple Prefetches):
.CODE
.K3D
; original C code
;
; #define LARGE_NUM 65536
;
; double array_a[LARGE_NUM];
; double array_b[LARGE_NUM];
; double array_c[LARGE_NUM];
; int i;
;
; for (i = 0; i < LARGE_NUM; i++) {
; array_a[i] = array_b[i] * array_c[i];
; }