IA-32 Intel® Architecture Optimization
Figure 6-7 shows how prefetch instructions and strip-mining can be
applied to increase performance in both of these scenarios.
For Pentium 4 processors, the left scenario shows a graphical
implementation of using prefetchnta to prefetch data into selected
ways of the second-level cache only (SM1 denotes strip mine one way
of second-level), minimizing second-level cache pollution. Use
prefetchnta if the data is only touched once during the entire
execution pass in order to minimize cache pollution in the higher level
caches. Provided the prefetch was issued far enough ahead, the data is
available immediately when the read access is issued.
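
The following is a minimal sketch of the strip-mined prefetchnta pattern described above, not code taken from this manual. It assumes a compiler that provides the SSE intrinsic _mm_prefetch in <xmmintrin.h> (on other targets the macro falls back to a no-op), 64-byte cache lines, and a strip small enough to stay cached between the two passes. The names scale_and_sum, STRIP_ELEMS, and PREFETCH_AHEAD, along with their values, are hypothetical and would need tuning for a real processor.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(p) ((void)(p))   /* no-op on targets without SSE */
#endif

enum {
    STRIP_ELEMS    = 4096, /* hypothetical strip size: 16 KB of floats */
    PREFETCH_AHEAD = 64    /* hypothetical prefetch distance, in elements */
};

/* Two temporally adjacent passes over each strip:
 * pass 1 scales the strip while prefetching ahead with the NTA hint,
 * pass 2 sums the same strip while it is expected to still be cached. */
float scale_and_sum(float *data, size_t n, float k)
{
    float total = 0.0f;

    for (size_t base = 0; base < n; base += STRIP_ELEMS) {
        size_t end = (base + STRIP_ELEMS < n) ? base + STRIP_ELEMS : n;

        /* Pass 1: one prefetch per assumed 64-byte line (16 floats). */
        for (size_t i = base; i < end; i++) {
            if (i % 16 == 0 && i + PREFETCH_AHEAD < n)
                PREFETCH_NTA(&data[i + PREFETCH_AHEAD]);
            data[i] *= k;
        }

        /* Pass 2: reuse the strip before moving to the next one. */
        for (size_t i = base; i < end; i++)
            total += data[i];
    }
    return total;
}
```

The prefetch is only a hint, so the function returns the same result whether or not the hint is honored. Only the memory access timing is expected to change.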
Figure 6-7 Examples of Prefetch and Strip-mining for Temporally Adjacent and
Non-Adjacent Passes Loops
[Figure not reproduced. Panels: "Temporally adjacent passes" and
"Temporally non-adjacent passes". Labels: Prefetchnta and Reuse of
Dataset A and Dataset B, marked SM1; Prefetcht0 and Reuse of Dataset A
and Dataset B, marked SM2.]
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006