Optimizing Cache Usage
The choice of single-pass or multi-pass can have a number of
performance implications. For instance, in a multi-pass pipeline, stages
that are limited by bandwidth (either input or output) will reflect more
of this performance limitation in overall execution time. In contrast, for
a single-pass approach, bandwidth limitations can be distributed
(amortized) across other computation-intensive stages. The choice of
which prefetch hints to use is also affected by whether a single-pass
or multi-pass approach is used (see “Hardware Prefetching of Data”).
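The difference between the two approaches can be sketched in C as
follows. This is a minimal illustration, not code from the manual: the
two stage functions, the function names, and the intermediate buffer
tmp are hypothetical. The multi-pass version sweeps the whole array once
per stage, so a bandwidth-bound stage pays its full memory cost on its
own; the single-pass version applies both stages to each element while
it is still in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stage 1: cheap per element, so likely bandwidth-bound. */
static float stage_scale(float x) { return x * 2.0f; }

/* Hypothetical stage 2: stands in for a computation-intensive stage. */
static float stage_poly(float x) { return x * x + 1.0f; }

/* Multi-pass: each stage traverses all n elements; the intermediate
   result makes a round trip through tmp between the two passes. */
void pipeline_multi_pass(const float *in, float *tmp, float *out, size_t n)
{
    assert(n == 0 || (in != NULL && tmp != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        tmp[i] = stage_scale(in[i]);
    for (size_t i = 0; i < n; i++)
        out[i] = stage_poly(tmp[i]);
}

/* Single-pass: both stages run back to back on each element, so the
   load and store traffic is shared with the computation of stage 2. */
void pipeline_single_pass(const float *in, float *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = stage_poly(stage_scale(in[i]));
}
```

Both functions produce the same output; which one runs faster depends on
the data size relative to the cache and on how the stages are balanced,
so it should be measured rather than assumed.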
Memory Optimization using Non-Temporal Stores
The non-temporal stores can also be used to manage data retention in
the cache. Uses for the non-temporal stores include:
• To combine many writes without disturbing the cache hierarchy.
• To manage which data structures remain in the cache and which are
  transient.
Detailed implementations of these usage models are covered in the
following sections.
Non-temporal Stores and Software Write-Combining
Use non-temporal stores when the data to be stored is:
• write-once (non-temporal)
• too large for the cache, so that storing it would cause cache thrashing
Non-temporal stores do not invoke a cache line allocation, which means
they are not write-allocate. As a result, caches are not polluted and no
dirty writeback is generated to compete with useful data bandwidth.
Without using non-temporal stores, bus bandwidth will suffer when
caches start to be thrashed because of dirty writebacks.
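As an illustration of a write-once fill, the sketch below uses the SSE2
streaming-store intrinsic _mm_stream_si128 from <emmintrin.h>. The
function name fill_streaming and its preconditions (16-byte-aligned
destination, element count a multiple of four) are assumptions made for
this example. When SSE2 is not available at compile time, the code falls
back to ordinary stores, which do allocate cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_set1_epi32, _mm_stream_si128, _mm_sfence */
#endif

/* Fill a write-once buffer with a constant 32-bit value.
   Assumed preconditions: dst is 16-byte aligned and count is a
   multiple of 4 (one 128-bit store covers four elements). */
void fill_streaming(int32_t *dst, size_t count, int32_t value)
{
    assert(((uintptr_t)dst & 15u) == 0);
    assert(count % 4 == 0);
#if defined(__SSE2__)
    const __m128i v = _mm_set1_epi32(value);
    for (size_t i = 0; i < count; i += 4) {
        /* Non-temporal store: hints that the line need not be cached. */
        _mm_stream_si128((__m128i *)(dst + i), v);
    }
    /* Streaming stores are weakly ordered; fence before the data is
       published to other readers. */
    _mm_sfence();
#else
    /* Portable fallback using ordinary (cache-allocating) stores. */
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
#endif
}
```

A caller might use this for a large output buffer that is not read again
soon. For small buffers that are reused immediately, ordinary stores are
usually the better choice because the data stays in the cache.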
In the Streaming SIMD Extensions implementation, when non-temporal
stores are written into writeback or write-combining memory regions,
these stores are weakly-ordered and will be combined internally inside
the processor’s write-combining buffer and be written out to memory as
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006