Volume 4: IA-32 SSE Instruction Reference
The MOVNTPS (Non-temporal store of packed single-precision floating-point)
instruction stores data from an SSE register to memory. The memory address must be
aligned to a 16-byte boundary; if it is not aligned, a general protection exception will
occur. The instruction is implicitly weakly ordered, does not write-allocate, and
minimizes cache pollution.
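
As a minimal sketch of the alignment requirement above, the following C fragment fills a buffer through the `_mm_stream_ps` intrinsic, which compilers typically translate to MOVNTPS. The helper name `stream_fill_ps`, the `HAVE_SSE` macro, and the plain-store fallback for builds without SSE are assumptions made for illustration; they are not part of the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: fill `count` floats at `dst` with `value`.
 * `dst` must be 16-byte aligned and `count` a multiple of 4, mirroring
 * the MOVNTPS alignment rule (a misaligned address would fault). */
void stream_fill_ps(float *dst, size_t count, float value)
{
    assert(((uintptr_t)dst & 15u) == 0);
    assert(count % 4 == 0);
#if HAVE_SSE
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < count; i += 4)
        _mm_stream_ps(dst + i, v);  /* non-temporal store of 4 floats */
    _mm_sfence();                   /* order the weakly ordered stores */
#else
    /* Fallback for targets without SSE: ordinary cacheable stores. */
    for (size_t i = 0; i < count; ++i)
        dst[i] = value;
#endif
}
```

A caller would pass a buffer declared with 16-byte alignment, for example `_Alignas(16) float buf[8];` in C11.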
The main difference between a non-temporal store and a regular cacheable store is in
the write-allocation policy. The memory type of the region being written to can override
the non-temporal hint, leading to the following considerations:
• If the programmer specifies a non-temporal store to uncacheable memory, then the
store behaves like an uncacheable store; the non-temporal hint is ignored and the
memory type for the region is retained. Uncacheable as referred to here means that
the region being written to has been mapped with either a UC or WP memory type.
If the memory region has been mapped as WB, WT or WC, the non-temporal store
will implement weakly-ordered (WC) semantic behavior.
• If the programmer specifies a non-temporal store to cacheable memory, two cases
may result:
  • If the data is present in the cache hierarchy, the instruction will ensure
    consistency. A given processor may choose different ways to implement this;
    some examples include: updating data in-place in the cache hierarchy while
    preserving the memory type semantics assigned to that region, or evicting the
    data from the caches and writing the new non-temporal data to memory (with
    WC semantics).
  • If the data is not present in the cache hierarchy, and the destination region is
    mapped as WB, WT or WC, the transaction will be weakly ordered, and is
    subject to all WC memory semantics. The non-temporal store will not write
    allocate. Different implementations may choose to collapse and combine these
    stores.
• In general, WC semantics require software to ensure coherence with respect to
other processors and other system agents (such as graphics cards). Appropriate
use of synchronization and a fencing operation (see SFENCE, below) must be
performed for producer-consumer usage models. Fencing ensures that all system
agents have global visibility of the stored data; for instance, failure to fence may
result in a written cache line staying within a processor, where the line would not be
visible to other agents. For processors that implement non-temporal stores by
updating in place data that already resides in the cache hierarchy, the destination
region should also be mapped as WC. Otherwise, if the region is mapped as WB or WT,
there is the potential for speculative processor reads to bring the data into the
caches; in this case, non-temporal stores would then update in place, and data would
not be flushed from the processor by a subsequent fencing operation.
• The memory type visible on the bus in the presence of memory type aliasing is
implementation specific. As one possible example, the memory type written to the
bus may reflect the memory type for the first store to this line, as seen in program
order; other alternatives are possible. This behavior should be considered reserved,
and dependency on the behavior of any particular implementation risks future
incompatibility.
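
The producer-consumer guidance in the bullets above can be sketched in C as follows. This is an illustrative sketch only: the `mailbox_t` type, the `produce` and `consume` functions, and the `HAVE_SSE` fallback are hypothetical names introduced here. It assumes that the C11 release store on the flag does not by itself order the earlier non-temporal stores, so an explicit `_mm_sfence()` is issued before the flag is published.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical mailbox: a 16-byte aligned payload plus a ready flag. */
typedef struct {
    _Alignas(16) float payload[16];
    atomic_int ready;
} mailbox_t;

/* Producer: write the payload with non-temporal stores, fence, then
 * publish the flag so that a consumer never sees the flag first. */
void produce(mailbox_t *box, float value)
{
    assert(((uintptr_t)box->payload & 15u) == 0);
#if HAVE_SSE
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < 16; i += 4)
        _mm_stream_ps(&box->payload[i], v);
    _mm_sfence();  /* make the weakly ordered stores globally visible */
#else
    for (size_t i = 0; i < 16; ++i)
        box->payload[i] = value;  /* fallback: ordinary stores */
#endif
    atomic_store_explicit(&box->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and copies the payload once the flag is set,
 * otherwise returns 0 and leaves `out` untouched. */
int consume(mailbox_t *box, float *out)
{
    if (!atomic_load_explicit(&box->ready, memory_order_acquire))
        return 0;
    for (size_t i = 0; i < 16; ++i)
        out[i] = box->payload[i];
    return 1;
}
```

In a real system the producer and consumer would run on different processors or agents; the single-threaded shape here only shows where the fence sits relative to the data stores and the flag store.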
The PREFETCH (Load 32 or greater number of bytes) instructions load either
non-temporal data or temporal data into the specified cache level. Both the access
type and the cache level are specified as hints. The prefetch instructions do not
affect the functional behavior of the program; their effect is implementation specific.
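
Because a prefetch is only a hint, a loop must compute the same result with or without it. A minimal sketch, assuming the `_mm_prefetch` intrinsic with the `_MM_HINT_NTA` hint and a hypothetical look-ahead distance of 16 floats (a tuning guess, not a value from the manual):

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: sum `n` floats, hinting that data a fixed
 * distance ahead will be needed soon and only once (non-temporal). */
float sum_with_prefetch(const float *data, size_t n)
{
    const size_t ahead = 16;  /* assumed look-ahead, in floats */
    float sum = 0.0f;

    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
#if HAVE_SSE
        /* Issue one hint per 16 elements, and only for in-bounds data. */
        if (i % 16 == 0 && i + ahead < n)
            _mm_prefetch((const char *)(data + i + ahead), _MM_HINT_NTA);
#endif
        sum += data[i];
    }
    return sum;
}
```

Removing the `_mm_prefetch` call changes nothing about the returned value; only the timing may differ, and by how much depends on the implementation.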