Volume 4: IA-32 SSE Instruction Reference
The MOVNTPS (Non-temporal store of packed single-precision floating-point)
instruction stores data from an SSE register to memory. The memory address must be
aligned to a 16-byte boundary; if it is not, a general protection exception will
occur. The instruction is implicitly weakly ordered, does not write-allocate, and
minimizes cache pollution.
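
As a minimal sketch of the behavior described above, the following C fragment
issues non-temporal stores through the SSE intrinsic _mm_stream_ps, which
compilers typically translate to MOVNTPS. It assumes an x86 compiler that
provides <xmmintrin.h>; the helper name fill_nontemporal and its argument
checks are hypothetical and not part of this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: fill `count` floats at `dst` with `value` using
 * non-temporal stores. `dst` must be 16-byte aligned and `count` must be
 * a multiple of 4, because each store writes four packed floats. */
static void fill_nontemporal(float *dst, size_t count, float value)
{
    /* An unaligned address would raise a general protection exception,
     * so the precondition is checked up front. */
    assert(((uintptr_t)dst % 16) == 0);
    assert(count % 4 == 0);

    __m128 v = _mm_set1_ps(value);      /* broadcast value into all 4 lanes */
    for (size_t i = 0; i < count; i += 4) {
        _mm_stream_ps(dst + i, v);      /* non-temporal store of 16 bytes */
    }
    _mm_sfence();                       /* order the weakly-ordered stores */
}
```

A caller would pass a buffer declared with 16-byte alignment, for example
_Alignas(16) float buf[16], and a count that is a multiple of four.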
The main difference between a non-temporal store and a regular cacheable store is in
the write-allocation policy. The memory type of the region being written to can override
the non-temporal hint, leading to the following considerations:
• If the programmer specifies a non-temporal store to uncacheable memory, then the
store behaves like an uncacheable store; the non-temporal hint is ignored and the
memory type for the region is retained. Uncacheable as referred to here means that
the region being written to has been mapped with either a UC or WP memory type.
If the memory region has been mapped as WB, WT or WC, the non-temporal store
will implement weakly-ordered (WC) semantic behavior.
• If the programmer specifies a non-temporal store to cacheable memory, two cases
may result:
    • If the data is present in the cache hierarchy, the instruction will ensure
      consistency. A given processor may choose different ways to implement this;
      some examples include: updating data in-place in the cache hierarchy while
      preserving the memory type semantics assigned to that region, or evicting the
      data from the caches and writing the new non-temporal data to memory (with
      WC semantics).
    • If the data is not present in the cache hierarchy, and the destination region is
      mapped as WB, WT or WC, the transaction will be weakly ordered, and is
      subject to all WC memory semantics. The non-temporal store will not write
      allocate. Different implementations may choose to collapse and combine these
      stores.
• In general, WC semantics require software to ensure coherence with respect to
other processors and other system agents (such as graphics cards). Appropriate
use of synchronization and a fencing operation (see SFENCE, below) must be
performed for producer-consumer usage models. Fencing ensures that all system
agents have global visibility of the stored data; for instance, failure to fence may
result in a written cache line staying within a processor, and the line would not be
visible to other agents. For processors which implement non-temporal stores by
updating data in-place that already resides in the cache hierarchy, the destination
region should also be mapped as WC. Otherwise, if mapped as WB or WT, there is
the potential for speculative processor reads to bring the data into the caches; in
this case, non-temporal stores would then update in place, and data would not be
flushed from the processor by a subsequent fencing operation.
• The memory type visible on the bus in the presence of memory type aliasing is
implementation specific. As one possible example, the memory type written to the
bus may reflect the memory type for the first store to this line, as seen in program
order; other alternatives are possible. This behavior should be considered reserved,
and dependency on the behavior of any particular implementation risks future
incompatibility.
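
The producer-consumer requirement in the considerations above (synchronization
plus a fencing operation) can be sketched as follows. This is an illustration
under stated assumptions, not a prescribed protocol: the struct channel type,
the produce and try_consume functions, and the use of a C11 atomic flag are
all hypothetical. The point it shows is the ordering: non-temporal stores
first, then SFENCE, and only then the flag that tells the consumer the data
is ready.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_set1_ps, _mm_stream_ps, _mm_sfence */

#define PAYLOAD_FLOATS 16   /* hypothetical payload size, multiple of 4 */

/* Hypothetical shared structure: a 16-byte aligned payload plus a flag. */
struct channel {
    _Alignas(16) float payload[PAYLOAD_FLOATS];
    atomic_int ready;
};

/* Producer: write the payload with non-temporal stores, fence, then
 * publish. The explicit SFENCE is what orders the weakly-ordered stores
 * ahead of the flag; the release store alone is not assumed to do so. */
static void produce(struct channel *ch, float value)
{
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < PAYLOAD_FLOATS; i += 4) {
        _mm_stream_ps(&ch->payload[i], v);
    }
    _mm_sfence();
    atomic_store_explicit(&ch->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and writes the payload sum to *out_sum if the
 * producer has published, otherwise returns 0 and leaves *out_sum alone. */
static int try_consume(struct channel *ch, float *out_sum)
{
    if (!atomic_load_explicit(&ch->ready, memory_order_acquire)) {
        return 0;
    }
    float sum = 0.0f;
    for (size_t i = 0; i < PAYLOAD_FLOATS; i++) {
        sum += ch->payload[i];
    }
    *out_sum = sum;
    return 1;
}
```

In a real program the producer and consumer would run on different
processors; if the fence were omitted, the consumer could observe the flag
before the payload became globally visible.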
The PREFETCH (Load 32 or greater number of bytes) instructions load either
non-temporal data or temporal data into the specified cache level. The access type
and the cache level are specified as a hint. The prefetch instructions do not affect
the functional behavior of the program; their effect is implementation specific.
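
As a hedged illustration of the hint-only nature of prefetching, the sketch
below sums an array while requesting data a fixed distance ahead with the SSE
intrinsic _mm_prefetch and the non-temporal hint _MM_HINT_NTA. The function
name sum_with_prefetch and the distance PREFETCH_AHEAD are assumptions chosen
for the example; a suitable distance depends on the implementation. Removing
the prefetch call leaves the result unchanged, since the instruction does not
affect functional behavior.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Hypothetical prefetch distance in elements; a tuning value only. */
#define PREFETCH_AHEAD 64

/* Hypothetical helper: sum `count` floats, hinting that data further
 * along the array will be needed soon and is not expected to be reused. */
static float sum_with_prefetch(const float *data, size_t count)
{
    float sum = 0.0f;
    for (size_t i = 0; i < count; i++) {
        if (i + PREFETCH_AHEAD < count) {
            /* Hint only: the processor may ignore it. */
            _mm_prefetch((const char *)&data[i + PREFETCH_AHEAD], _MM_HINT_NTA);
        }
        sum += data[i];
    }
    return sum;
}
```

Replacing _MM_HINT_NTA with _MM_HINT_T0 would request a temporal prefetch
instead; in both cases the cache level actually used is up to the processor.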