2:518
Volume 2, Part 2: MP Coherence and Synchronization
2.2.1.8 Store Buffers May Satisfy Local Loads
In the Itanium memory ordering model, store buffers (or other logically-equivalent
structures) may satisfy local read requests from loads or acquire loads even if the
stored data is not yet visible to other agents in the coherence domain. Such bypassing
must honor any ordering semantics in the memory reference stream.
Table 2-10 and the discussion that follows illustrate this behavior.
In this sequence, each processor bypasses its locally-written value from a store buffer
before the value becomes visible to the other processor. This behavior may make
accesses of different sizes that have overlapping memory addresses appear to complete
non-atomically.
The following discussion focuses on the outcome r1 = 1, r3 = 1, r2 = 0, and r4 = 0
because this outcome is allowed if and only if store buffers can satisfy local loads (other
outcomes are allowed but do not depend on being able to satisfy local loads from a
store buffer).
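The "if and only if" claim above can be checked with a small enumeration. The following is a minimal sketch, not the architectural definition: it assumes a simplified model in which each processor has a one-entry store buffer, a store (S) enters the buffer and later drains (D) to shared memory, the acquire load (A) precedes the plain load (L), and, when forwarding is disabled, the acquire load must wait for the drain so that the RAW dependency is still honored. The event names and the functions `local_orders`, `run`, and `outcomes` are hypothetical names introduced for illustration.

```python
# Simplified model of the Table 2-10 sequence (an assumption-laden sketch).
# Events per processor: S = st.rel into the store buffer, D = buffer drains
# to shared memory, A = ld.acq of the locally written address, L = plain ld
# of the other processor's address.

OWN = {0: "x", 1: "y"}     # address each processor stores to and ld.acq's
OTHER = {0: "y", 1: "x"}   # address each processor reads with the plain ld


def interleavings(a, b):
    """Yield every merge of tuples a and b that keeps each tuple's own order."""
    if not a or not b:
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def local_orders(forwarding):
    """Per-processor event orders the model allows (S first, A before L)."""
    if forwarding:
        # The drain may happen at any point after the store is buffered.
        return [("S", "D", "A", "L"), ("S", "A", "D", "L"), ("S", "A", "L", "D")]
    # Without forwarding, A must read memory, so the drain has to come first.
    return [("S", "D", "A", "L")]


def run(schedule):
    """Execute one global schedule and return (r1, r2, r3, r4)."""
    mem = {"x": 0, "y": 0}
    buf = {0: None, 1: None}          # pending store value per processor
    acq, plain = {}, {}
    for cpu, ev in schedule:
        if ev == "S":
            buf[cpu] = 1
        elif ev == "D":
            mem[OWN[cpu]] = buf[cpu]
            buf[cpu] = None
        elif ev == "A":
            # Bypass from the local store buffer if the store is still pending.
            acq[cpu] = buf[cpu] if buf[cpu] is not None else mem[OWN[cpu]]
        else:  # "L"
            plain[cpu] = mem[OTHER[cpu]]
    return (acq[0], plain[0], acq[1], plain[1])


def outcomes(forwarding):
    """Collect every (r1, r2, r3, r4) reachable under the model."""
    seen = set()
    for p0 in local_orders(forwarding):
        for p1 in local_orders(forwarding):
            a = tuple((0, ev) for ev in p0)
            b = tuple((1, ev) for ev in p1)
            for schedule in interleavings(a, b):
                seen.add(run(schedule))
    return seen


TARGET = (1, 0, 1, 0)  # r1 = 1, r2 = 0, r3 = 1, r4 = 0
```

Under these assumptions, `TARGET in outcomes(True)` holds while `TARGET in outcomes(False)` does not, and the target is the only outcome that forwarding adds. The model deliberately ignores everything else a real implementation may do (caches, multiple buffer entries, other reorderings), so it supports the reasoning in the text rather than replacing it.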
The Itanium memory ordering semantics only require that M2 → M3 and M5 → M6, because each acquire load must become visible before the accesses that follow it.
There are no constraints on the relative ordering of M1 and M2 or M3 nor on the relative
ordering of M4 and M5 or M6.
Remember that both dependencies and the memory ordering model place requirements
on the manner in which a processor based on the Itanium architecture may re-order
accesses. Even though the Itanium memory ordering model allows loads to pass stores,
a processor based on the Itanium architecture cannot re-order the following sequence:
st.rel  [x] = r0   // M1: store 0 to [x]
ld.acq  r1 = [x]   // M2: cannot move above st.rel due to RAW
This is because there is a RAW dependency through memory between M1 and M2 and
the Itanium memory ordering model requires that the local processor resolve RAW,
WAR, and WAW dependencies between its memory accesses in program order. Thus,
M1 → M2 even though the ordering semantics place no constraints on the relative
ordering of M1 and M2.
Because there is a RAW dependency through memory between M1 and M2 and between
M4 and M5, the ordering constraints effectively become M1 → M2 → M3 and
M4 → M5 → M6 (see footnote 1).
Table 2-10. Store Buffers May Satisfy Loads if the Stored Data is Not Yet Globally Visible

Processor #0                      Processor #1
st.rel  [x] = 1    // M1          st.rel  [y] = 1    // M4
ld.acq  r1 = [x]   // M2          ld.acq  r3 = [y]   // M5
ld      r2 = [y]   // M3          ld      r4 = [x]   // M6

Outcome: r1 = 1, r3 = 1, r2 = 0, and r4 = 0 is allowed
1. That is, the store operations must become visible to the local processors before their loads that read the stored value.