2:534
Volume 2, Part 2: MP Coherence and Synchronization
The release store ensures that the code image updates are made visible to the remote
processors in the proper order (i.e., new_code is updated before the branch at address x
is updated). Using the final three instructions ensures that the remote processors will
see the new code the next time they execute the branch at address x.
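The ordering argument can be checked with a small toy model. The sketch below is not Itanium code: it treats the code image as two memory cells, new_code and the branch at x, and lists the states a remote observer could sample as the stores become visible one at a time. The function name observable_states and the "old"/"new" labels are illustrative assumptions, not part of the architecture.

```python
def observable_states(store_order):
    """Return every (target_code, branch) state a remote observer could
    sample while the stores in store_order become visible one at a time."""
    state = {"new_code": "old", "x": "old"}
    seen = [(state["new_code"], state["x"])]
    for cell in store_order:
        state[cell] = "new"
        seen.append((state["new_code"], state["x"]))
    return seen


# With release ordering, new_code becomes visible before the branch at x.
ordered = observable_states(["new_code", "x"])

# Without any ordering guarantee, the branch could become visible first.
unordered = observable_states(["x", "new_code"])

# ("old", "new") means a processor could take the new branch and then
# execute stale code at the target.
BAD = ("old", "new")
print(BAD in ordered)    # False
print(BAD in unordered)  # True
```

In this model the only intermediate state under release ordering is new code with the old branch, which is harmless because nothing branches to the new code yet.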
On the local processor, the branch at address x also serves to force the pipeline to be
coherent with the code image update without requiring an interrupt, rfi instruction, or
srlz.i instruction. Table 2-16 enumerates the potential pipeline behaviors to illustrate
this point.
In the first and fourth scenarios, the pipeline fetches and executes either the old branch
and old target instruction or the new branch and new target instruction. Note that if the
pipeline sees the new branch, it must also see the new target instruction by virtue of
the way the code in Figure 2-9 is written. Either of these behaviors is consistent.
In the second and third scenarios, the pipeline obtains a mix of the old or new branch
and the old or new target instruction. In these cases, the pipeline must flush because
the predicted target will not agree with the branch instruction.
This behavior is not guaranteed unless the branch at address x is IP-relative and taken.
The branch must be IP-relative to ensure that both the instruction and target address
can be atomically updated (this is only possible with an IP-relative branch because in
this type of branch, the target address is part of the instruction).
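The four scenarios of Table 2-16 follow from a single rule: because the target of an IP-relative branch is encoded in the instruction itself, the branch can retire only when the predicted target matches the version of the branch that was fetched. The following is a minimal sketch of that rule as a toy model, not a description of any real pipeline; retire_or_flush and the "old"/"new" labels are hypothetical names chosen for illustration.

```python
from itertools import product


def retire_or_flush(fetched_branch, predicted_target):
    """Classify one scenario: the branch retires only if the predicted
    target matches the target encoded in the branch that was fetched."""
    if fetched_branch == predicted_target:
        return f"{fetched_branch} retires"
    return "flush (misprediction)"


# Enumerate scenarios #1 to #4 in the same order as Table 2-16.
scenarios = {
    (branch, target): retire_or_flush(branch, target)
    for branch, target in product(["old", "new"], repeat=2)
}

for (branch, target), outcome in scenarios.items():
    print(f"fetch {branch} branch, predict {target} target -> {outcome}")
```

The two mixed cases flush, so the pipeline never retires a branch together with a target from the other version of the code image.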
2.5.3 Programmed I/O
Programmed I/O requires that the CPU copy data from the device controller to main
memory using load instructions to read from the device and store instructions to write
data into cacheable memory (page-in).
To ensure correct operation, Itanium architecture-based software must exercise care in
the presence of Programmed I/O due to two features of the architecture. First, the
Itanium architecture does not require an implementation to maintain coherency
between local instruction and data caches for Itanium architecture-based code. Second,
the Itanium architecture allows aggressive instruction prefetching. Specifically, an
implementation can move any location from a cacheable page into its instruction
cache(s) any time a translation for the location indicates that the page is present (i.e.,
the p bit of the translation is set).
A system that performs Programmed I/O can perform the data movement with a sequence
similar to the one used to update a code image on both the local and remote processors.
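The hazard created by these two architectural features can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not real hardware behavior: ToyProcessor, prefetch, pio_store, and invalidate_instruction_line are hypothetical names, and the invalidation step merely stands in for whatever flush and synchronization steps the actual sequence performs.

```python
class ToyProcessor:
    """Toy model: an instruction cache that is NOT kept coherent with stores."""

    def __init__(self, memory):
        self.memory = dict(memory)  # data-side view of cacheable memory
        self.icache = {}            # instruction-side copies

    def prefetch(self, addr):
        # Models aggressive instruction prefetch from a present, cacheable page.
        self.icache[addr] = self.memory[addr]

    def pio_store(self, addr, value):
        # Models a store of data read from the device; the icache is untouched.
        self.memory[addr] = value

    def invalidate_instruction_line(self, addr):
        # Hypothetical stand-in for the flush and synchronization steps.
        self.icache.pop(addr, None)

    def fetch(self, addr):
        # Instruction fetch: served from the icache, filled from memory on a miss.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]


cpu = ToyProcessor({0x1000: "stale bytes"})
cpu.prefetch(0x1000)                    # allowed: the page is marked present
cpu.pio_store(0x1000, "paged-in code")  # page-in through loads and stores
before = cpu.fetch(0x1000)              # still the prefetched copy
cpu.invalidate_instruction_line(0x1000)
after = cpu.fetch(0x1000)               # refilled from the updated memory
print(before, "|", after)
```

In this model the fetch before invalidation returns the stale prefetched copy, which is why software cannot simply store the paged-in data and then execute it.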
Table 2-16. Potential Pipeline Behaviors of the Branch at x from Figure 2-9

Pipeline Operation    Scenario #1       Scenario #2                               Scenario #3                       Scenario #4
Fetch branch at x     Old branch        Old branch                                New branch                        New branch
Predict branch at x   Old target        New target                                Old target                        New target
Code at target        Old instruction   “New” instruction (but could be stale)    Old instruction                   New instruction
Retire branch at x    Old retires       Must flush due to misprediction           Must flush due to misprediction   New retires