Draft Document for Review October 14, 2014 10:19 am
IBM Power Systems E870 and E880 Technical Overview and Introduction
With AMM, uncorrectable errors in data that are owned by a partition or application are
handled by the existing Special Uncorrectable Error handling methods in the hardware,
firmware, and operating system.
2.3.6 Memory Error Correction and Recovery
The memory has error detection and correction circuitry that is designed such that the failure
of any one specific memory module within an ECC word can be corrected, provided that no
other fault is present.
In addition, a spare DRAM per rank on each memory port provides for dynamic DRAM device
replacement during runtime operation. Also, dynamic lane sparing on the DMI link allows for
repair of a faulty data lane.
Other memory protection features include retry capabilities for certain faults detected at both
the memory controller and the memory buffer.
Memory is also periodically scrubbed to allow for soft errors to be corrected and for solid
single-cell errors to be reported to the hypervisor, which supports operating system
deallocation of a page associated with a hard single-cell fault.
For more details on Memory RAS, see 4.3.10, “Memory protection” on page 153.
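The scrubbing behavior described above can be illustrated with a small Python model. This is only a conceptual sketch, not the hardware or firmware implementation: the cell states (`ok`, `soft`, `hard`), the `scrub` function, and the `PAGE_SIZE` value are all hypothetical names chosen for illustration.

```python
# Conceptual model of a memory scrub pass (illustrative assumptions only).
PAGE_SIZE = 4  # hypothetical number of cells per page


def scrub(cells, page_size=PAGE_SIZE):
    """Walk every cell once.

    Soft errors are corrected in place. Hard (solid) single-cell errors
    cannot be corrected by rewriting, so the page that contains them is
    collected for reporting, which lets the operating system deallocate it.
    Returns the sorted list of page indexes to report.
    """
    pages_to_report = set()
    for address, state in enumerate(cells):
        if state == "soft":
            cells[address] = "ok"  # corrected by rewriting good data
        elif state == "hard":
            pages_to_report.add(address // page_size)
    return sorted(pages_to_report)


memory = ["ok", "soft", "ok", "ok", "hard", "ok", "soft", "ok"]
reported = scrub(memory)
```

In this model, the two soft errors are cleared and only page 1, which holds the hard fault, is reported as a candidate for deallocation.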
2.3.7 Special Uncorrectable Error handling
Special Uncorrectable Error (SUE) handling prevents an uncorrectable error in memory or
cache from immediately causing the system to terminate. Rather, the system tags the data
and determines whether it will ever be used again. If the data is never used, the error is
irrelevant and does not force a checkstop. If the data is used, termination can be limited to
the program, kernel, or hypervisor that owns the data, or to a freeze of the I/O adapters
controlled by an I/O hub controller if the data is to be transferred to an I/O device.
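The decision flow in this section can be sketched as a small Python function. This is a simplified model of the description above, not the actual firmware logic: the `Owner` and `Action` enumerations and the `sue_action` function are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Owner(Enum):
    """Who consumes the tagged data (hypothetical categories)."""
    PROGRAM_OR_KERNEL = "program/kernel"
    HYPERVISOR = "hypervisor"
    IO_TRANSFER = "data headed to an I/O device"


class Action(Enum):
    """Possible outcomes, as described in the text."""
    NONE = "no checkstop, error is irrelevant"
    TERMINATE_OWNER = "terminate the owning program/kernel"
    TERMINATE_HYPERVISOR = "terminate the hypervisor"
    FREEZE_IO = "freeze I/O adapters under the I/O hub controller"


def sue_action(data_used: bool, owner: Owner) -> Action:
    """Return the containment action for data tagged with an SUE."""
    if not data_used:
        # Tagged data that is never consumed does not force a checkstop.
        return Action.NONE
    if owner is Owner.IO_TRANSFER:
        return Action.FREEZE_IO
    if owner is Owner.HYPERVISOR:
        return Action.TERMINATE_HYPERVISOR
    return Action.TERMINATE_OWNER
```

The point of the model is that the error is acted on only when the tagged data is consumed, and the impact is scoped to whatever consumes it.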
2.4 Capacity on Demand
Several types of Capacity on Demand (CoD) offerings are optionally available on the
Power E870 and Power E880 servers to help meet changing resource requirements in an
on-demand environment, by using resources that are installed on the system but that are not
activated.
2.4.1 Capacity Upgrade on Demand (CUoD)
Power E870 and Power E880 systems include a number of active processor cores and
memory units. They can also include inactive processor cores and memory units. Active
processor cores and memory units are available for use on your server when it comes from
the manufacturer. Inactive processor cores and memory units are included with your server
but are not available for use until you activate them. Inactive processor cores and memory
Partition data: Active Memory Mirroring will not mirror partition data. It was designed to
mirror only the hypervisor code and its components, allowing this data to be protected
against a DIMM failure.