
Draft Document for Review September 2, 2008 5:05 pm
IBM Power 570 Technical Overview and Introduction
Figure 4-5 Memory protection capabilities in action
Memory page deallocation
While coincident single cell errors in separate memory chips are a statistical rarity, POWER6
processor-based systems can contain these errors using a memory page deallocation
scheme for partitions running AIX and for memory pages owned by the POWER Hypervisor.
If a memory address experiences an uncorrectable or repeated correctable single cell error,
the Service Processor sends the memory page address to the POWER Hypervisor to be
marked for deallocation.
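The trigger condition described above can be illustrated with a minimal Python sketch. It only models the decision stated in the text and is not IBM firmware code: the names PageErrorTracker and record_error, the 4 KB page size, and the threshold of two correctable errors (standing in for "repeated") are all assumptions made for illustration.

```python
from collections import defaultdict

PAGE_SIZE = 4096        # assumed page size, for illustration only
REPEAT_THRESHOLD = 2    # assumed meaning of "repeated" correctable errors


class PageErrorTracker:
    """Hypothetical model of the decision to mark a page for deallocation."""

    def __init__(self):
        self.correctable_counts = defaultdict(int)  # address -> error count
        self.marked_pages = set()                   # page numbers marked so far

    def record_error(self, address, correctable):
        """Record a single cell error; return True if its page is now marked."""
        page = address // PAGE_SIZE
        if correctable:
            self.correctable_counts[address] += 1
            if self.correctable_counts[address] < REPEAT_THRESHOLD:
                # Not yet "repeated": the page is marked only if an
                # earlier error already marked it.
                return page in self.marked_pages
        # Uncorrectable, or a repeated correctable error: mark the page.
        self.marked_pages.add(page)
        return True
```

In this model, a first correctable error at an address leaves the page in use, while a second correctable error at the same address, or any uncorrectable error, marks the containing page.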
The operating system performs memory page deallocation without any user intervention, and
the process is transparent to end users and applications.
The POWER Hypervisor maintains a list of pages marked for deallocation during the current
platform IPL. During a partition IPL, the partition receives a list of all the bad pages in its
address space.
In addition, if memory is dynamically added to a partition (through a Dynamic LPAR
operation), the POWER Hypervisor warns the operating system if the added memory
includes pages that need to be deallocated.
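The bookkeeping described in the preceding paragraphs can be sketched as follows. This is a simplified model under stated assumptions, not the actual POWER Hypervisor interface: the class names Hypervisor and Partition, the methods mark_page, bad_pages_in, ipl, and dlpar_add, and the use of plain integer page numbers are all hypothetical.

```python
class Hypervisor:
    """Hypothetical model of the list of pages marked for deallocation."""

    def __init__(self):
        # Assumption: the list covers the current platform IPL only,
        # so a fresh object stands for a fresh platform IPL.
        self.marked = set()

    def mark_page(self, page):
        self.marked.add(page)

    def bad_pages_in(self, pages):
        """Return the marked pages that fall inside the given pages."""
        return sorted(p for p in self.marked if p in pages)


class Partition:
    """Hypothetical model of a partition and its deallocated pages."""

    def __init__(self, hypervisor, pages):
        self.hypervisor = hypervisor
        self.pages = set(pages)
        self.deallocated = set()

    def ipl(self):
        # At partition IPL, receive all bad pages in this address space.
        self.deallocated = set(self.hypervisor.bad_pages_in(self.pages))

    def dlpar_add(self, new_pages):
        # On a dynamic add, learn which of the new pages are marked.
        new_pages = set(new_pages)
        self.pages |= new_pages
        self.deallocated |= set(self.hypervisor.bad_pages_in(new_pages))
```

For example, if pages 3 and 12 are marked and a partition that owns pages 0 through 7 is started, it deallocates page 3 at IPL. If pages 8 through 15 are later added dynamically, it also learns about page 12.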
Finally, should an uncorrectable error occur, the system can deallocate the memory group
associated with the error on all subsequent system reboots until the memory is repaired. This
is intended to guard against future uncorrectable errors while waiting for parts replacement.
Memory control hierarchy
A memory controller on a POWER6 processor-based system is designed with four ports.
Each port connects up to three DIMMs using a daisy-chained bus. The memory bus supports
ECC checking on data, addresses, and command information. A spare line on the bus is also
available for repair using a self-healing strategy. In addition, ECC checking on addresses and
commands is extended to the DRAMs on the DIMMs. Because it uses a daisy-chained memory
access topology, this system can deconfigure a DIMM that encounters a DRAM fault without
deconfiguring the bus controller, even if the bus controller is contained on the DIMM.
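The topology above implies a maximum of 4 x 3 = 12 DIMMs per memory controller. The following Python sketch models that layout and the deconfiguration of a single DIMM. It is an illustration only: the class MemoryController and its methods are hypothetical, and representing each DIMM as a simple configured/deconfigured flag is an assumption that ignores the electrical details of the bus.

```python
PORTS_PER_CONTROLLER = 4   # from the text: four ports per controller
MAX_DIMMS_PER_PORT = 3     # from the text: up to three DIMMs per port


class MemoryController:
    """Hypothetical model: each port holds a daisy chain of DIMM flags."""

    def __init__(self, dimms_per_port=MAX_DIMMS_PER_PORT):
        if not 1 <= dimms_per_port <= MAX_DIMMS_PER_PORT:
            raise ValueError("a port connects up to three DIMMs")
        # True means the DIMM is configured and usable.
        self.ports = [[True] * dimms_per_port
                      for _ in range(PORTS_PER_CONTROLLER)]

    def deconfigure_dimm(self, port, position):
        """Take one faulty DIMM out of use; the others stay configured."""
        self.ports[port][position] = False

    def configured_dimms(self):
        """Count the DIMMs that are still configured."""
        return sum(sum(chain) for chain in self.ports)
```

In this model, deconfiguring the first DIMM on a port leaves the DIMMs further along the same chain in use, which mirrors the statement that the bus controller stays configured even when it resides on the affected DIMM.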
Note: Memory page deallocation handles single cell failures, but, because of the sheer
size of data in a data bit line, it may be inadequate for dealing with more catastrophic
failures. Redundant bit steering will continue to be the preferred method for dealing with
these types of problems.