Draft Document for Review October 14, 2014 10:19 am
IBM Power Systems E870 and E880 Technical Overview and Introduction
4.4 Enterprise systems availability details
In addition to the standard RAS features described, Enterprise class systems provide
increased RAS and availability through several additional features and redundant components.
The main features exclusive to Enterprise class systems are:
Redundant Service Processor
The service processor is an essential component of a system, responsible for the initial
program load (IPL), setup, monitoring, control, and management. The system control units
present on Enterprise class systems house two redundant service processors. If either
service processor fails, the other one allows for continued operation of the
system until a replacement is scheduled. Even a system with a single system node has
dual service processors in the system control unit.
Redundant System Clock Cards
Other components crucial to system operation are the system clock cards. They are
responsible for providing synchronized clock signals for the whole system. The system
control units present on Enterprise class systems house two redundant system clock cards.
If either clock card fails, the other one allows for continued operation of
the system until a replacement is scheduled. Even a system with a single system node
has dual clock cards in the system control unit.
Dynamic Processor Sparing
Enterprise class systems are Capacity Upgrade on Demand (CUoD) capable. Processor sparing
helps minimize the impact to server performance caused by a failed processor. An inactive
processor is activated if a failing processor reaches a predetermined error threshold, thus
helping to maintain performance and improve system availability. Processor sparing
happens dynamically and automatically when dynamic logical partitioning (DLPAR) is in use
and the failing processor is detected before it fails. Dynamic processor sparing
does not require the purchase of an activation code; it requires only that the system has
inactive CUoD processor cores available.
Dynamic Memory Sparing
Enterprise class systems are Capacity Upgrade on Demand capable. Dynamic memory
sparing helps minimize the impact to server performance caused by a failed memory
feature. Memory sparing occurs when on-demand inactive memory is automatically
activated by the system to temporarily replace failed memory until a service action can be
performed.
Active Memory Mirroring for Hypervisor
The hypervisor is the core part of the virtualization layer. Although minimal, its operational
data must reside in memory CDIMMs. If a CDIMM fails, the hypervisor could
become inoperative. Active Memory Mirroring for Hypervisor allows the memory
blocks used by the hypervisor to be written to two distinct CDIMMs. If an uncorrectable
error is encountered during a read, the data is retrieved from the mirrored pair and
operations continue normally.
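The mirrored write and fallback read described for Active Memory Mirroring for Hypervisor
can be pictured with a small model. The following Python sketch is only an illustration of
the idea and does not represent the actual firmware or any IBM interface. The names Cdimm,
MirroredMemory, and UncorrectableError are hypothetical, and a failed module is simulated
with a simple flag.

```python
class UncorrectableError(Exception):
    """Raised when a simulated CDIMM cannot return valid data."""


class Cdimm:
    """Toy stand-in for one memory module (hypothetical, not an IBM interface)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}     # address -> data held by this module
        self.failed = False  # set to True to simulate an uncorrectable error

    def write(self, address, data):
        self.blocks[address] = data

    def read(self, address):
        if self.failed:
            raise UncorrectableError(f"{self.name}: uncorrectable error at {address:#x}")
        return self.blocks[address]


class MirroredMemory:
    """Keeps every block in two distinct modules (assumed model of mirroring)."""

    def __init__(self, primary, mirror):
        self.primary = primary
        self.mirror = mirror

    def write(self, address, data):
        # Every write goes to both modules so the copies stay identical.
        self.primary.write(address, data)
        self.mirror.write(address, data)

    def read(self, address):
        # Read from the primary copy; on an uncorrectable error,
        # retrieve the same block from the mirrored copy instead.
        try:
            return self.primary.read(address)
        except UncorrectableError:
            return self.mirror.read(address)


if __name__ == "__main__":
    cdimm_a, cdimm_b = Cdimm("CDIMM-A"), Cdimm("CDIMM-B")
    memory = MirroredMemory(cdimm_a, cdimm_b)
    memory.write(0x1000, b"hypervisor state")
    cdimm_a.failed = True          # simulate a failure of the first module
    print(memory.read(0x1000))     # data is still served from the mirror
```

In this model a read fails only if both copies are unavailable, which is why the two
copies are placed in distinct modules.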
4.5 Availability impacts of a solution architecture
No solution should rely on the hardware platform alone. Although IBM Power
Systems offer far superior RAS compared with other comparable systems, it is advisable to
design a redundant architecture around the application to allow for easier maintenance
tasks and greater flexibility.