© Copyright IBM Corp. 2013. All rights reserved.
Chapter 4. Continuous availability and manageability
This chapter provides information about IBM reliability, availability, and serviceability (RAS)
design and features. This set of technologies, implemented on IBM Power Systems servers,
improves your architecture’s total cost of ownership (TCO) by reducing planned and
unplanned downtime.
The elements of RAS can be described as follows:
Reliability: Indicates how infrequently a defect or fault in a server occurs
Availability: Indicates how infrequently the functionality of a system or application is
impacted by a fault or defect
Serviceability: Indicates how well faults and their effects are communicated to system
managers and how efficiently and nondisruptively the faults are repaired
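
As a rough illustration of how availability relates to downtime, the following sketch uses the common steady-state approximation availability = MTBF / (MTBF + MTTR). The formula, the function names, and the sample figures are illustrative assumptions, not IBM measurements or definitions taken from this chapter.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures (MTBF)
    and mean time to repair (MTTR), both in hours."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per year for a given availability (0..1)."""
    if not 0.0 <= avail <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the arithmetic.
    a = availability(mtbf_hours=10_000, mttr_hours=4)
    print(f"availability: {a:.5f}")
    print(f"expected downtime: {annual_downtime_hours(a):.2f} hours/year")
```

In this model, reliability improvements raise MTBF and serviceability improvements lower MTTR; either change increases availability.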
Each successive generation of IBM servers is designed to be more reliable than the previous
server family. These processor-based servers have new features to support new levels of
virtualization, help ease administrative burden, and increase system utilization.
Reliability starts with components, devices, and subsystems designed to be fault-tolerant.
The processor uses lower-voltage technology, improving reliability with stacked latches to reduce
soft error susceptibility. During the design and development process, subsystems go through
rigorous verification and integration testing processes. During system manufacturing,
systems go through a thorough testing process to help ensure high product quality levels.
The processor and memory subsystem contain several features designed to avoid or correct
environmentally induced, single-bit, intermittent failures and also handle solid faults in
components, including selective redundancy to tolerate certain faults without requiring an
outage or parts replacement.
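
To make the idea of correcting a single-bit failure concrete, the sketch below implements the textbook Hamming(7,4) code, which stores 4 data bits with 3 parity bits and can repair any one flipped bit. It is an illustrative assumption only: it is not the error-correcting code used in these servers, and the function names are hypothetical.

```python
def encode(data_bits):
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Codeword positions are numbered 1..7; parity bits sit at 1, 2, and 4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def correct(codeword):
    """Return (corrected codeword, syndrome).

    The syndrome is the XOR of the positions of all set bits. It is 0 for
    a valid codeword and equals the position of the flipped bit after a
    single-bit error.
    """
    bits = list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the faulty bit back
    return bits, syndrome


def decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    bits, _ = correct(codeword)
    return [bits[2], bits[4], bits[5], bits[6]]


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    damaged = list(word)
    damaged[4] ^= 1  # simulate a single-bit fault at position 5
    print(decode(damaged) == [1, 0, 1, 1])
```

A code of this kind corrects the error without interrupting the workload, which is the behavior the paragraph above describes for intermittent single-bit faults; solid faults call for additional mechanisms such as the selective redundancy mentioned there.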