PRIMEPOWER 650 and 850 Technical White Paper
01/12/31
High Availability Functions
RAS/HA Concept
The PRIMEPOWER 650 and 850 models employ HA (High Availability) functions based on mainframe technology.
To implement these functions, RAS (Reliability, Availability, Serviceability) has to be ensured for each
function.
High Availability, or the elimination of job stoppage time, cannot be achieved simply by increasing the reliability of the
hardware components. High Availability must also be achieved for the software, applications, and support services. It is
therefore essential to provide “system” RAS functions.
Of course, to achieve high reliability, the quality of parts must be increased to the maximum. In addition, appropriate
parts must be selected taking the product lifespan into consideration. However, there are no parts that can be guaranteed
to never break, and it is therefore always necessary to consider the possibility of a failure. This applies to software as well
as to hardware. Naturally, it is highly desirable to have software that is free of bugs. However, since there are software
bugs that are triggered by hardware failures, it is extremely difficult to completely eliminate all bugs. Still, it goes without
saying that all efforts must be made to improve the reliability of the hardware and software.
Fujitsu controls and guarantees the reliability of the parts used. When new parts are introduced, Fujitsu evaluates
their lifespan with stress tests, such as burn-in tests, to determine whether the parts provide the reliability that the
product aims for.
Availability can be expressed as an index indicating the proportion of time during which the system is available for job operation. Because the
number of errors cannot be kept to zero, mechanisms that ensure high availability must be installed to enable system
operation to continue when a hardware failure occurs in a part or unit, an error occurs in the basic software such as the OS,
or an error or failure occurs in an application process.
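One common way to express such an index is steady-state availability, MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The sketch below is illustrative only: the function names and the sample figures are assumptions and are not taken from Fujitsu documentation or PRIMEPOWER measurements.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is usable, from mean time between
    failures (MTBF) and mean time to repair (MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float, hours_per_year: float = 8760.0) -> float:
    """Expected unavailable hours in a year for a given availability."""
    return (1.0 - avail) * hours_per_year


# Hypothetical figures for illustration only, not PRIMEPOWER measurements.
a = availability(mtbf_hours=9999.0, mttr_hours=1.0)
print(f"availability = {a:.4%}, downtime = {downtime_hours_per_year(a):.2f} h/year")
```

The formula shows why both halves of the RAS concept matter: availability rises either when failures become rarer (larger MTBF) or when recovery becomes faster (smaller MTTR).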
PRIMEPOWER 650 and 850 models incorporate the following basic mechanisms to provide for high availability:
- An expanded automatic error checking and correction range
- Improved retry functions when an error is detected, and a degradation function that isolates failed
components and allows a restart using a valid, if reduced, configuration
- An automatic system restart to reduce down time
- A panel display function for locating faults at system startup
- Reduced system start time
- Redundant configurations for power supplies and fans, and hot-swappable components
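The retry and degradation mechanisms in the list above can be sketched in a few lines. This is a conceptual illustration, not the PRIMEPOWER firmware logic: the function names, the component names, and the three-attempt limit are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def retry(check: Callable[[], bool], attempts: int = 3) -> bool:
    """Run a self-test up to `attempts` times; stop at the first pass."""
    return any(check() for _ in range(attempts))


def degraded_configuration(
    checks: Dict[str, Callable[[], bool]], attempts: int = 3
) -> List[str]:
    """Return the components that pass their self-test (after retries).

    Components that keep failing are left out, so a restart can proceed
    with a valid, if reduced, configuration.
    """
    return [name for name, check in checks.items() if retry(check, attempts)]


# Hypothetical components: cpu1 fails permanently and is isolated.
usable = degraded_configuration({
    "cpu0": lambda: True,
    "cpu1": lambda: False,
    "mem0": lambda: True,
})
print(usable)
```

Retrying first keeps a transient error from removing a healthy component, while the final filter ensures that a persistent failure does not block the restart.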
Serviceability refers to the functions that are used to quickly and easily recover the system from any problems that may
occur during system operation. To achieve this, the cause of any error that occurs must be identified, and the component
or components that caused the error must be isolated and replaced. In addition, the event and conditions must be reported to the
system administrator and maintenance personnel in a format that is easy to understand.
Machine management software is provided with PRIMEPOWER 650 and 850 to support the isolation of fault locations
and the replacement of components without having to stop the system. This software also lets the system
administrator and maintenance personnel clearly identify the operating status of all units, so that the maintenance
personnel can perform the appropriate maintenance work.
Redundant Configuration and Hot Swapping
The power supply and fan units of these models have a redundant configuration. Storage can also be installed in
a redundant configuration by using mechanisms such as dual RAID controllers and disk mirroring. This can be achieved
for these models by combining Fujitsu’s SynfinityDisk and a multipath disk control package. Moreover, SynfinityDisk
can be used to mirror the system volumes themselves. Even if a disk error occurs during booting, the boot disk is switched
automatically and the OS is restarted without the system process stopping.
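The idea of switching boot disks automatically can be illustrated with a minimal sketch. It is not SynfinityDisk's implementation or interface: the function name, the disk names, and the boolean "readable" flag are hypothetical and stand in for whatever checks the real software performs.

```python
from typing import List, Tuple


def select_boot_disk(mirrors: List[Tuple[str, bool]]) -> str:
    """Return the first mirror that is readable.

    Each entry is (disk name, readable flag). The flag is a stand-in for
    a real read check of the boot blocks.
    """
    for name, readable in mirrors:
        if readable:
            return name
    raise RuntimeError("no readable boot mirror")


# Hypothetical mirrored pair: the primary fails at boot, so the
# secondary is selected and startup can continue.
print(select_boot_disk([("disk0", False), ("disk1", True)]))
```

With a mirrored system volume, a single disk failure changes only which copy is booted; startup fails only if every mirror is unreadable.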