IBM Power 720 and 740 Technical Overview and Introduction
The service processor provides the following features:
Placing calls
On systems without a Hardware Management Console, the service processor can place
calls to report surveillance failures with the POWER Hypervisor, critical environmental
faults, and critical processing faults even when the main processing unit is inoperable.
Mutual surveillance
The service processor monitors the operation of the firmware during the boot process, and
also monitors the hypervisor for termination. The hypervisor monitors the service
processor and will perform a reset/reload operation if it detects the loss of the service
processor. If the reset/reload operation does not correct the problem with the service
processor, the hypervisor will notify the operating system and the operating system can
take appropriate action, including calling for service.
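The hypervisor's side of mutual surveillance, as described above, can be pictured as a small escalation routine. The following is a minimal sketch in Python, not the actual firmware logic. The names `hypervisor_check`, `FakeServiceProcessor`, `heartbeat`, and `notify_os` are hypothetical and exist only for illustration.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    NONE = "service processor healthy"
    RESET_RELOAD_OK = "reset/reload restored the service processor"
    NOTIFY_OS = "reset/reload failed; operating system notified"


class FakeServiceProcessor:
    """Hypothetical stand-in used only to exercise the sketch."""

    def __init__(self, alive: bool, recovers_on_reset: bool) -> None:
        self.alive = alive
        self.recovers_on_reset = recovers_on_reset

    def heartbeat(self) -> bool:
        # True while the service processor is responding.
        return self.alive

    def reset_reload(self) -> None:
        # After a reset/reload the processor is alive only if the fault cleared.
        self.alive = self.recovers_on_reset


def hypervisor_check(
    sp_alive: Callable[[], bool],
    reset_reload: Callable[[], None],
    notify_os: Callable[[str], None],
) -> Action:
    """One surveillance pass: detect loss, try reset/reload, then escalate."""
    if sp_alive():
        return Action.NONE
    # Loss of the service processor detected: attempt a reset/reload first.
    reset_reload()
    if sp_alive():
        return Action.RESET_RELOAD_OK
    # Reset/reload did not help: hand the problem to the operating system,
    # which can take appropriate action such as calling for service.
    notify_os("service processor lost; reset/reload unsuccessful")
    return Action.NOTIFY_OS
```

In this model the operating system is notified only after the reset/reload attempt has failed, matching the order of steps in the text.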
Availability
This family of systems continues to introduce significant
enhancements designed to increase system availability.
As in POWER6 and POWER7, the processor can perform processor instruction retry and
alternate processor recovery for several core-related faults. This significantly reduces
exposure to both hard (logic) and soft (transient) errors in the processor core. Soft
failures in the processor core are transient (intermittent) errors, often caused by cosmic
rays or other sources of radiation, and generally are not repeatable.
When an error is encountered in the core, the processor will first automatically
retry the instruction. If the source of the error was truly transient, the instruction will
succeed and the system will continue as before. On IBM systems prior to POWER6, this
error would have caused a checkstop.
Hard failures are more difficult, being true logical errors that will be replicated each time
the instruction is repeated. Retrying the instruction will not help in this situation. As in
POWER6 and POWER7, all processors have the ability to extract
the failing instruction from the faulty core and retry it elsewhere in the system for several
faults, after which the failing core is dynamically deconfigured and called out for
replacement. These systems are designed to avoid a full system outage.
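The two-stage recovery described above (retry on the same core, then retry on another core and deconfigure the failing one) can be modeled in a few lines. This is an illustrative sketch only. The real mechanism is implemented in hardware and firmware, and the names `Core`, `run_with_recovery`, and `transient_fault` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Outcome(Enum):
    OK = "executed without error"
    RETRIED = "recovered by instruction retry"
    MIGRATED = "recovered on alternate core; failing core deconfigured"
    CHECKSTOP = "no recovery possible"


@dataclass
class Core:
    name: str
    # Hypothetical fault model: execute(instr) returns True on success.
    execute: Callable[[str], bool]
    configured: bool = True


def transient_fault(failures: int) -> Callable[[str], bool]:
    """Build an execute function that fails `failures` times, then succeeds."""
    remaining = [failures]

    def execute(instr: str) -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return execute


def run_with_recovery(instr: str, core: Core, others: List[Core]) -> Outcome:
    if core.execute(instr):
        return Outcome.OK
    # Stage 1: processor instruction retry on the same core.
    # A truly transient (soft) error disappears on the retry.
    if core.execute(instr):
        return Outcome.RETRIED
    # Stage 2: alternate processor recovery. The error repeated, so treat it
    # as a hard fault and retry the instruction on another configured core.
    for other in others:
        if other is not core and other.configured and other.execute(instr):
            core.configured = False  # deconfigure and call out for replacement
            return Outcome.MIGRATED
    return Outcome.CHECKSTOP
```

For example, `run_with_recovery("add", Core("core0", transient_fault(1)), [])` models a soft error that clears on the first retry, while a core whose `execute` always returns `False` models a hard fault that is only recoverable if another core is available.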
Uncorrectable error recovery
The auto-restart (reboot) option, when enabled, can reboot the system automatically
following an unrecoverable firmware error, firmware hang, hardware failure, or
environmentally induced (AC power) failure.
The auto-restart (reboot) option must be enabled from the Advanced System
Management Interface (ASMI) or from the Control (Operator) Panel.