IBM Power 750 and 760 Technical Overview and Introduction
The service processor can immediately shut down a system in the following circumstances:
Temperature exceeds the critical level or remains above the warning level for too long.
Internal component temperatures reach critical levels.
Non-redundant fan failures occur.
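
The three shutdown conditions above can be read as a simple decision rule. The following is a minimal sketch only, not the service processor's actual firmware logic: the threshold values, the `EnvironmentSample` fields, and the `should_shut_down` function are all hypothetical names and numbers chosen for illustration, because the source gives no numeric limits.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source document gives no numeric values.
WARNING_C = 40.0
CRITICAL_C = 50.0
MAX_WARNING_SECONDS = 300.0


@dataclass
class EnvironmentSample:
    ambient_c: float                # current temperature reading
    seconds_above_warning: float    # time spent continuously above WARNING_C
    component_critical: bool        # an internal component is at a critical temperature
    nonredundant_fan_failed: bool   # a fan with no redundant partner has failed


def should_shut_down(s: EnvironmentSample) -> bool:
    """Return True if any of the listed shutdown conditions holds."""
    over_critical = s.ambient_c > CRITICAL_C
    warm_too_long = (
        s.ambient_c > WARNING_C
        and s.seconds_above_warning > MAX_WARNING_SECONDS
    )
    return (
        over_critical
        or warm_too_long
        or s.component_critical
        or s.nonredundant_fan_failed
    )
```

In this sketch, a reading above the warning level triggers a shutdown only after it has persisted longer than the assumed time limit, whereas a critical reading or a non-redundant fan failure triggers it at once.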
The service processor provides the following features:
Placing calls
On systems without a Hardware Management Console, the service processor can place
calls to report surveillance failures with the POWER Hypervisor, critical environmental
faults, and critical processing faults even when the main processing unit is inoperable.
Mutual surveillance
The service processor monitors the operation of the firmware during the boot process, and
also monitors the hypervisor for termination. The hypervisor monitors the service
processor and will perform a reset/reload if it detects the loss of the service processor. If
the reset/reload does not correct the problem with the service processor, the hypervisor
will notify the operating system and the operating system can take appropriate action,
including calling for service.
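
The hypervisor's side of mutual surveillance follows an escalation pattern: detect the loss, attempt a reset/reload, and notify the operating system only if that fails. The sketch below illustrates that order of operations under stated assumptions. The function name and the three callbacks (`sp_alive`, `reset_reload`, `notify_os`) are hypothetical stand-ins, not real POWER Hypervisor interfaces.

```python
from typing import Callable


def hypervisor_check_service_processor(
    sp_alive: Callable[[], bool],
    reset_reload: Callable[[], None],
    notify_os: Callable[[str], None],
) -> str:
    """Run one surveillance pass from the hypervisor's point of view.

    Returns "ok", "recovered", or "escalated".
    """
    if sp_alive():
        return "ok"
    # First response to a lost service processor: reset/reload it.
    reset_reload()
    if sp_alive():
        return "recovered"
    # Reset/reload did not help, so hand the problem to the operating system,
    # which can then take its own action (for example, calling for service).
    notify_os("service processor unresponsive after reset/reload")
    return "escalated"
```

A caller would supply its own liveness probe and handlers; the return value simply records which stage of the escalation was reached.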
Availability
The POWER7+ family of systems continues to offer and introduce significant
enhancements designed to increase system availability.
As in POWER6, POWER6+, and POWER7, the POWER7+ processor can perform
processor instruction retry and alternate processor recovery for several core-related faults.
This significantly reduces exposure to both hard (logic) and soft (transient) errors in the
processor core. Soft failures in the processor core are transient (intermittent) errors, often
caused by cosmic rays or other sources of radiation, and generally are not repeatable.
When an error is encountered in the core, the processor will first automatically
retry the instruction. If the source of the error was truly transient, the instruction will
succeed and the system will continue as before. On IBM systems prior to POWER6, this
error might have caused a checkstop.
Hard failures are more difficult to handle because they are true logic errors that recur
each time the instruction is repeated. Retrying the instruction on the same core will not
help in this situation. As in POWER6, POWER6+, and POWER7, all POWER7+ processors can extract
the failing instruction from the faulty core and retry it elsewhere in the system for several
faults, after which the failing core is dynamically deconfigured and called out for
replacement. These systems are designed to avoid a full system outage.
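
The two recovery mechanisms described above, processor instruction retry followed by alternate processor recovery, can be pictured as a retry-then-migrate flow. The following is a software analogy only: the real mechanisms are implemented in processor hardware and firmware, and every name here (`CoreFault`, `run_with_recovery`, the simulated instructions, and the single retry on the same core) is an assumption made for illustration.

```python
from typing import Callable, List, Tuple


class CoreFault(Exception):
    """Raised by a simulated core when an instruction hits an error."""


def run_with_recovery(
    instruction: Callable[[int], int],
    cores: List[int],
) -> Tuple[int, List[int]]:
    """Run `instruction` on cores[0]; retry once, then move to another core.

    `cores` must be non-empty. Returns (result, deconfigured_cores).
    Raises CoreFault if no alternate core can complete the instruction.
    """
    primary, alternates = cores[0], cores[1:]
    for _ in range(2):  # initial attempt plus one retry on the same core
        try:
            return instruction(primary), []
        except CoreFault:
            continue
    # The fault repeated, so treat it as a hard error: run the instruction
    # elsewhere and mark the original core as deconfigured.
    for alt in alternates:
        try:
            return instruction(alt), [primary]
        except CoreFault:
            continue
    raise CoreFault("no alternate core could complete the instruction")


def make_transient(failures: int) -> Callable[[int], int]:
    """Build a simulated instruction that fails `failures` times, then succeeds."""
    remaining = [failures]

    def instruction(core: int) -> int:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise CoreFault(f"transient error on core {core}")
        return core * 10

    return instruction


def hard_fault_on_core_0(core: int) -> int:
    """Simulated instruction that always fails on core 0 and succeeds elsewhere."""
    if core == 0:
        raise CoreFault("hard error on core 0")
    return core * 10
```

With these simulated instructions, a single transient fault is absorbed by the retry and no core is deconfigured, whereas `hard_fault_on_core_0` completes on an alternate core and reports core 0 for deconfiguration.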
Uncorrectable error recovery
The auto-restart (reboot) option, when enabled, can reboot the system automatically
following an unrecoverable firmware error, firmware hang, hardware failure, or
environmentally induced (AC power) failure.
The auto-restart (reboot) option must be enabled from the Advanced System
Management Interface (ASMI) or from the Control (Operator) Panel.