Chapter 4. Reliability, availability, and serviceability
4.5.3 Service processor
In POWER8 processor-based systems, the dedicated service processor is primarily
responsible for fault analysis of processor and memory errors. The service processor is a
microprocessor that is powered separately from the main instruction processing complex. In
the Power E850C server, redundant connections to the service processor provide added
reliability.
In addition to FFDC functions, the service processor performs many serviceability functions:
Remote power control options
Reset and boot features
Environmental monitoring
The service processor interfaces with the OCC function, which monitors the server’s
built-in temperature sensors and sends instructions to the system fans to increase
rotational speed when the ambient temperature is above the normal operating range. By
using an integrated operating system interface, the service processor notifies the
operating system of potential environment-related problems so that the system
administrator can take appropriate corrective actions before a critical failure threshold is
reached. The service processor can also post a warning and start an orderly system
shutdown in the following circumstances:
– The operating temperature exceeds the critical level (for example, failure of air
conditioning or air circulation around the system).
– Internal component temperatures reach or exceed critical levels.
– The system fan speed is out of operational specification (for example, because of
multiple fan failures).
– The server input voltages are out of operational specification.
POWER Hypervisor (system firmware) and HMC connection surveillance
The service processor monitors the operation of the firmware during the boot process, and
also monitors the hypervisor for termination. The hypervisor monitors the service
processor and can perform a reset and reload if it detects the loss of the service
processor. If the reset or reload operation does not correct the problem with the service
processor, the hypervisor notifies the operating system. The operating system can then
take appropriate action, including calling for service. The service processor also monitors
the connection to an HMC and can report loss of connectivity to the operating system
partitions for system administrator notification.
Uncorrectable error recovery
The auto-restart (reboot) option, when enabled, can reboot the system automatically
following an unrecoverable firmware error, firmware hang, hardware failure, or
environmentally induced (AC power) failure.
The auto-restart (reboot) option must be enabled from the ASMI menu or from the
operator panel on the front of the server.
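The environmental monitoring and surveillance behavior described above can be sketched as simple decision logic. The following Python fragment is an illustration only and is not service processor firmware or an IBM interface: the names Readings, Action, evaluate_environment, and recover_service_processor, and all three temperature limits, are hypothetical and were chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    RAISE_FAN_SPEED = "raise_fan_speed"
    ORDERLY_SHUTDOWN = "orderly_shutdown"


@dataclass
class Readings:
    ambient_c: float        # ambient temperature, degrees Celsius
    component_c: float      # hottest internal component temperature
    fans_in_spec: bool      # False if fan speed is out of specification
    voltage_in_spec: bool   # False if input voltage is out of specification


# Hypothetical limits, for illustration only (not E850C specifications).
AMBIENT_NORMAL_MAX_C = 35.0
AMBIENT_CRITICAL_C = 45.0
COMPONENT_CRITICAL_C = 95.0


def evaluate_environment(r: Readings) -> Action:
    """Map sensor readings to the response described in the text."""
    shutdown = (
        r.ambient_c > AMBIENT_CRITICAL_C          # exceeds the critical level
        or r.component_c >= COMPONENT_CRITICAL_C  # reaches or exceeds critical
        or not r.fans_in_spec
        or not r.voltage_in_spec
    )
    if shutdown:
        return Action.ORDERLY_SHUTDOWN
    if r.ambient_c > AMBIENT_NORMAL_MAX_C:
        return Action.RAISE_FAN_SPEED
    return Action.NONE


def recover_service_processor(reset_and_reload, notify_os) -> bool:
    """Hypervisor-side escalation: try a reset/reload, then notify the OS.

    Both arguments are caller-supplied callables (assumed for this sketch).
    """
    if reset_and_reload():
        return True
    notify_os("service processor unavailable after reset/reload")
    return False
```

In this sketch, any one of the four shutdown conditions is sufficient on its own, which mirrors the list of circumstances given above. The operating system is notified only after the reset and reload attempt fails.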
IBM Power System E850C Technical Overview and Introduction, REDP-5412-00, ISBN 0738455687