IBM Power 595 Technical Overview and Introduction
• When the system fan speed is out of operational specification, for example because
  of a fan failure, the system can increase speed on the redundant fans to
  compensate for this failure or take other actions.
• When the server input voltages are out of operational specification.
Mutual Surveillance
– The service processor monitors the operation of the POWER Hypervisor firmware
during the boot process and watches for loss of control during system operation. It also
allows the POWER Hypervisor to monitor service processor activity. The service
processor can take appropriate action, including calling for service, when it detects the
POWER Hypervisor firmware has lost control. Likewise, the POWER Hypervisor can
request a service processor repair action if necessary.
Availability
– The auto-restart (reboot) option, when enabled, can reboot the system automatically
following an unrecoverable firmware error, firmware hang, hardware failure, or
environmentally induced (ac power) failure.
Fault Monitoring
– Built-in self-test (BIST) checks processor, memory, and associated hardware required
for proper booting of the operating system, when the system is powered on at the initial
install or after a hardware configuration change (such as an upgrade). If a non-critical
error is detected or if the error occurs in a resource that can be removed from the
system configuration, the booting process is designed to proceed to completion. The
errors are logged in the system nonvolatile random access memory (NVRAM). When
the operating system completes booting, the information is passed from the NVRAM
into the system error log where it is analyzed by error log analysis (ELA) routines.
Appropriate actions are taken to report the boot time error for subsequent service if
required.
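The boot-time fault monitoring flow described above can be sketched in a few lines of Python. This is only an illustrative model, not IBM firmware code: the names BistResult, BootState, and run_boot, the resource labels, and the log message format are all hypothetical, and treating "removed from the system configuration" as adding the resource to a deconfigured list is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class BistResult:
    """Outcome of one hypothetical power-on self-test."""
    resource: str            # illustrative name, e.g. "dimm3"
    passed: bool
    critical: bool = False   # a failure here would prevent a safe boot
    removable: bool = False  # resource can be removed from the configuration


@dataclass
class BootState:
    booted: bool = False
    nvram_log: list = field(default_factory=list)         # boot-time errors
    deconfigured: list = field(default_factory=list)      # resources dropped
    system_error_log: list = field(default_factory=list)  # read by ELA later


def run_boot(results):
    """Model the boot decision for a list of self-test results."""
    state = BootState()
    for r in results:
        if r.passed:
            continue
        # Every failure is recorded in NVRAM first.
        state.nvram_log.append(f"BIST failure: {r.resource}")
        if r.removable:
            # Boot continues without the failing resource.
            state.deconfigured.append(r.resource)
        elif r.critical:
            # Critical and not removable: stop here; the entry stays in NVRAM.
            return state
    state.booted = True
    # Once the operating system is up, hand the NVRAM entries to the
    # system error log, where error log analysis would examine them.
    state.system_error_log.extend(state.nvram_log)
    state.nvram_log.clear()
    return state
```

In this model, a failing resource that can be removed, or a non-critical failure, still lets run_boot finish with booted set to True and the entries moved to system_error_log. A critical failure in a resource that cannot be removed leaves booted as False, with the entry still held in nvram_log.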
One important service processor improvement allows the system administrator or service
representative dynamic access to the Advanced Systems Management Interface (ASMI)
menus. In previous generations of servers, these menus were only accessible when the
system was in standby power mode. Now, the menus are available from any Web
browser-enabled console attached to the Ethernet service network, concurrently with normal system
operation. A user with the proper access authority and credentials can now dynamically
modify service defaults, interrogate service processor progress and error logs, set and reset
Guiding Light LEDs, and access all service processor functions without having to power down
the system to the standby state.
The service processor also manages the interfaces for connecting uninterruptible power
source systems to the POWER6 processor-based systems, performing Timed Power On (TPO)
sequences, and interfacing with the power and cooling subsystem.
4.3.3 Detecting errors
The first and most crucial component of a solid serviceability strategy is the ability to
accurately and effectively detect errors when they occur. While not all errors are a guaranteed
threat to system availability, those that go undetected can cause problems because the
system does not have the opportunity to evaluate and act if necessary. POWER6
processor-based systems employ System z server-inspired error detection mechanisms that
extend from processor cores and memory to power supplies and hard drives.