Chapter 4. Reliability, availability, and serviceability
Draft Document for Review October 14, 2014 10:19 am
(Light Path for low-end servers), and access all service processor functions without having
to power down the system to the standby state. The administrator or IBM SSR can dynamically
access the menus from any web browser-enabled console that is attached to the
Ethernet service network, concurrently with normal system operation. Some options, such
as changing the hypervisor type, do not take effect until the next boot.
Management of the interfaces for connecting uninterruptible power source systems to the
POWER processor-based systems and performing timed power-on (TPO) sequences.
4.6.4 Diagnosing
General diagnostic objectives are to detect and identify problems so that they can be resolved
quickly. The IBM diagnostic strategy includes the following elements:
Provide a common error code format that is equivalent to a system reference code,
system reference number, checkpoint, or firmware error code.
Provide fault detection and problem isolation procedures. Support a remote connection
ability that is used by the IBM Remote Support Center or IBM Designated Service.
Provide interactive intelligence within the diagnostic tests with detailed online failure
information while connected to the IBM back-end system.
Using the extensive network of advanced and complementary error detection logic that is built
directly into hardware, firmware, and operating systems, the IBM Power Systems servers can
perform considerable self-diagnosis.
Because of the FFDC technology that is designed into IBM servers, re-creating diagnostic
tests for failures or requiring user intervention is not necessary. The servers are designed to
correctly detect and isolate solid and intermittent errors at the time that the failure occurs.
Runtime and boot-time diagnostic tests fall into this category.
Boot time
When an IBM Power Systems server powers up, the service processor initializes the system
hardware. Boot-time diagnostic testing uses a multi-tier approach for system validation,
starting with managed low-level diagnostic tests that are supplemented with system firmware
initialization and configuration of I/O hardware, followed by OS-initiated software test routines.
To minimize boot time, the system determines which diagnostic tests must be run to ensure
correct operation. The decision is based on the way that the system was powered off or on
the selection that is made in the boot-time menu.
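The selection that is described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IBM firmware behavior: the PowerOffKind values, the "fast" and "full" menu options, and the rule that an abnormal power-off triggers every tier are all hypothetical. Only the three tiers themselves come from the text.

```python
from enum import Enum
from typing import List, Optional


class PowerOffKind(Enum):
    # Hypothetical categories for how the system was last powered off.
    NORMAL = "normal"
    ABNORMAL = "abnormal"


# Tiers named in the text, in the order in which they run.
ALL_TIERS = [
    "low-level diagnostics",
    "firmware initialization and I/O configuration",
    "OS-initiated test routines",
]


def select_boot_tiers(power_off: PowerOffKind,
                      menu_choice: Optional[str] = None) -> List[str]:
    """Pick the tiers to run at boot, using an assumed policy.

    An explicit boot-time menu choice ("fast" or "full", both hypothetical
    option names) overrides the decision derived from the power-off kind.
    """
    if menu_choice not in (None, "fast", "full"):
        raise ValueError(f"unknown menu choice: {menu_choice!r}")
    if menu_choice == "full":
        return list(ALL_TIERS)
    if menu_choice == "fast":
        return ALL_TIERS[1:2]
    # No menu override: run every tier after an abnormal power-off,
    # otherwise keep only the mandatory initialization tier.
    if power_off is PowerOffKind.ABNORMAL:
        return list(ALL_TIERS)
    return ALL_TIERS[1:2]
```

In this sketch, the menu choice takes precedence so that an administrator can force a full test pass even after a normal power-off.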
Host Boot IPL
In POWER8, the initialization process during IPL changed. The Flexible Service Processor
(FSP) is no longer the only instance that initializes and runs the boot process. With POWER8,
the FSP initializes the boot processes, but on the POWER8 processor itself, one part of the
firmware runs and performs the Central Electronics Complex chip initialization. A new
component that is called the PNOR chip stores the Host Boot firmware. The Self Boot
Engine (SBE) is an internal part of the POWER8 chip itself and is used to boot the chip.
With this Host Boot initialization, new progress codes are available. An example of an FSP
progress code is C1009003. During the Host Boot IPL, progress codes, such as CC009344,
appear.
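When you scan a boot log, it can help to tell the two kinds of progress codes apart. The following Python sketch does that under one assumption: the first two hexadecimal digits identify the component that issued the code. Only the two examples in the text (C1009003 and CC009344) support this mapping, and the function and table names are hypothetical.

```python
import re

# Assumed mapping from the leading two hex digits to the issuing component.
# It is derived only from the two example codes quoted in the text.
PREFIX_TO_SOURCE = {
    "C1": "FSP",
    "CC": "Host Boot",
}

# A progress code is treated here as exactly eight hexadecimal digits.
_CODE_RE = re.compile(r"[0-9A-F]{8}")


def classify_progress_code(code: str) -> str:
    """Return the assumed source of an 8-digit hexadecimal progress code."""
    normalized = code.strip().upper()
    if _CODE_RE.fullmatch(normalized) is None:
        raise ValueError(f"not an 8-digit hexadecimal progress code: {code!r}")
    # Prefixes that are not in the table are reported as "unknown".
    return PREFIX_TO_SOURCE.get(normalized[:2], "unknown")
```

For example, classify_progress_code("C1009003") returns "FSP" and classify_progress_code("CC009344") returns "Host Boot". Any other prefix yields "unknown" rather than a guess.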
If there is a failure during the Host Boot process, a new Host Boot System Dump is collected
and stored. This type of memory dump includes Host Boot memory and is off-loaded to the
HMC when the HMC is available.
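The collect, store, and off-load sequence follows a store-and-forward pattern, which can be sketched as follows. This is a generic Python illustration, not the Host Boot implementation: the DumpStore class, its method names, and the send callback are all hypothetical, and a real dump would not be held in a Python queue.

```python
from collections import deque
from typing import Callable, Deque


class DumpStore:
    """Holds collected dumps until a management console can receive them."""

    def __init__(self) -> None:
        self._pending: Deque[bytes] = deque()

    def collect(self, dump: bytes) -> None:
        # Store the dump locally at the time of the failure.
        self._pending.append(dump)

    def pending_count(self) -> int:
        return len(self._pending)

    def offload(self, console_available: bool,
                send: Callable[[bytes], None]) -> int:
        """Send all pending dumps if the console is available.

        Returns the number of dumps that were sent.
        """
        if not console_available:
            return 0
        sent = 0
        while self._pending:
            # Send first and remove afterward, so that a failed send
            # leaves the dump in the queue for a later attempt.
            send(self._pending[0])
            self._pending.popleft()
            sent += 1
        return sent
```

The dump stays in the store while the console is unreachable, so no diagnostic data is lost if the failure occurs before the console connection is established.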