IBM System p5 520 and 520Q Technical Overview and Introduction
If the output shows CPU Guard as disabled, enter the following command to enable it:
chdev -l sys0 -a cpuguard='enable'
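
The check-then-enable step can be sketched in script form. The following Python helper is a minimal illustration, not part of the product: it assumes the attribute is read with `lsattr -El sys0 -a cpuguard` and that the output line has the form `cpuguard disable CPU Guard True` (name, current value, description, user-settable flag). The helper name `cpuguard_fix` is hypothetical, and the exact output layout may differ between AIX levels.

```python
from typing import Optional


def cpuguard_fix(lsattr_line: str) -> Optional[str]:
    """Return the chdev command needed to enable CPU Guard, or None.

    ASSUMPTION: lsattr_line looks like
        cpuguard disable CPU Guard True
    that is: attribute name, current value, description, settable flag.
    """
    fields = lsattr_line.split()
    if len(fields) < 2 or fields[0] != "cpuguard":
        raise ValueError("not a cpuguard attribute line: %r" % lsattr_line)
    if fields[1] == "enable":
        return None  # already enabled, nothing to do
    # Same command as given in the text above.
    return "chdev -l sys0 -a cpuguard='enable'"
```

The helper only decides which command to run; executing it on the system is left to the administrator.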
Cache or cache-line deallocation dynamically reconfigures the system to bypass
potentially failing components. This capability is provided for both L2 and L3 caches. Dynamic
run-time deconfiguration is triggered when a threshold of L1 or L2 recovered errors is exceeded.
For a solid single-bit error in an L3 cache array at run time, spare resources are used
to perform a line delete on the failing line.
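
The threshold-and-spare idea can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name `CacheLineGuard`, the threshold value, and the number of spare lines are hypothetical, and the real mechanism is implemented in hardware and firmware rather than in software like this.

```python
class CacheLineGuard:
    """Toy model: bypass a cache line once its recovered-error count
    exceeds a threshold, as long as a spare line is still available."""

    def __init__(self, threshold: int, spare_lines: int) -> None:
        self.threshold = threshold      # hypothetical value, not from the source
        self.spare_lines = spare_lines  # hypothetical value, not from the source
        self.errors = {}                # line index -> recovered error count
        self.deleted = set()            # lines already bypassed

    def record_recovered_error(self, line: int) -> str:
        if line in self.deleted:
            return "already-deleted"
        self.errors[line] = self.errors.get(line, 0) + 1
        if self.errors[line] <= self.threshold:
            return "recovered"          # threshold not yet exceeded
        if self.spare_lines == 0:
            return "no-spare"           # nothing left to substitute
        self.spare_lines -= 1
        self.deleted.add(line)          # line delete using a spare resource
        return "line-deleted"
```

With `threshold=2`, the first two recovered errors on a line are only counted, and the third triggers the line delete.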
PCI hot-plug slot fault tracking helps prevent slot errors from causing a system machine
check interrupt and subsequent reboot. This provides superior fault isolation, and the error
affects only the single adapter. Run-time errors on the PCI bus caused by failing adapters
result in recovery action. If this is unsuccessful, the PCI device is shut down gracefully. Parity
errors on the PCI bus itself result in bus retry, and if uncorrected, the bus and any I/O
adapters or devices on that bus are deconfigured.
The p5-520 or p5-520Q supports PCI Extended Error Handling (EEH), if it is supported by the
PCI-X adapter. In the past, PCI bus parity errors caused a global machine check interrupt,
which eventually required a system reboot in order to continue. In the p5-520 or p5-520Q
system, hardware, system firmware, and AIX 5L interaction have been designed to allow
transparent recovery of intermittent PCI bus parity errors and graceful transition to the I/O
device available state in the case of a permanent parity error in the PCI bus.
EEH-enabled adapters respond to a special data packet generated from the affected PCI slot
hardware by calling system firmware, which examines the affected bus, allows the device
driver to reset it, and continues without a system reboot.
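
The recovery flow described above can be sketched as a bounded retry loop. This is an illustrative model only: the function name `eeh_recover`, the callback `reset_slot`, and the retry budget `max_resets` are assumptions, since the text does not state how many resets are attempted before an error is treated as permanent.

```python
from typing import Callable, Tuple


def eeh_recover(reset_slot: Callable[[], bool],
                max_resets: int = 3) -> Tuple[str, int]:
    """Model the EEH flow for one affected PCI slot.

    reset_slot stands in for the device driver resetting the adapter
    after firmware has examined the bus; it returns True if the slot
    works again. max_resets is a HYPOTHETICAL retry budget.
    Returns (outcome, number_of_resets_tried).
    """
    for attempt in range(1, max_resets + 1):
        if reset_slot():
            # Intermittent error: operation continues without a reboot.
            return ("recovered", attempt)
    # Permanent error: only this slot is affected, not the whole system.
    return ("permanent-error", max_resets)
```

The point of the sketch is that failure handling stays scoped to one slot: neither outcome involves a global machine check or a system reboot.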
Persistent deallocation functions include:
Processor
Memory
Deconfigure or bypass failing I/O adapters
L3 cache
Following a hardware error that has been flagged by the service processor, the subsequent
reboot of the system invokes extended diagnostics. If a processor or L3 cache is marked for
deconfiguration by persistent processor deallocation, the boot process attempts to proceed to
completion with the faulty device deconfigured automatically. Failing I/O adapters are
deconfigured or bypassed during the boot process.
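
The boot-time behavior can be pictured as filtering the resource list against the set of components marked for deconfiguration. The sketch below is a simplified model, not the firmware's actual logic; the function name `boot_configuration` and the resource names in the usage example are hypothetical.

```python
from typing import Dict, Iterable, List


def boot_configuration(resources: Iterable[str],
                       marked_failed: Iterable[str]) -> Dict[str, List[str]]:
    """Split resources into those brought up at boot and those left
    deconfigured because they were marked after a hardware error."""
    all_resources = list(resources)
    failed = set(marked_failed)
    return {
        "configured": [r for r in all_resources if r not in failed],
        "deconfigured": [r for r in all_resources if r in failed],
    }
```

For example, `boot_configuration(["proc0", "proc1", "l3-0"], ["proc1"])` leaves `proc1` deconfigured and boots with `proc0` and `l3-0`.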
Note: The auto-restart (reboot) option, when enabled, can reboot the system automatically
following an unrecoverable software error, software hang, hardware failure, or
environmentally induced failure (such as a loss of the power supply).
3.1.8 Serviceability
Increasing service productivity means the system is up and running for a longer time. The
p5-520 and p5-520Q improve service productivity by providing the functions described in the
following sections.
Error indication and LED indicators
The p5-520 and p5-520Q are designed for client setup of the machine and for the subsequent
addition of most hardware features. The p5-520 and p5-520Q also allow clients to replace
service parts (Client Replaceable Unit). To accomplish this, the p5-520 or p5-520Q provides