IBM Power 720 and 740 Technical Overview and Introduction
4.2.1 Partition availability priority
POWER processor-based systems can assign availability priorities to partitions. If the system
detects that a processor core is about to fail, that core is taken offline. If the partitions on
the system then require more processor units than remain in the system, the firmware
determines which partition has the lowest priority and attempts to claim the needed resources
from it. On a properly configured POWER processor-based server, this capability allows the
system manager to ensure that capacity is obtained first from a low-priority partition rather
than from a high-priority partition.
This capability gives the system an additional stage before an unplanned outage. If
insufficient resources exist to maintain full system availability, the server attempts to
maintain partition availability according to user-defined priority.
Partition availability priority is assigned to partitions using a weight value or integer rating.
The lowest priority partition is rated at 0 (zero) and the highest priority partition is rated
at 255. The default value is set at 127 for standard partitions and 192 for Virtual I/O Server
(VIOS) partitions. You can change the priority of individual partitions with the Hardware
Management Console (HMC).
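The selection rule described above (take capacity from the lowest-priority partitions first, on a 0 to 255 scale) can be illustrated with a small model. This is a minimal sketch, assuming a simplified view in which each partition owns a fixed number of processor units; the `Partition` class and the `reclaim_units` function are hypothetical names for illustration and are not part of the system firmware or any IBM interface.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; not an IBM firmware or HMC API.
MIN_PRIORITY, MAX_PRIORITY = 0, 255
DEFAULT_PRIORITY = 127       # default for standard partitions
DEFAULT_VIOS_PRIORITY = 192  # default for Virtual I/O Server partitions


@dataclass
class Partition:
    name: str
    processor_units: float
    priority: int = DEFAULT_PRIORITY

    def __post_init__(self) -> None:
        # Reject ratings outside the documented 0 (lowest) to 255 (highest) range.
        if not MIN_PRIORITY <= self.priority <= MAX_PRIORITY:
            raise ValueError(f"priority must be in {MIN_PRIORITY}..{MAX_PRIORITY}")


def reclaim_units(partitions: list[Partition], shortfall: float) -> dict[str, float]:
    """Take `shortfall` processor units, draining lowest-priority partitions first.

    Returns a mapping of partition name -> units taken from that partition.
    """
    taken: dict[str, float] = {}
    # sorted() is stable, so partitions with equal priority keep their input order.
    for p in sorted(partitions, key=lambda p: p.priority):
        if shortfall <= 0:
            break
        amount = min(p.processor_units, shortfall)
        p.processor_units -= amount
        shortfall -= amount
        taken[p.name] = amount
    return taken


if __name__ == "__main__":
    parts = [
        Partition("db", 2.0, priority=200),
        Partition("batch", 1.0, priority=10),
        Partition("vios", 1.0, priority=DEFAULT_VIOS_PRIORITY),
    ]
    # A failed core leaves the system 1.5 processor units short.
    print(reclaim_units(parts, 1.5))  # {'batch': 1.0, 'vios': 0.5}
```

In this sketch the high-priority `db` partition keeps all of its capacity because the shortfall is covered by the lower-rated partitions before its turn comes.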
4.2.2 General detection and deallocation of failing components
Runtime correctable or recoverable errors are monitored to determine whether a pattern of
errors is developing. If a component reaches its predefined error limit, the service processor
initiates an action to deconfigure the faulty hardware, helping to avoid a potential system
outage and enhancing system availability.
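The threshold behavior can be pictured as a per-component error counter. The following is a minimal sketch, assuming a single fixed limit for every component; the limit of 3, the `ServiceProcessorModel` class, and the component names are hypothetical, because the actual predefined limits are set by the firmware and are not given in this section.

```python
from collections import Counter

# Assumed limit for illustration only; the real predefined error limits are
# set by the system firmware and are not specified here.
ERROR_THRESHOLD = 3


class ServiceProcessorModel:
    """Toy model: count recoverable errors per component, deconfigure at a limit."""

    def __init__(self, threshold: int = ERROR_THRESHOLD) -> None:
        self.threshold = threshold
        self.error_counts = Counter()  # component name -> recoverable errors seen
        self.deconfigured = set()      # components already taken out of service

    def report_recoverable_error(self, component: str) -> bool:
        """Record one error; return True only when this report triggers deconfiguration."""
        if component in self.deconfigured:
            return False  # already removed; nothing further to do
        self.error_counts[component] += 1
        if self.error_counts[component] >= self.threshold:
            self.deconfigured.add(component)
            return True
        return False


if __name__ == "__main__":
    sp = ServiceProcessorModel()
    print([sp.report_recoverable_error("dimm-C1") for _ in range(4)])
    # [False, False, True, False]
```

Each error is still corrected when it occurs; the counter only decides when a recurring pattern justifies removing the component before it can cause an outage.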
Persistent deallocation
To enhance system availability, a component that is identified for deallocation or
deconfiguration on a POWER processor-based system is flagged for persistent deallocation.
Component removal can occur either dynamically (while the system is running) or at boot
time (IPL), depending on both the type of fault and when the fault is detected.
In addition, unrecoverable hardware faults can be deconfigured from the system after the first
occurrence. The system can be rebooted immediately after failure and resume operation on
the remaining stable hardware. This prevents the faulty hardware from affecting system
operation again; the repair action is deferred to a more convenient, less critical time.
The following components have the capability to be persistently deallocated:
Processor
L2 and L3 cache lines (Cache lines are dynamically deleted.)
Memory
I/O adapters (failing adapters are deconfigured or bypassed)
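The key property of persistent deallocation is that the flag outlives a reboot, so the faulty part stays out of the configuration at the next IPL until it is repaired. This is a minimal sketch, assuming a JSON file stands in for the service processor's nonvolatile storage; the file name, the functions `flag_for_deallocation`, `load_flags`, and `ipl_configure`, and the component names are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical sketch: a JSON file stands in for the service processor's
# nonvolatile storage, and component names are illustrative.


def load_flags(store_path: str) -> set[str]:
    """Read the set of components flagged for persistent deallocation."""
    if not os.path.exists(store_path):
        return set()
    with open(store_path) as f:
        return set(json.load(f))


def flag_for_deallocation(store_path: str, component: str) -> None:
    """Persistently flag a component so that it stays deconfigured across reboots."""
    flagged = load_flags(store_path)
    flagged.add(component)
    with open(store_path, "w") as f:
        json.dump(sorted(flagged), f)


def ipl_configure(store_path: str, installed: list[str]) -> list[str]:
    """At boot (IPL), configure only the components that are not flagged."""
    flagged = load_flags(store_path)
    return [c for c in installed if c not in flagged]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "deconfig.json")
        installed = ["proc-core-0", "proc-core-1", "dimm-C1", "pci-slot-2"]
        flag_for_deallocation(path, "proc-core-1")
        # Simulated reboot: the flag survives because it was written to storage.
        print(ipl_configure(path, installed))  # ['proc-core-0', 'dimm-C1', 'pci-slot-2']
```

The system resumes on the remaining hardware, and clearing the flag after the deferred repair is left out of this sketch.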
Processor instruction retry
Introduced with POWER6 technology, this capability allows the processor to retry processor
instructions and to perform alternate processor recovery for several core-related faults. This
significantly reduces exposure to both permanent and intermittent errors in the processor
core.
Intermittent errors, often caused by cosmic rays or other sources of radiation, are generally
not repeatable.
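Because intermittent errors generally do not repeat, retrying the failed instruction usually succeeds; a fault that recurs on retry points to a permanent problem, which is where alternate processor recovery applies. This is a minimal sketch of that decision logic only, assuming one retry and a designated spare core; on real hardware this happens inside the processor and firmware, and the names `CoreFault`, `execute_with_recovery`, and `run_on` are hypothetical.

```python
from typing import Callable

# Hypothetical sketch of the recovery decision; real POWER processors do this
# in hardware and firmware, and the single retry here is an assumption.


class CoreFault(Exception):
    """Raised by a simulated core when it detects a fault while executing."""


def execute_with_recovery(
    run_on: Callable[[str], int], core: str, spare_core: str, retries: int = 1
) -> tuple[int, str]:
    """Run work on `core`, retrying on a fault; move it to `spare_core` if faults persist.

    Returns (result, outcome), where outcome is 'ok', 'retried', or 'alternate'.
    """
    for attempt in range(1 + retries):
        try:
            result = run_on(core)
            return result, "ok" if attempt == 0 else "retried"
        except CoreFault:
            continue  # intermittent faults usually do not repeat, so retry
    # The fault repeated on every attempt: treat it as permanent and move the
    # work to another core (alternate processor recovery).
    return run_on(spare_core), "alternate"
```

For example, a `run_on` callable that faults once on `core` yields the outcome `'retried'`, while one that faults on every attempt yields `'alternate'`.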