
IBM BladeCenter PS703 and PS704 Technical Overview and Introduction
Self-diagnose and self-correct during run time
Automatically reconfigure to mitigate potential problems from suspect hardware
Self-heal or substitute good components for failing components automatically
Throughout this chapter, we describe IBM POWER technology’s capabilities that are focused
on keeping a system environment up and running. For a specific set of functions that are
focused on detecting errors before they become serious enough to stop computing work, see
4.4.1, “Detecting” on page 131.
4.3.1 Partition availability priority
You can also assign availability priorities to partitions. If an alternate processor recovery
event requires spare processor resources and there is no other way to obtain them, the
system determines which partition has the lowest priority and attempts to claim the needed
resource from it. On a properly configured POWER processor-based server, this approach
allows the capacity to be obtained from a low-priority partition rather than from a
high-priority partition.
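The selection rule can be sketched in Python as a minimal illustration. It is not the POWER Hypervisor's actual logic. The `Partition` class, the `pick_donor` function, and the assumption that a donor must currently hold at least the needed processor units are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Partition:
    # Hypothetical model of a partition; not an IBM API.
    name: str
    priority: int      # availability priority: 0 (lowest) to 255 (highest)
    processors: float  # processor units currently assigned


def pick_donor(partitions: list[Partition], needed: float) -> Optional[Partition]:
    """Return the lowest-priority partition holding at least `needed` processor units.

    Assumption: only partitions with enough assigned capacity are candidates.
    Returns None when no partition can supply the resource.
    """
    candidates = [p for p in partitions if p.processors >= needed]
    if not candidates:
        return None
    # min() keeps the first partition encountered when priorities tie.
    return min(candidates, key=lambda p: p.priority)


parts = [
    Partition("db", priority=200, processors=2.0),
    Partition("test", priority=10, processors=1.0),
    Partition("web", priority=127, processors=1.5),
]
print(pick_donor(parts, 1.0).name)  # test
```

Asking for 1.5 units instead skips "test" (it holds only 1.0) and returns "web", the lowest-priority partition that can still supply the capacity.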
This capability is relevant to total system availability because it gives the system an additional
stage before an unplanned outage. If insufficient resources exist to maintain full system
availability, these servers attempt to maintain partition availability according to user-defined
priority.
Partition-availability priority is assigned to each partition as a weight value, an integer rating
from 0 (lowest priority) to 255 (highest priority). The default value is 127 for standard
partitions and 192 for Virtual I/O Server (VIOS) partitions. You can change the priority of
individual partitions.
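The value range and defaults can be captured in a small helper. This is a sketch under stated assumptions: the function name `availability_priority` and the partition-type keys `"standard"` and `"vios"` are hypothetical, and only the numbers (0, 255, 127, 192) come from the text.

```python
from typing import Optional

MIN_PRIORITY = 0    # lowest availability priority
MAX_PRIORITY = 255  # highest availability priority

# Default availability priorities stated in the text.
# The keys are hypothetical labels for the two partition types.
DEFAULT_PRIORITY = {"standard": 127, "vios": 192}


def availability_priority(partition_type: str, requested: Optional[int] = None) -> int:
    """Return the availability priority for a partition of the given type.

    Falls back to the type's default when no value is requested and rejects
    values outside the 0 to 255 range. Unknown types raise KeyError.
    """
    if requested is None:
        return DEFAULT_PRIORITY[partition_type]
    if not MIN_PRIORITY <= requested <= MAX_PRIORITY:
        raise ValueError(f"priority {requested} is outside {MIN_PRIORITY}..{MAX_PRIORITY}")
    return requested
```

For example, `availability_priority("vios")` returns the VIOS default of 192, while `availability_priority("standard", 300)` is rejected.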
Partition-availability priorities can be set for both dedicated and shared processor partitions.
The POWER Hypervisor uses the relative weight values of the active partitions to favor
higher-priority partitions for processor sharing, for adding and removing processor capacity,
and during normal operation.
The minimum, desired, and maximum capacity specifications of each partition are taken into
account for capacity-on-demand options and when part of the system-wide processor capacity
becomes unavailable because failed processor cores are deconfigured. For example, if the
total system-wide processor capacity is sufficient to run all partitions at their minimum
capacity, the partitions are allowed to start or continue running. If processor capacity is
insufficient to run a partition at its minimum value, starting that partition results in an error
condition that must be resolved.
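The minimum-capacity rule in this example can be sketched as follows. The model is hypothetical: `start_partitions` and `InsufficientCapacityError` are illustrative names, `total_capacity` stands for the processor units that remain after failed cores are deconfigured, and partitions are assumed to start in the order given.

```python
from __future__ import annotations


class InsufficientCapacityError(RuntimeError):
    """Hypothetical stand-in for the error raised when a minimum cannot be met."""


def start_partitions(total_capacity: float, minimums: dict[str, float]) -> dict[str, float]:
    """Reserve each partition's minimum processor capacity, in insertion order.

    Raises InsufficientCapacityError for the first partition whose minimum
    no longer fits in the remaining capacity.
    """
    remaining = total_capacity
    allocated: dict[str, float] = {}
    for name, minimum in minimums.items():
        if minimum > remaining:
            raise InsufficientCapacityError(
                f"{name}: needs {minimum} processor units, only {remaining} available"
            )
        allocated[name] = minimum
        remaining -= minimum
    return allocated
```

With 4.0 processor units available, minimums of 0.5, 2.0, and 1.0 all fit. If deconfigured cores reduce the capacity to 3.0, the third partition cannot get its minimum and the call raises the error.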
Note: On IVM-managed systems, the partition availability priority is changed by using the
chsyscfg command with the lpar_avail_priority attribute. On SDMC-managed systems, you can
change the virtual server priority from the Power Systems Resources view by right-clicking the
server name and selecting Virtual Server Availability Priority.
4.3.2 General detection and deallocation of failing components
Note: POWER7 processor-based servers are independent of the operating system for
error detection and fault isolation within the central electronics complex.
Runtime correctable or recoverable errors are monitored to determine if there is a pattern of
errors. If a component reaches a predefined error limit, the service processor initiates an