Chapter 4. Continuous availability and manageability
4.2 Availability
The ability of IBM hardware and microcode to continuously monitor the execution of hardware
functions is generally described as first-failure data capture (FFDC). This process includes
the strategy of predictive failure analysis, which refers to the ability to track intermittent
correctable errors and to vary components offline before they reach the point of a hard
failure that causes a system outage, without the need to re-create the problem.
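The idea behind predictive failure analysis can be illustrated with a minimal Python sketch. It is not IBM's implementation: the class name, the component labels, and the threshold of three correctable errors are all assumptions chosen for illustration, because the text does not state how the hardware decides when a component is suspect.

```python
from collections import defaultdict

# Hypothetical threshold: the text does not say how many correctable
# errors trigger a deconfiguration; 3 is used only for illustration.
CORRECTABLE_ERROR_THRESHOLD = 3


class PredictiveFailureTracker:
    """Counts correctable errors per component and flags suspects."""

    def __init__(self, threshold=CORRECTABLE_ERROR_THRESHOLD):
        self.threshold = threshold
        self.error_counts = defaultdict(int)
        self.offline = set()

    def record_correctable_error(self, component):
        """Record one corrected error.

        Returns True only when this error causes the component to be
        varied offline, that is, before any hard failure has occurred.
        """
        if component in self.offline:
            return False
        self.error_counts[component] += 1
        if self.error_counts[component] >= self.threshold:
            self.offline.add(component)
            return True
        return False


# Hypothetical stream of corrected-error events.
tracker = PredictiveFailureTracker()
events = ["core3", "dimm1", "core3", "core3", "dimm1"]
newly_offline = [c for c in events if tracker.record_correctable_error(c)]
```

In this sketch the component named core3 is taken offline on its third corrected error, while dimm1 stays in service because it has not reached the assumed threshold.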
The POWER7 and POWER7+ family of systems continues to introduce significant
enhancements that are designed to increase system availability and ultimately achieve a high
availability objective, with hardware components that are able to perform the following
functions:
Self-diagnose and self-correct during run time.
Automatically reconfigure to mitigate potential problems from suspect hardware.
Self-heal or automatically substitute good components for failing components.
Throughout this chapter, we describe IBM POWER technology’s capabilities that are focused
on keeping a system environment running. For a specific set of functions that are focused on
detecting errors before they become serious enough to stop computing work, see 4.3.1,
“Detecting” on page 175.
4.2.1 Partition availability priority
Also available is the ability to assign availability priorities to partitions. If an alternate
processor recovery event requires spare processor resources and there are no other means
of obtaining the spare resources, the system determines which partition has the lowest
priority and attempts to claim the needed resource. On a properly configured POWER
processor-based server, this approach allows that capacity to first be obtained from a
low-priority partition instead of a high-priority partition.
This capability is relevant to the total system availability because it gives the system an
additional stage before an unplanned outage. In the event that insufficient resources exist to
maintain full system availability, these servers attempt to maintain partition availability by
user-defined priority.
Partition availability priority is assigned to partitions using a weight value or integer rating, the
lowest priority partition rated at 0 (zero) and the highest priority partition valued at 255. The
default value is set at 127 for standard partitions and 192 for Virtual I/O Server (VIOS)
partitions. You can vary the priority of individual partitions.
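The priority scheme described above can be sketched in Python as follows. The value range (0 to 255) and the defaults (127 and 192) come from the text. The Partition class, the pick_donor function, and the partition names are hypothetical, and the tie-breaking rule (the first partition listed wins) is an assumption, because the text does not say how equal priorities are resolved.

```python
from dataclasses import dataclass
from typing import Optional

MIN_PRIORITY, MAX_PRIORITY = 0, 255
DEFAULT_PRIORITY = 127        # standard partitions
DEFAULT_VIOS_PRIORITY = 192   # Virtual I/O Server partitions


@dataclass
class Partition:
    """Hypothetical model of a partition and its availability priority."""
    name: str
    is_vios: bool = False
    priority: Optional[int] = None  # None means "use the default"

    def __post_init__(self):
        if self.priority is None:
            self.priority = (
                DEFAULT_VIOS_PRIORITY if self.is_vios else DEFAULT_PRIORITY
            )
        if not MIN_PRIORITY <= self.priority <= MAX_PRIORITY:
            raise ValueError(
                f"priority must be between {MIN_PRIORITY} and {MAX_PRIORITY}"
            )


def pick_donor(partitions):
    """Return the partition with the lowest availability priority.

    Assumption: on a tie, the partition listed first is chosen.
    """
    return min(partitions, key=lambda p: p.priority)


# Hypothetical configuration.
partitions = [
    Partition("prod_db", priority=200),
    Partition("vios1", is_vios=True),
    Partition("test", priority=10),
    Partition("batch"),
]
donor = pick_donor(partitions)
```

Here the partition named test would be asked to give up processor capacity first, and the VIOS partition keeps its higher default weight unless an administrator changes it.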
Partition availability priorities can be set for both dedicated and shared processor partitions.
The POWER Hypervisor uses the relative partition weight value among active partitions to
favor higher priority partitions when sharing processors, when adding and removing
processor capacity, and during normal operation.
Note that the partition specifications for minimum, desired, and maximum capacity are also
taken into account for capacity-on-demand options and if total system-wide processor
capacity becomes disabled because of deconfigured failed processor cores. For example, if
total system-wide processor capacity is sufficient to run all partitions, at least with the
Independent: POWER7 and POWER7+ processor-based servers are independent of the
operating system for error detection and fault isolation within the central electronics
complex.