Chapter 4. Continuous availability and manageability
4.1.2 Placement of components
Packaging is designed to deliver both high performance and high reliability. For example,
the reliability of electronic components is directly related to their thermal environment. That is,
large decreases in component reliability are directly correlated with relatively small increases
in temperature. All POWER processor-based systems are carefully packaged to ensure
adequate cooling. Critical system components such as the processor chips are
positioned on the planar so that they receive clear air flow during operation. In addition,
POWER processor-based systems are built with redundant, variable-speed fans that can
automatically increase output to compensate for increased heat in the central electronic
complex.
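The variable-speed fan behavior described above can be pictured as a small control rule. The following Python sketch is a hypothetical illustration, not IBM firmware: the temperature thresholds, the linear ramp, and the assumption that airflow scales with fan duty cycle are all invented for the example. It shows fans speeding up as temperature rises, and the surviving fans speeding up further when one redundant fan has failed.

```python
# Hypothetical control parameters; actual firmware thresholds are not given in this text.
MIN_DUTY = 30.0   # percent of full fan speed at or below LOW_C
MAX_DUTY = 100.0  # percent of full fan speed at or above HIGH_C
LOW_C = 25.0      # inlet temperature (deg C) where the ramp-up starts
HIGH_C = 40.0     # inlet temperature (deg C) where fans run at full speed


def base_duty(temp_c: float) -> float:
    """Linearly ramp the fan duty cycle between LOW_C and HIGH_C."""
    if temp_c <= LOW_C:
        return MIN_DUTY
    if temp_c >= HIGH_C:
        return MAX_DUTY
    frac = (temp_c - LOW_C) / (HIGH_C - LOW_C)
    return MIN_DUTY + frac * (MAX_DUTY - MIN_DUTY)


def fan_duty(temp_c: float, total_fans: int, working_fans: int) -> float:
    """Duty cycle for each working fan, scaled up so that total airflow is
    preserved when a redundant fan has failed (assumes airflow ~ duty)."""
    if working_fans <= 0:
        raise RuntimeError("no working fans")
    duty = base_duty(temp_c) * total_fans / working_fans
    return min(duty, MAX_DUTY)  # a fan cannot exceed full speed
```

For example, at 25 deg C with one of four fans failed, `fan_duty(25.0, 4, 3)` asks each surviving fan to run at 40% instead of 30%, which keeps the combined airflow unchanged under the stated assumption.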
4.1.3 Redundant components and concurrent repair
High-opportunity components, those that most affect system availability, are protected with
redundancy and the ability to be repaired concurrently. These redundant components allow the
system to remain operational when a part fails. They include the following items:
Processor cores, which include redundant bits in L1 instruction and data caches, L2
caches, and L2 and L3 directories
Power 720 and Power 740 main memory DIMMs, which use an innovative ECC algorithm
from IBM research that improves bit error correction and the handling of memory failures
Redundant and hot-swap cooling
Redundant and hot-swap power supplies
For maximum availability, be sure to connect power cords from the same system to two
separate power distribution units (PDUs) in the rack, and to connect each PDU to
independent power sources. For the tower form factor, plug the power cords into two
independent power sources to achieve maximum availability.
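The benefit of two independent power paths can be estimated with the standard parallel-redundancy formula. This worked example assumes that the two sources fail independently; the per-source availability of 0.999 is an illustrative value, not an IBM specification.

```latex
A_{\text{sys}} = 1 - (1 - A_1)(1 - A_2)
% Example with A_1 = A_2 = 0.999:
A_{\text{sys}} = 1 - (0.001)(0.001) = 1 - 10^{-6} = 0.999999
```

Plugging both cords into one PDU, or both PDUs into one power source, reintroduces a single point of failure, so system availability falls back to that of the shared component.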
4.2 Availability
First-failure data capture (FFDC) is the capability of IBM hardware and microcode to
continuously monitor hardware functions. This process includes predictive failure analysis,
which is the ability to track intermittent correctable errors and to take components offline
before they reach the point of hard failure, thereby avoiding a system outage. This
family of systems can perform the following automatic functions:
Self-diagnose and self-correct errors during run time.
Automatically reconfigure to mitigate potential problems from suspect hardware.
Self-heal or automatically substitute good components for failing components.
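Predictive failure analysis, as described above, tracks intermittent correctable errors and takes a component offline before it fails hard. The following Python sketch illustrates that idea with a simple sliding-window threshold. It is a hypothetical model, not the service processor's actual algorithm: the class name, the threshold, the window length, and component labels such as `DIMM-C1` are all assumptions made for the example.

```python
from collections import deque


class CorrectableErrorTracker:
    """Hypothetical predictive-failure-analysis sketch: count correctable
    errors per component inside a sliding time window and flag the
    component for deconfiguration once a threshold is reached."""

    def __init__(self, threshold: int, window_s: float) -> None:
        self.threshold = threshold  # errors tolerated within the window (assumed value)
        self.window_s = window_s    # length of the sliding window in seconds
        self.events: dict[str, deque] = {}
        self.offline: set[str] = set()

    def record(self, component: str, timestamp: float) -> bool:
        """Log one correctable error; return True if the component should be
        taken offline before it reaches the point of hard failure."""
        if component in self.offline:
            return True
        q = self.events.setdefault(component, deque())
        q.append(timestamp)
        # Drop errors that have aged out of the sliding window.
        while q and timestamp - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.threshold:
            self.offline.add(component)
            return True
        return False
```

With `threshold=3` and `window_s=60.0`, three errors on `DIMM-C1` at 0, 10, and 20 seconds mark it for removal on the third call, whereas three errors spaced 100 seconds apart never accumulate inside the window and the component stays online. Using a time window rather than a lifetime count reflects the goal of catching a rising error rate, not a few isolated soft errors.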
This chapter describes IBM POWER processor-based systems technologies that are focused on
keeping a system running. For a specific set of functions focused on detecting errors before
they become serious enough to stop computing work, see 4.3.1, “Detecting” on page 161.
Before ordering: Check your configuration for optional redundant components before
ordering your system.
Remember: Error detection and fault isolation are independent of the operating system in
POWER processor-based servers.