It also uses the hardware error detection logic in the processor to capture run-time
recoverable and irrecoverable error indications. The firmware uses the error signatures
in the hardware to analyze and isolate the error to a specific processor.
The processors that are deconfigured remain off-line for subsequent reboots until the
faulty processor hardware is replaced.
This function allows users to manually deconfigure a processor, or to re-enable a
previously deconfigured processor, through the Service Processor menu. Users can also
enable or disable the function itself through the Service Processor.
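As an illustration only, the boot-time behavior described above can be modeled as a record of deconfigured processors that persists across reboots until the hardware is replaced or the user re-enables the processor. The following Python sketch is a hypothetical model, not the Service Processor firmware; the class name, method names, and processor names are assumptions made for this example.

```python
class BootTimeDeconfig:
    """Hypothetical model of a persistent processor deconfiguration record."""

    def __init__(self, processors):
        self.processors = list(processors)
        # Stands in for non-volatile storage that survives reboots.
        self.deconfigured = set()

    def record_failure(self, cpu):
        # The firmware has isolated an error to this processor.
        self.deconfigured.add(cpu)

    def manual_deconfigure(self, cpu):
        # User request made through a service menu.
        self.deconfigured.add(cpu)

    def manual_enable(self, cpu):
        # User re-enables a previously deconfigured processor.
        self.deconfigured.discard(cpu)

    def replace_hardware(self, cpu):
        # Replacing the faulty part clears its record.
        self.deconfigured.discard(cpu)

    def boot(self):
        # Only processors without a deconfiguration record come online.
        return [c for c in self.processors if c not in self.deconfigured]


record = BootTimeDeconfig(["proc0", "proc1"])
record.record_failure("proc1")
first_boot = record.boot()      # proc1 stays off-line
second_boot = record.boot()     # still off-line on the next reboot
record.replace_hardware("proc1")
after_repair = record.boot()    # proc1 is back after replacement
```

The point of the sketch is that the record, not the outcome of a single boot, decides which processors come online.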
Processor Run-Time Deconfiguration (CPU-Gard)
Processor run-time deconfiguration allows for the dynamic removal of CPUs from the
system configuration. The objective is to minimize system failures or data integrity
exposures due to a faulty processor. The processor to be removed is the one that has
experienced repeated run-time recoverable internal errors (over a predefined threshold).
The function uses the hardware error detection logic in the processor to capture
run-time recoverable error indications. The firmware uses the error signatures in the
hardware to analyze and isolate the error to a specific CPU. The firmware also
maintains error-threshold information.
When an internal recoverable error for a processor reaches a predefined threshold, the
firmware notifies the AIX operating system. The AIX operating system migrates all
software processes and interrupts to another processor and places the faulty processor
in a stopped state.
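The threshold-and-migrate sequence above can be sketched as a small simulation. This is a simplified Python model under stated assumptions, not the firmware or AIX implementation: the threshold value of 3, the class and method names, and the choice to migrate work to the first remaining online processor are all hypothetical.

```python
from collections import Counter

ASSUMED_THRESHOLD = 3  # hypothetical; the guide does not state the real value


class RunTimeDeconfig:
    """Hypothetical model of run-time processor deconfiguration."""

    def __init__(self, cpus, threshold=ASSUMED_THRESHOLD):
        self.online = list(cpus)
        self.stopped = []
        self.errors = Counter()              # per-CPU recoverable error counts
        self.threshold = threshold
        self.work = {cpu: [] for cpu in cpus}

    def assign(self, cpu, task):
        self.work[cpu].append(task)

    def report_recoverable_error(self, cpu):
        """Count one error; return True if the CPU was deconfigured."""
        if cpu not in self.online:
            return False
        self.errors[cpu] += 1
        if self.errors[cpu] < self.threshold:
            return False
        return self._deconfigure(cpu)

    def _deconfigure(self, cpu):
        # Assumption: a CPU is only stopped if another one can take its work.
        if len(self.online) < 2:
            return False
        self.online.remove(cpu)
        target = self.online[0]
        # Migrate all work from the faulty CPU, then mark it stopped.
        self.work[target].extend(self.work.pop(cpu))
        self.stopped.append(cpu)
        return True


sim = RunTimeDeconfig(["cpu0", "cpu1"])
sim.assign("cpu1", "job-a")
results = [sim.report_recoverable_error("cpu1") for _ in range(3)]
```

In this model the third reported error reaches the assumed threshold, so the work on `cpu1` moves to `cpu0` and `cpu1` is stopped.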
CPUs that are deconfigured at run time remain off-line for subsequent reboots, through
the CPU Boot-Time Deconfiguration function, until the faulty CPU hardware is replaced.
The user can also enable or disable this function through the AIX system management
function.
Memory Boot-Time Deconfiguration (Memory Repeat-Gard)
Memory boot-time deconfiguration allows for the removal of a memory segment or
DIMM from the system configuration at boot time. The objective is to minimize system
failures or data integrity exposures due to faulty memory hardware. The hardware
resources to be removed are those that experienced any of the following failures:
• A boot-time test failure.
• Run-time recoverable errors over threshold prior to the current boot phase.
• Run-time irrecoverable errors prior to the current boot phase.
This function uses firmware Power-On Self-Test (POST) to discover and isolate memory
hardware failures during boot time. It also uses the hardware error detection logic in the
memory controller to capture run-time recoverable and irrecoverable error indications.
The firmware uses the error signatures in the hardware to analyze and isolate the error
to the specific memory segment or DIMM.
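The three removal criteria listed above amount to a simple per-DIMM decision at boot time. The Python sketch below shows one way to express that decision; the record fields, the function names, and the threshold of 10 are assumptions for illustration and do not come from the firmware.

```python
from typing import NamedTuple


class DimmRecord(NamedTuple):
    """Hypothetical per-DIMM history available at boot time."""
    name: str
    post_failed: bool = False          # failed a boot-time (POST) test
    recoverable_errors: int = 0        # run-time count before this boot
    irrecoverable_error: bool = False  # run-time error before this boot


def should_deconfigure(rec, threshold):
    # Any one of the three criteria is enough to remove the DIMM.
    return (
        rec.post_failed
        or rec.recoverable_errors > threshold
        or rec.irrecoverable_error
    )


def usable_memory(records, threshold=10):
    # threshold=10 is an assumed value; the guide does not state it.
    return [r.name for r in records if not should_deconfigure(r, threshold)]


records = [
    DimmRecord("dimm0"),
    DimmRecord("dimm1", post_failed=True),
    DimmRecord("dimm2", recoverable_errors=11),
    DimmRecord("dimm3", recoverable_errors=10),
    DimmRecord("dimm4", irrecoverable_error=True),
]
usable = usable_memory(records)
```

Here `dimm3` stays configured because its error count equals the assumed threshold but is not over it.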
44P Series Model 170 User’s Guide
RS/6000 44P Series Model 170 User's Guide, SA38-0559-01, IBM