Configuring and Deconfiguring Processors or Memory
All failures that crash the system with a machine check or check stop, even if intermittent, are reported as
a diagnostic callout for service repair. To prevent the recurrence of intermittent problems and improve the
availability of the system until a scheduled maintenance window, processors and memory books with a
failure history are marked “bad” to prevent their being configured on subsequent boots.
A processor or memory book is marked “bad” under the following circumstances:
• A processor or memory book fails built-in self-test (BIST) or power-on self-test (POST) testing during
  boot (as determined by the service processor).
• A processor or memory book causes a machine check or check stop during runtime, and the failure can
  be isolated specifically to that processor or memory book (as determined by the processor runtime
  diagnostics in the service processor).
• A processor or memory book reaches a threshold of recovered failures that results in a predictive callout
  (as determined by the processor run-time diagnostics in the service processor).
During boot time, the service processor does not configure processors or memory books that are marked
“bad.”
If a processor or memory book is deconfigured, the processor or memory book remains offline for
subsequent reboots until it is replaced or repeat gard is disabled. The repeat gard function also provides
the user with the option of manually deconfiguring a processor or memory book, or re-enabling a
previously deconfigured processor or memory book. For information on configuring or deconfiguring a
processor, see the Processor Configuration/Deconfiguration Menu on page 725.
For information on configuring or deconfiguring a memory book, see the Memory
Configuration/Deconfiguration Menu on page 726. Both of these menus are submenus under the System
Information Menu.
You can enable or disable CPU Repeat Gard or Memory Repeat Gard using the Processor
Configuration/Deconfiguration Menu.
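The marking and boot-time behavior described in this section can be pictured with a small model. The following is only an illustrative sketch, not service processor firmware: the names Unit, FailureKind, and units_to_configure are hypothetical, and the sketch assumes that disabling repeat gard simply lets every unit be configured again.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class FailureKind(Enum):
    SELF_TEST = auto()           # failed BIST or POST during boot
    ISOLATED_CHECKSTOP = auto()  # machine check or check stop isolated to this unit
    PREDICTIVE = auto()          # recovered-failure threshold reached


@dataclass
class Unit:
    name: str                    # hypothetical label such as "cpu0" or "membook1"
    bad: bool = False
    history: list = field(default_factory=list)

    def record_failure(self, kind: FailureKind) -> None:
        # Any one of the three circumstances marks the unit "bad".
        self.history.append(kind)
        self.bad = True


def units_to_configure(units, repeat_gard_enabled=True):
    """Return the units that would be brought online at the next boot."""
    if not repeat_gard_enabled:
        # Assumption: with repeat gard disabled, "bad" marks are not enforced.
        return list(units)
    return [u for u in units if not u.bad]
```

In this model, a unit with any recorded failure is skipped at boot until it is replaced (modeled here as creating a fresh Unit) or until repeat_gard_enabled is set to False.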
Run-Time CPU Deconfiguration (CPU Gard)
L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2 cache correctable errors
are monitored by the processor runtime diagnostics (PRD) code running in the service processor. When a
predefined error threshold is met, an error log with warning severity and threshold exceeded status is
returned to AIX. At the same time, PRD marks the CPU for deconfiguration at the next boot. AIX will
attempt to migrate all resources associated with that processor to another processor and then stop the
defective processor.
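The threshold behavior described above can be sketched as a simple per-processor counter. This is a hedged illustration only: the class name RecoverableErrorMonitor, the threshold value of 3, and the shape of the returned log entry are assumptions, since this guide states only that the threshold is predefined.

```python
from collections import Counter

ERROR_THRESHOLD = 3  # assumed value; the guide only calls it "predefined"


class RecoverableErrorMonitor:
    """Toy model of threshold counting for recoverable cache errors."""

    def __init__(self, threshold=ERROR_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()
        self.gard_at_next_boot = set()

    def report(self, cpu):
        """Count one recoverable error; return a log entry when the threshold is met."""
        self.counts[cpu] += 1
        if self.counts[cpu] == self.threshold:
            # Mark the CPU for deconfiguration at the next boot and
            # hand a warning-severity entry back to the operating system.
            self.gard_at_next_boot.add(cpu)
            return {"cpu": cpu, "severity": "warning", "status": "threshold exceeded"}
        return None
```

The sketch stops at producing the log entry; migrating work off the processor and stopping it is left to the operating system, as the text describes.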
Service Processor System Monitoring - Surveillance
Surveillance is a function in which the service processor monitors the system, and the system monitors the
service processor. This monitoring is accomplished by periodic samplings called heartbeats.
Surveillance is available during two phases:
• System firmware bringup (automatic)
• Operating system runtime (optional)
Note: Operating system surveillance is disabled on partitioned systems.
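Heartbeat monitoring of the kind described above can be sketched as a deadline that each heartbeat pushes forward. This is a minimal illustration under stated assumptions, not the service processor's implementation: the class HeartbeatWatch is hypothetical, and it assumes that the surveillance delay is an initial grace period before the first heartbeat is expected and that the surveillance interval is the maximum gap allowed between heartbeats.

```python
class HeartbeatWatch:
    """Toy heartbeat monitor; all times are plain seconds supplied by the caller."""

    def __init__(self, interval_s, delay_s, start):
        self.interval_s = interval_s
        # Assumption: the first heartbeat is due one interval after the delay ends.
        self.deadline = start + delay_s + interval_s

    def beat(self, now):
        # Each heartbeat moves the deadline one interval into the future.
        self.deadline = now + self.interval_s

    def is_overdue(self, now):
        # True when no heartbeat arrived before the current deadline.
        return now > self.deadline
```

Either side of the surveillance relationship could hold such a watch on the other; passing the time in explicitly keeps the sketch deterministic and easy to check.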
System Firmware Surveillance
System firmware surveillance is automatically enabled during system power-on. It cannot be disabled by
the user, and the surveillance interval and surveillance delay cannot be changed by the user.
752 Eserver pSeries 670 Service Guide