Troubleshooting the CPU
Each server supports from one to four IPF processor modules. Each processor module contains two
individual CPU cores. This results in up to eight physical CPUs installed in rx6600 servers.
Furthermore, unlike previous IPF processor modules, each physical CPU core contains logic to
support two physical threads. This results in up to 16 physical threads, or the equivalent of 16
logical CPUs, in rx6600 servers when four processor modules are installed and enabled.
NOTE:
The operating system kernel attaches one or more software processes to each available
thread. In multiple processor servers, having more threads generally means that more software
processes can be launched and executed concurrently.
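The processor counts described at the start of this section follow from simple multiplication. The following is a minimal sketch of that arithmetic; the function name cpu_counts and the constant names are hypothetical and exist only for illustration, while the values (two cores per module, two threads per core, one to four modules) come from the text.

```python
# Values taken from the text above; the names are illustrative only.
CORES_PER_MODULE = 2   # each IPF processor module contains two CPU cores
THREADS_PER_CORE = 2   # each CPU core supports two threads
MAX_MODULES = 4        # rx6600 servers accept one to four modules


def cpu_counts(modules: int) -> dict:
    """Return core and logical CPU counts for a given number of modules."""
    if not 1 <= modules <= MAX_MODULES:
        raise ValueError(f"modules must be between 1 and {MAX_MODULES}")
    cores = modules * CORES_PER_MODULE
    return {
        "modules": modules,
        "physical_cores": cores,
        "logical_cpus": cores * THREADS_PER_CORE,
    }


if __name__ == "__main__":
    for n in range(1, MAX_MODULES + 1):
        print(cpu_counts(n))
```

With four modules installed and enabled, this yields eight physical cores and 16 logical CPUs.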
IPF Processor Load Order
For a minimally loaded server, one IPF processor module must be installed in CPU socket 0 on the
Processor board CRU, and its threads must be enabled by user actions. Additional processor
modules of the same revision are installed in CPU sockets 1-3 in rx6600 servers.
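The load order rules above can be expressed as a small check. This is a sketch only, assuming a hypothetical inventory given as a mapping from CPU socket number to a module revision string (or None for an empty socket); the function name check_load_order and the revision strings are illustrative and do not come from any server tool.

```python
def check_load_order(sockets: dict) -> list:
    """Return a list of load-order problems; an empty list means none found.

    sockets maps a CPU socket number (0-3) to a revision string,
    or to None when the socket is empty (hypothetical data model).
    """
    problems = []
    for number in sockets:
        if number not in range(4):
            problems.append(f"CPU socket {number} is outside the range 0-3")
    # Rule 1: a module must be installed in CPU socket 0.
    if sockets.get(0) is None:
        problems.append("CPU socket 0 is empty")
    # Rule 2: all installed modules must be of the same revision.
    revisions = {rev for rev in sockets.values() if rev is not None}
    if len(revisions) > 1:
        problems.append(
            "installed modules have mixed revisions: " + ", ".join(sorted(revisions))
        )
    return problems


if __name__ == "__main__":
    print(check_load_order({0: "A1", 1: "A1", 2: None, 3: None}))
    print(check_load_order({0: None, 1: "A1", 2: "B0", 3: None}))
```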
Processor Module Behaviors
All enabled CPUs and their threads become functional almost immediately after system power is
applied. Each thread races to fetch its instructions from its CPU’s instruction and data
caches to complete early self test and rendezvous.
Early code fetches come from PDH, until memory is configured. Normal execution is fetched from
main memory.
Local machine check abort (MCA) events cause the physical CPU core and one or both of its logical
CPUs within that IPF processor module to fail while all other physical CPUs and their logical CPUs
continue operating. Double-bit data cache errors in any physical CPU core cause a Global MCA event
that causes all logical and physical CPUs in the server to fail and the operating system to reboot.
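The difference in scope between local and global MCA events can be illustrated with a short model. This is a sketch under stated assumptions: logical CPUs are identified here by a hypothetical (core, thread) pair, and the function name failed_logical_cpus is illustrative; it models only the scope described above, not actual firmware behavior.

```python
THREADS_PER_CORE = 2  # from the text: two threads per physical CPU core


def failed_logical_cpus(scope, total_cores, core=None, threads=(0, 1)):
    """Return the set of (core, thread) pairs that fail for an MCA event.

    scope is "local" or "global". For a local MCA, core is the affected
    physical core and threads lists which of its logical CPUs fail
    (one or both). The (core, thread) numbering is an assumption.
    """
    if scope == "global":
        # A global MCA takes down every logical CPU in the server.
        return {(c, t) for c in range(total_cores) for t in range(THREADS_PER_CORE)}
    if scope == "local":
        if core is None or not 0 <= core < total_cores:
            raise ValueError("a valid core is required for a local MCA")
        if not threads or any(t not in range(THREADS_PER_CORE) for t in threads):
            raise ValueError("threads must name one or both logical CPUs")
        # Only the affected core's logical CPUs fail; all others keep running.
        return {(core, t) for t in threads}
    raise ValueError(f"unknown MCA scope: {scope!r}")


if __name__ == "__main__":
    print(sorted(failed_logical_cpus("local", 8, core=3, threads=(1,))))
    print(len(failed_logical_cpus("global", 8)))
```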
Customer Messaging Policy
• A diagnostic LED only lights for physical CPU core errors, when isolation is to a specific IPF
  processor module. If there is any uncertainty about a specific CPU, the customer is pointed to
  the SEL for any action, and the suspect IPF processor module’s CRU LED on the diagnostic
  panel is not lighted.
• For configuration style errors, for example, when there is no IPF processor module installed
  in CPU socket 0, all of the CRU LEDs on the diagnostic LED panel are lighted for all of the IPF
  processor modules that are missing.
• No diagnostic messages are reported for single-bit errors that are corrected in both instruction
  and data caches, during corrected machine check (CMC) events to any physical CPU core.
  Diagnostic messages are reported for CMC events when thresholds are exceeded for single-bit
  errors; fatal processor errors cause global / local MCA events.
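The messaging policy above amounts to a few decisions, sketched below. All names (led_action, cmc_reported, the error_kind strings) are hypothetical, and the threshold is a parameter because the text does not state its value; this is an illustration of the policy as written, not the firmware's implementation.

```python
def led_action(error_kind, isolated_module=None, missing_modules=()):
    """Decide which processor CRU LEDs light, following the policy above.

    error_kind is "core_error" or "configuration" (illustrative labels).
    refer_to_sel is set only in the case where the text says the
    customer is pointed to the SEL.
    """
    if error_kind == "core_error":
        if isolated_module is not None:
            # Isolation to a specific module: light that module's LED.
            return {"leds": [isolated_module], "refer_to_sel": False}
        # Any uncertainty: no LED, customer is pointed to the SEL.
        return {"leds": [], "refer_to_sel": True}
    if error_kind == "configuration":
        # LEDs light for every missing processor module.
        return {"leds": sorted(missing_modules), "refer_to_sel": False}
    raise ValueError(f"unknown error kind: {error_kind!r}")


def cmc_reported(single_bit_errors, threshold):
    """Corrected single-bit errors are reported only above the threshold."""
    return single_bit_errors > threshold


if __name__ == "__main__":
    print(led_action("core_error", isolated_module=1))
    print(led_action("core_error"))
    print(led_action("configuration", missing_modules=[0]))
    print(cmc_reported(single_bit_errors=3, threshold=10))
```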
Table 51 lists the processor events that light the diagnostic panel LEDs.
Table 51 Processor Events That Light Diagnostic Panel LEDs
Diagnostic LEDs  Sample IPMI Events                      Cause                                        Source     Notes
---------------  --------------------------------------  -------------------------------------------  ---------  -------------------------------------------------------
Processors       Type E0h, 39d:04d BOOT_DECONFIG_CPU     Processor failed and deconfigured            SFW        This event will likely follow other failed processor(s)
Processors       Type E0h, 5823d:26d PFM_CACHE_ERR_PROC  Too many cache errors detected by processor  WIN Agent  Threshold exceeded for cache parity errors on processor
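When reviewing logs, the sample event strings in Table 51 can be split into their parts and matched against the table. The following is a minimal sketch, assuming events appear as text in exactly the form shown in the table; the names parse_event and TABLE_51 are hypothetical, and the two decimal values are kept as an unlabeled pair because the table does not say what they represent.

```python
import re

# Matches strings of the form "Type E0h, 39d:04d BOOT_DECONFIG_CPU".
EVENT_RE = re.compile(r"Type\s+([0-9A-Fa-f]+)h,\s*(\d+)d:(\d+)d\s+(\w+)")

# Rows transcribed from Table 51, keyed by event name.
TABLE_51 = {
    "BOOT_DECONFIG_CPU": {
        "leds": "Processors",
        "cause": "Processor failed and deconfigured",
        "source": "SFW",
    },
    "PFM_CACHE_ERR_PROC": {
        "leds": "Processors",
        "cause": "Too many cache errors detected by processor",
        "source": "WIN Agent",
    },
}


def parse_event(text: str) -> dict:
    """Split a sample event string into its type, decimal pair, and name."""
    match = EVENT_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized event string: {text!r}")
    type_hex, first, second, name = match.groups()
    return {
        "type": int(type_hex, 16),            # "E0h" is hexadecimal
        "numbers": (int(first), int(second)),  # the "d" suffix marks decimal
        "name": name,
    }


if __name__ == "__main__":
    event = parse_event("Type E0h, 5823d:26d PFM_CACHE_ERR_PROC")
    print(event)
    print(TABLE_51.get(event["name"], "not listed in Table 51"))
```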