Troubleshooting the server memory
Memory DIMM load order
For a minimally loaded server, two equal-size DIMMs must be installed in the DIMM slots. For more
information, see
Memory subsystem behaviors
Corrective action, such as DIMM/memory expander replacement, is required when:
• a threshold is reached for multiple double-byte errors from one or more DRAM chips in the same rank
• any uncorrectable memory error (more than 2 bytes) occurs
• no pair of like DIMMs is loaded in rank 0 of side 0
All other causes of memory DIMM errors are corrected by the processor and reported in CMC
and CPE error logs.
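The three conditions that require corrective action can be sketched as a single check. This is a minimal illustration under stated assumptions, not HP firmware logic. The `RankErrorState` record, the `needs_corrective_action` function, and the `DOUBLE_BYTE_THRESHOLD` value are hypothetical, because this guide does not give the real threshold or describe how error counts are stored. "Like DIMMs" is approximated as DIMMs of equal size, following the equal-size rule under Memory DIMM load order.

```python
from collections import Counter
from dataclasses import dataclass, field

# HYPOTHETICAL threshold: the guide does not state the real value.
DOUBLE_BYTE_THRESHOLD = 3


@dataclass
class RankErrorState:
    """Hypothetical per-rank error summary (not a real firmware structure)."""

    # Double-byte error counts for one rank, keyed by DRAM chip identifier.
    double_byte_errors_by_chip: dict[str, int] = field(default_factory=dict)


def needs_corrective_action(
    ranks: list[RankErrorState],
    uncorrectable_error_seen: bool,
    side0_rank0_dimm_sizes_gb: list[int],
) -> bool:
    """Return True if any documented corrective-action condition holds."""
    # 1. Double-byte errors from one or more DRAM chips in the same rank
    #    reach the threshold.
    for rank in ranks:
        if sum(rank.double_byte_errors_by_chip.values()) >= DOUBLE_BYTE_THRESHOLD:
            return True

    # 2. Any uncorrectable memory error (more than 2 bytes).
    if uncorrectable_error_seen:
        return True

    # 3. No pair of like DIMMs in rank 0 of side 0.
    #    ASSUMPTION: "like" is approximated here as "same size".
    size_counts = Counter(side0_rank0_dimm_sizes_gb)
    has_like_pair = any(count >= 2 for count in size_counts.values())
    return not has_like_pair


if __name__ == "__main__":
    healthy = RankErrorState({"chip0": 1})
    print(needs_corrective_action([healthy], False, [4, 4]))  # False
    print(needs_corrective_action([healthy], False, [4, 8]))  # True (no like pair)
```

Each condition short-circuits, so the function reports that action is needed as soon as one documented condition is met; it does not try to say which component to replace.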
Customer messaging policy
• The diagnostic LED illuminates for memory DIMM errors only when the error is isolated to a specific DIMM. If the error cannot be isolated to a specific DIMM, the customer is directed to the SEL for any required action, and the CRU LED for the suspect DIMM on the System Insight Display (SID) is not lit.
• For configuration errors, for example when DIMMs are not installed, the CRU LEDs on the SID LED panel illuminate for the missing DIMMs.
• No diagnostic messages are reported for single-byte errors that are corrected in both the ICH10 caches and the DIMMs during CPE events. Diagnostic messages are reported for CPE events when error thresholds are exceeded; thresholds apply to both single-byte and double-byte errors. All fatal memory subsystem errors cause global MCA events.
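The messaging rules above amount to a small decision procedure for the SID CRU LEDs. The sketch below only mirrors the documented behavior: light a DIMM's CRU LED when the error is isolated to that DIMM, light the LEDs of missing DIMMs on configuration errors, and otherwise light nothing and refer the customer to the SEL. The `MemoryEvent` record, the `EventKind` values, the slot labels such as "1C", and the two functions are hypothetical. Treating configuration and fatal events as always reported, and lighting no LED for corrected errors below threshold, are assumptions of this sketch.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class EventKind(Enum):
    CORRECTED_SINGLE_BYTE = auto()  # CPE event
    CORRECTED_DOUBLE_BYTE = auto()  # CPE event
    CONFIGURATION = auto()          # e.g. DIMMs not installed
    FATAL = auto()                  # causes a global MCA event


@dataclass
class MemoryEvent:
    """Hypothetical event record; field names are illustrative only."""

    kind: EventKind
    isolated_slot: Optional[str] = None  # None: not isolated to one DIMM
    missing_slots: tuple[str, ...] = ()  # configuration errors only
    threshold_exceeded: bool = False     # corrected (CPE) errors only


def is_reported(event: MemoryEvent) -> bool:
    """Corrected errors are reported only after their threshold is exceeded.

    ASSUMPTION: configuration and fatal events are always treated as reported.
    """
    if event.kind in (EventKind.CORRECTED_SINGLE_BYTE, EventKind.CORRECTED_DOUBLE_BYTE):
        return event.threshold_exceeded
    return True


def sid_cru_leds(event: MemoryEvent) -> set[str]:
    """Return the DIMM slots whose CRU LEDs on the SID should be lit."""
    if not is_reported(event):
        return set()  # ASSUMPTION: unreported events light no LEDs
    if event.kind is EventKind.CONFIGURATION:
        return set(event.missing_slots)  # light LEDs for the missing DIMMs
    if event.isolated_slot is not None:
        return {event.isolated_slot}  # error isolated to one DIMM
    return set()  # uncertain: no CRU LED; the customer is pointed to the SEL


if __name__ == "__main__":
    print(sid_cru_leds(MemoryEvent(EventKind.FATAL, isolated_slot="1C")))  # {'1C'}
    print(sid_cru_leds(MemoryEvent(EventKind.FATAL)))  # set()
```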
Table 31 Memory subsystem events that illuminate SID LEDs
| Diagnostic LEDs | Cause | Source | Sample IPMI events | Notes |
|---|---|---|---|---|
| DIMMs | No DIMMs installed (on one or more sockets) | SFW | Type E0h, 208d:04d MEM_NO_DIMMS_INSTALLED | N/A |
| DIMMs | A DIMM has a serial presence detect (SPD) EEPROM with a bad checksum | SFW | Type E0h, 172d:04d MEM_DIMM_SPD_CHECKSUM | Either EEPROM is misprogrammed or this DIMM is incompatible |
| DIMMs | This memory DIMM is correcting too many single-bit errors | WIN Agent | Type E0h, 4652d:26d WIN_AGT_PREDICT_MEM_FAIL | Memory DIMM is about to fail or environmental conditions are causing more errors than usual |
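When scanning event log output for the events in Table 31, the table can be kept as a keyword lookup. This is a sketch under assumptions: the `SidMemoryEvent` record, the `TABLE_31` dictionary, the `classify` function, and the premise that each log line contains the event keyword as plain text are all hypothetical, since this guide does not describe the SEL output format. The sample event strings are copied verbatim from the table and are not decoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SidMemoryEvent:
    """One row of Table 31 (all three rows light the DIMM LEDs)."""

    keyword: str
    sample_event: str  # copied verbatim from Table 31, not decoded
    source: str
    cause: str
    notes: str


# Rows transcribed from Table 31, keyed by event keyword.
TABLE_31 = {
    row.keyword: row
    for row in (
        SidMemoryEvent("MEM_NO_DIMMS_INSTALLED", "Type E0h, 208d:04d", "SFW",
                       "No DIMMs installed (on one or more sockets)", "N/A"),
        SidMemoryEvent("MEM_DIMM_SPD_CHECKSUM", "Type E0h, 172d:04d", "SFW",
                       "A DIMM has a serial presence detect (SPD) EEPROM with a bad checksum",
                       "Either EEPROM is misprogrammed or this DIMM is incompatible"),
        SidMemoryEvent("WIN_AGT_PREDICT_MEM_FAIL", "Type E0h, 4652d:26d", "WIN Agent",
                       "This memory DIMM is correcting too many single-bit errors",
                       "Memory DIMM is about to fail or environmental conditions "
                       "are causing more errors than usual"),
    )
}


def classify(log_line: str) -> Optional[SidMemoryEvent]:
    """Return the Table 31 row whose keyword appears in log_line, if any.

    ASSUMPTION: the keyword appears verbatim somewhere in the line.
    """
    for keyword, row in TABLE_31.items():
        if keyword in log_line:
            return row
    return None


if __name__ == "__main__":
    hit = classify("Type E0h, 208d:04d MEM_NO_DIMMS_INSTALLED")
    print(hit.cause if hit else "not a Table 31 event")
    # prints: No DIMMs installed (on one or more sockets)
```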
HP Confidential
Troubleshooting