
Managing Faults
■ “Detecting Faults Using POST” on page 42
■ SPARC Enterprise T5440 Server Installation and Setup Guide
■ SPARC Enterprise T5440 Server Administration Guide
Memory Fault Handling Overview
A variety of features play a role in how the memory subsystem is configured and
how memory faults are handled. Understanding the underlying features helps you
identify and repair memory problems. This section describes how the server deals
with memory faults.
Note – For memory configuration information, see
The server uses advanced ECC technology that corrects up to 4 bits in error on
nibble boundaries, as long as the bits are all in the same DRAM. On 4 GB FB-DIMMs,
if a DRAM fails, the DIMM continues to function.
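One way to read "up to 4 bits in error on nibble boundaries" is that all of the bits in error must sit inside a single aligned 4-bit group. The short sketch below only illustrates that reading. It is not the server's ECC algorithm, the flat bit numbering and the function name `within_one_nibble` are assumptions made for the example, and it does not model which bits belong to which DRAM.

```python
def within_one_nibble(error_bits):
    """Return True if every erroneous bit index lies in the same aligned 4-bit group.

    Bit indices are a hypothetical flat numbering (0, 1, 2, ...) chosen for
    illustration; bits 0-3 form nibble 0, bits 4-7 form nibble 1, and so on.
    """
    bits = set(error_bits)
    if not bits:
        return True  # nothing in error, so nothing spans a nibble boundary
    nibbles = {bit // 4 for bit in bits}  # aligned nibble index of each bad bit
    return len(nibbles) == 1


print(within_one_nibble([8, 9, 10, 11]))  # all four bits of nibble 2
print(within_one_nibble([10, 11, 12]))    # spans nibbles 2 and 3
```

Under this reading, the first pattern stays within one nibble and the second crosses a nibble boundary.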
The following server features independently manage memory faults:
■ POST – Based on ILOM configuration variables, POST runs when the server is
powered on.
For correctable memory errors (CEs), POST forwards the error to the Solaris
Predictive Self-Healing (PSH) daemon for error handling. If an uncorrectable
memory fault is detected, POST displays the fault with the device name of the
faulty FB-DIMMs, and logs the fault. POST then disables the faulty FB-DIMMs.
Depending on the memory configuration and the location of the faulty FB-DIMM,
POST disables half of physical memory in the system, or half the physical memory
and half the processor threads. When this offlining process occurs in normal
operation, you must replace the faulty FB-DIMMs based on the fault message and
enable the disabled FB-DIMMs with the ILOM command
set device component_state=enabled
where device is the name of the FB-DIMM being enabled (for example,
set /SYS/MB/CPU0/CMP0/BR0/CH0/D0 component_state=enabled).
■ Solaris Predictive Self-Healing (PSH) technology – A feature of the Solaris OS,
PSH uses the Fault Manager daemon (fmd) to watch for various kinds of faults.
When a fault occurs, the fault is assigned a unique fault ID (UUID), and logged.
PSH reports the fault and identifies the locations of the faulty FB-DIMMs.
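When several FB-DIMMs have been replaced, it can help to generate the re-enable commands described under POST above instead of typing each one. The sketch below builds the `set device component_state=enabled` string for each device name. The helper name `enable_command` and the check that names start with `/SYS/` are assumptions made for this example, and the script only prints the commands; it does not connect to ILOM.

```python
def enable_command(device):
    """Build the ILOM command text that re-enables one component.

    The /SYS/ prefix check is an assumption based on the example device
    name in this section, not a documented ILOM rule.
    """
    device = device.strip()
    if not device.startswith("/SYS/"):
        raise ValueError(f"expected a device name under /SYS, got {device!r}")
    return f"set {device} component_state=enabled"


# Device name taken from the example in this section.
replaced_dimms = ["/SYS/MB/CPU0/CMP0/BR0/CH0/D0"]

for name in replaced_dimms:
    print(enable_command(name))
```

Each printed line can then be entered at the ILOM prompt after the faulty FB-DIMM has been replaced.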
If you suspect that the server has a memory problem, follow the flowchart (see
FIGURE: Diagnostic Flowchart on page 11). Run the ILOM show faulty command.
The show faulty command lists memory faults and the specific FB-DIMMs that are
associated with each fault.
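If the text of a show faulty session has been saved, the FB-DIMM names in it can be pulled out for follow-up. The sketch below is one possible approach, not a parser for the real output format: the regular expression is modeled only on the example device name in this section (/SYS/MB/CPU0/CMP0/BR0/CH0/D0), the function name `fb_dimms_in` is hypothetical, and the sample text is invented for illustration.

```python
import re

# Pattern modeled on the example name /SYS/MB/CPU0/CMP0/BR0/CH0/D0.
# Other naming schemes would need a different pattern (assumption).
FB_DIMM = re.compile(r"/SYS/MB/CPU\d+/CMP\d+/BR\d+/CH\d+/D\d+")


def fb_dimms_in(text):
    """Return the unique FB-DIMM names found in text, in order of first appearance."""
    found = []
    for name in FB_DIMM.findall(text):
        if name not in found:
            found.append(name)
    return found


# Hypothetical captured text, NOT real show faulty output.
sample = """\
fru  /SYS/MB/CPU0/CMP0/BR0/CH0/D0
fru  /SYS/MB/CPU0/CMP0/BR0/CH0/D0
fru  /SYS/MB/CPU1/CMP1/BR1/CH1/D1
"""

for name in fb_dimms_in(sample):
    print(name)
```

Duplicate names are reported once, so the result reads as a list of FB-DIMMs to inspect.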