Hewlett Packard Enterprise HPE MSA 1060 Installation Manual

Chapter 7

Troubleshooting

Is the fault related to an internal data path or an external data path?

Is the fault related to a hardware component such as a disk drive module, controller module, expansion module, or
PCM?

Determine where the fault is occurring

When a fault occurs, the management interfaces report the condition using alerts and event logs. The Fault ID status
LED on the enclosure left ear (see "Front panel components") also illuminates.

Use the SMU to verify any faults found.

Alerts indicate where the fault is occurring. Observing enclosure components in the Maintenance > Hardware and
Maintenance > About > Hardware Information panels can provide additional context for resolving the reported
failure or fault. See also "Monitor notifications" and "Review alerts" below.

Event logs contain storage system information classified by event category. See also "Monitor notifications" and
"Review the event logs" below.

The SMU is useful for determining where the fault is occurring, especially if the LEDs cannot be viewed because of
the location of the system. The SMU provides a visual representation of the system, including front and rear
enclosure views, and indicates where the fault is occurring. It also provides detailed information about FRUs, data,
and faults.

If the SMU is unavailable, observe the enclosure LEDs. The enclosure LEDs are designed to alert users to system
faults. Check the LEDs on the back of the enclosure to narrow the fault to a FRU, a connection, or both. The LEDs also
help you identify the location of a FRU reporting a fault.
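The guidance above amounts to a simple decision: use the SMU when it is available, and use the enclosure LEDs when they can be seen (the only option when the SMU is unavailable). The following Python sketch illustrates that ordering only. The names SiteConditions and fault_sources are hypothetical and are not part of any HPE tool or API, and the source labels are paraphrased from this section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SiteConditions:
    """Hypothetical description of what the troubleshooter can access."""
    smu_available: bool  # Can you sign in to the SMU?
    leds_visible: bool   # Can you physically see the enclosure LEDs?


def fault_sources(site: SiteConditions) -> list[str]:
    """Return the fault-information sources to consult, in order.

    Illustrative only: the SMU sources come first when the SMU is
    available; the enclosure LEDs are added when they can be seen.
    Returns an empty list if neither source is usable.
    """
    sources: list[str] = []
    if site.smu_available:
        sources += [
            "SMU: active alerts",
            "SMU: event logs",
            "SMU: Maintenance > Hardware panel",
        ]
    if site.leds_visible:
        sources += [
            "Fault ID status LED on the enclosure left ear",
            "Rear-panel LEDs (narrow the fault to a FRU, connection, or both)",
        ]
    return sources
```

For example, fault_sources(SiteConditions(smu_available=False, leds_visible=True)) starts with the Fault ID status LED, because the LEDs are the only available source in that case.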

Review alerts

Alerts report system faults. Use them to monitor system health and to track the resolution of reported system
health issues.

You can access alerts from the Alerts panel on the system Dashboard. The Active Alerts table provides a scrollable list
of active health alerts in the system. For each alert, the table shows the following:

How long the alert has been active

Severity of the alert

Affected system component

Description of the problem

Whether the alert has been acknowledged, and whether it has been resolved
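If you copy the Active Alerts table into a script (for example, to keep a troubleshooting record), each row maps naturally onto a small record with the fields listed above. The following Python sketch is a hypothetical model, not an SMU export format. It assumes that alert severities use the same labels as the event severities described under "Review the event logs," and it lists unresolved alerts with the most severe first and, within a severity, the longest-active first.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta

# Assumed severity labels and urgency order (lower rank = more urgent).
SEVERITY_RANK = {"Critical": 0, "Error": 1, "Warning": 2, "Informational": 3}


@dataclass
class Alert:
    active_for: timedelta  # How long the alert has been active
    severity: str          # Severity of the alert
    component: str         # Affected system component
    description: str       # Description of the problem
    acknowledged: bool     # Whether the alert has been acknowledged
    resolved: bool         # Whether the alert has been resolved


def needs_attention(alerts: list[Alert]) -> list[Alert]:
    """Return unresolved alerts, most severe first, then longest-active first.

    Unknown severity labels sort after all known ones.
    """
    open_alerts = [a for a in alerts if not a.resolved]
    return sorted(
        open_alerts,
        key=lambda a: (
            SEVERITY_RANK.get(a.severity, len(SEVERITY_RANK)),
            -a.active_for.total_seconds(),
        ),
    )
```

Sorting on a (rank, negative age) tuple keeps the ordering rule in one place. Acknowledged but unresolved alerts are still returned, because acknowledging an alert does not resolve it.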

Additional controls provide greater detail and recommended actions for alert resolution, if applicable. For more
information about managing alerts, see the Storage Management Guide or the online help provided in the SMU.

Review the event logs

The event logs record all system events. Each event has a numeric code that identifies the type of event that occurred,
and has one of the following severities:

Critical. A failure occurred that may cause a controller to shut down or place data at risk. Correct the problem
immediately.

Error. A failure occurred that may affect data integrity or system stability. Correct the problem as soon as possible.

Warning. A problem occurred that may affect system stability, but not data integrity. Evaluate the problem and
correct it if necessary.

Informational. A configuration or state change occurred, or a problem occurred that the system corrected. No
immediate action is required.

Resolved. A condition that caused an event to be logged has been resolved. No action is required.
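The five severities and their recommended actions can be captured in a small lookup table when you sort or filter exported event records. The following Python sketch is illustrative only. EventSeverity, ACTION, parse_severity, needs_review, and triage are hypothetical names, and the events are assumed to be simple (event code, severity) pairs rather than any actual log format. The action strings are taken from the severity descriptions above.

```python
from __future__ import annotations

from enum import IntEnum


class EventSeverity(IntEnum):
    # Lower value = more urgent (ordering assumed from the descriptions above).
    CRITICAL = 0
    ERROR = 1
    WARNING = 2
    INFORMATIONAL = 3
    RESOLVED = 4


# Recommended handling for each severity, as described in this section.
ACTION = {
    EventSeverity.CRITICAL: "Correct the problem immediately.",
    EventSeverity.ERROR: "Correct the problem as soon as possible.",
    EventSeverity.WARNING: "Evaluate the problem and correct it if necessary.",
    EventSeverity.INFORMATIONAL: "No immediate action is required.",
    EventSeverity.RESOLVED: "No action is required.",
}


def parse_severity(label: str) -> EventSeverity:
    """Map a label such as 'Warning' to its EventSeverity member."""
    return EventSeverity[label.strip().upper()]


def needs_review(severity: EventSeverity) -> bool:
    """Critical, Error, and Warning events call for correction or evaluation."""
    return severity <= EventSeverity.WARNING


def triage(
    events: list[tuple[int, EventSeverity]],
) -> list[tuple[int, EventSeverity]]:
    """Keep events that need review, most urgent first (stable within a severity)."""
    return sorted((e for e in events if needs_review(e[1])), key=lambda e: e[1])
```

Because Python's sort is stable, events with the same severity keep their original log order.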