Begin by reviewing the reported fault:
• Is the fault related to an internal data path or an external data path?
• Is the fault related to a hardware component, such as a disk drive module, controller module, or power supply unit?
Isolating the fault to one component within the storage system helps you determine the necessary corrective action more quickly.
Determine where the fault is occurring
When a fault occurs, the Module Fault LED illuminates. This LED is on the Ops panel, which is located on the enclosure's left ear. Check the LEDs on the back of the enclosure to narrow the fault to a CRU, a connection, or both. The LEDs also help you locate the CRU that is reporting the fault.
Use the ME Storage Manager to verify any faults you find while viewing the LEDs. The ME Storage Manager is also useful for locating a fault when the LEDs cannot be viewed because of where the system is installed. This web application gives you a visual representation of the system and shows where the fault is occurring. It also provides more detailed information about CRUs, data, and faults.
Review the event logs
The event logs record all system events. Each event has a numeric code that identifies the type of event that occurred, and has one of the following severities:
• Critical. A failure occurred that may cause a controller to shut down. Correct the problem immediately.
• Error. A failure occurred that may affect data integrity or system stability. Correct the problem as soon as possible.
• Warning. A problem occurred that may affect system stability, but not data integrity. Evaluate the problem and correct it if necessary.
• Informational. A configuration or state change occurred, or a problem occurred that the system corrected. No immediate action is required.
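Because the four severities form an ordered scale, triaging a set of events amounts to ranking them and applying the matching response. The following Python sketch illustrates that idea only. The `Event` record, its fields, the `triage` function, and the example event codes are hypothetical; they do not represent the ME Storage Manager log format or real event codes. The sketch sets aside Informational events, orders the rest from most to least urgent, and pairs each with the response recommended for its severity.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Higher value means more urgent; the order mirrors the severity list.
    INFORMATIONAL = 0
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


# Recommended response for each severity, taken from the severity descriptions.
ACTION = {
    Severity.CRITICAL: "Correct the problem immediately.",
    Severity.ERROR: "Correct the problem as soon as possible.",
    Severity.WARNING: "Evaluate the problem and correct it if necessary.",
    Severity.INFORMATIONAL: "No immediate action is required.",
}


@dataclass
class Event:
    # Hypothetical record shape, not an actual event log export format.
    code: int
    severity: Severity
    message: str


def triage(events):
    """Return (event, action) pairs for events above Informational, most urgent first."""
    actionable = [e for e in events if e.severity > Severity.INFORMATIONAL]
    # sorted() is stable, so events of equal severity keep their logged order.
    actionable = sorted(actionable, key=lambda e: e.severity, reverse=True)
    return [(e, ACTION[e.severity]) for e in actionable]


if __name__ == "__main__":
    # Illustrative events only; the codes are made up.
    sample = [
        Event(1, Severity.INFORMATIONAL, "configuration changed"),
        Event(2, Severity.WARNING, "example warning"),
        Event(3, Severity.CRITICAL, "example critical failure"),
        Event(4, Severity.ERROR, "example error"),
    ]
    for event, action in triage(sample):
        print(event.severity.name, event.code, action)
```

Sorting by severity puts Critical events first, which matches the guidance to correct them immediately. In practice, review the surrounding events as well, because earlier events can reveal what caused the fault.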
Review the logs not only to identify the fault, but also to search for events that might have caused the fault to occur. For example, a host could lose connectivity to a disk group if a user changes channel settings without considering the storage resources assigned to that host. In addition, the type of fault can help you isolate the problem to either hardware or software.
Isolate the fault
Occasionally, you might need to isolate a fault. This is particularly true for data paths, because of the number of components that make up a data path. For example, a host-side data error could be caused by any component in the data path: the controller module, the cable, or the data host.
If the enclosure does not initialize
It may take up to two minutes for all enclosures to initialize. If an enclosure does not initialize:
• Perform a rescan
• Power cycle the system
• Make sure the power cable is properly connected, and check the power source to which it is connected
• Check the event log for errors
Correcting enclosure IDs
When you install a system with drive enclosures attached, the enclosure IDs might not match the physical cabling order. This is because
the controller might have been previously attached to enclosures in a different configuration, and it attempts to preserve the previous
Troubleshooting and problem solving