Fault isolation methodology
This section describes the basic methodology used to locate faults within a storage system and to identify the affected CRUs.
Overview
Basic steps:
• Gather fault information, including the state of the system LEDs.
• Determine where in the system the fault is occurring.
• Review logs from the ClevOS Manager event console.
• If required, isolate the fault to a data path component or configuration as described in “Isolate the fault.”
Gather fault information
When a fault occurs, it is important to gather as much information as possible. Doing so will help you
determine the correct action needed to remedy the fault.
Begin by reviewing the reported fault:
• Is the fault related to an internal data path or an external data path?
• Is the fault related to a hardware component such as a disk drive module, controller module, or power
supply unit?
By isolating the fault to one of the components within the storage system, you will be able to determine
the necessary corrective action more quickly.
Determine where the fault is occurring
When a fault occurs, the Module Fault LED, located in the lower left corner of the enclosure front panel, illuminates. Check the LEDs on the back and top panels of the enclosure to narrow the fault to a CRU, a connection, or both. You must remove a lid to view the top panel LEDs.
• See “Rear panel LEDs” on page 16.
• See “Top panel LEDs” on page 21.
The LEDs help you identify the location of a CRU reporting a fault.
Isolate the fault
Occasionally, it might become necessary to isolate a fault. This is particularly true with data paths, due to
the number of components that make up the data path. For example, a host-side data error could be caused by any of the components in the data path: the controller node HBA, the cable, the IOM, or the disk enclosure.
If the enclosure does not initialize
It may take up to two minutes for all enclosures to initialize. If an enclosure does not initialize:
• Power cycle the system.
• Make sure the power cord is properly connected, and check the power source to which it is connected.
• Check the ClevOS Manager event console for errors.
Host I/O
When troubleshooting disk drive and connectivity faults, stop I/O to the affected disk groups from all
hosts as a data protection precaution. See “Stopping I/O” on page 67.
52 IBM Cloud Object Storage System: Medium/Large J11/J12 Disk Enclosure Hardware Installation and
Maintenance Manual