IBM Storwize V7000 Troubleshooting and Maintenance Manual

When to run the recover system procedure

Attempt the recover system procedure only after a complete and thorough
investigation of the cause of the system failure. First attempt to resolve those
issues by using other service procedures.

Attention: If you experience failures at any time while running the recover
system procedure, call the IBM Support Center. Do not attempt to do further
recovery actions, because these actions might prevent IBM Support from restoring
the system to an operational status.

Certain conditions must be met before you run the recovery procedure. Use the
following items to help you determine when to run the recovery procedure:

Note: It is important to know the number of control enclosures in the system.
When the instructions indicate that every node is checked, you must check the
status of both nodes in every control enclosure. For some system problems or Fibre
Channel network problems, you must run the service assistant directly on the node
to get its status.

Check whether any node in the cluster is active or whether the management IP is
accessible from any other node. If either is the case, the system is still
available and there is no need to recover the cluster.

Resolve all hardware errors in the nodes so that only node error 578 or node
error 550 is present. If this is not the case, go to “Fix hardware errors.”

Ensure that all back-end storage that is administered by the cluster is present
before you run the recover system procedure.

If any nodes have been replaced, ensure that the WWNN of the replacement
node matches that of the replaced node, and that no prior system data remains
on this node (see “Procedure: Removing system data from a node canister” on
page 59).
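
The conditions above can be sketched as a simple checklist. This is only an
illustration, not an IBM tool: the NodeStatus record, the state strings, and the
recovery_blockers function are hypothetical names, and the values are assumed to
be ones you recorded by hand from the service assistant. The sketch treats an
active node or a reachable management IP as a reason not to recover, and it does
not cover the WWNN and leftover system data checks for replaced nodes.

```python
from dataclasses import dataclass, field

# Node errors that may remain before recovery, as stated in the text above.
RECOVERABLE_NODE_ERRORS = {550, 578}


@dataclass
class NodeStatus:
    """Hypothetical record of what was noted for one node canister."""
    name: str
    state: str  # assumed label, for example "active" or "service"
    node_errors: set = field(default_factory=set)


def recovery_blockers(nodes, management_ip_reachable, backend_storage_present):
    """Return the reasons (possibly none) not to run the recover system procedure."""
    blockers = []
    if not nodes:
        blockers.append("no node status was recorded")
    for node in nodes:
        if node.state == "active":
            blockers.append(f"{node.name}: node is active, system is still available")
        other = node.node_errors - RECOVERABLE_NODE_ERRORS
        if other:
            blockers.append(f"{node.name}: unresolved node errors {sorted(other)}")
        elif not node.node_errors:
            blockers.append(f"{node.name}: reports neither node error 578 nor 550")
    if management_ip_reachable:
        blockers.append("management IP is accessible, recovery is not needed")
    if not backend_storage_present:
        blockers.append("back-end storage administered by the cluster is missing")
    return blockers
```

An empty list means that none of the checked conditions rules out recovery. Any
entry in the list is a reason to keep working through the other service
procedures first.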

Fix hardware errors

Before running a system recovery procedure, it is important to identify and fix the
root cause of the hardware issues.

Identifying and fixing the root cause can help recover a system if these faults
are what is causing the system to fail. The following are common issues that
can be easily resolved:

The node has been powered off or the power cords were unplugged.

Check the node status of every node canister that is part of this system. Resolve
all hardware errors except node error 578 or node error 550.
– All nodes must be reporting either a node error 578 or a node error 550.

These error codes indicate that the system has lost its configuration data. If
any nodes report anything other than these error codes, do not perform a
recovery. You can encounter situations where non-configuration nodes report
other node errors, such as a 550 node error. The 550 error can also indicate
that a node is not able to join a system.

– If any nodes show a node error 550, record the error data that is associated
with the 550 error from the service assistant.
- In addition to the node error 550, the report can show data that is
separated by spaces in one of the following forms:

Node identifiers in the format: <enclosure_serial>-<canister slot ID>
(7 characters, hyphen, 1 number), for example, 01234A6-2
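
When you record the 550 error data, the node identifier form described above can
be picked out mechanically. The following is a minimal sketch, assuming the
7 characters of the enclosure serial are letters or digits (the text says only
"7 characters") and that the data is the space-separated string copied from the
service assistant. The name parse_node_identifiers is hypothetical. Tokens in
any other form are ignored.

```python
import re

# <enclosure_serial>-<canister slot ID>: 7 characters, hyphen, 1 number.
# Assumption: the 7 characters are alphanumeric.
NODE_ID = re.compile(r"([0-9A-Za-z]{7})-(\d)")


def parse_node_identifiers(error_data):
    """Return (enclosure_serial, canister_slot) pairs found in the error data."""
    identifiers = []
    for token in error_data.split():
        match = NODE_ID.fullmatch(token)
        if match:
            identifiers.append((match.group(1), int(match.group(2))))
    return identifiers
```

For the example identifier above, parse_node_identifiers("01234A6-2") returns
the pair ("01234A6", 2).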

Storwize V7000: Troubleshooting, Recovery, and Maintenance Guide