5 Troubleshooting
This chapter provides a preferred methodology (strategies and procedures) and tools for
troubleshooting server blade error and fault conditions.
Methodology
General Troubleshooting Methodology
There are multiple entry points to the troubleshooting process, depending on your level of
troubleshooting expertise; the tools, processes, and procedures that you have at your disposal;
and the nature of the server fault or failure.
1. Typically, you start by selecting from a set of symptoms, ranging from the very simple (a server
LED is blinking) to the most difficult (a Machine Check Abort (MCA) has occurred). The following
is a list of symptom examples:
• Front Panel LED blinking
• System Alert present on system console
• Server blade won’t power-up
• Server blade won’t boot
• Error/Event Message received
• MCA occurred
2. Narrow down the observed problem to the specific troubleshooting procedure required. Isolate
the failure to a specific part of the server blade so that you can perform more detailed
troubleshooting. For example:
• Problem: Front Panel LED blinking
NOTE: The front panel health LED flashes amber to indicate a warning, or flashes red to
indicate a fault.
◦ Is there a System Alert on the system console?
◦ Analyze the alert by using the system event log (SEL) to identify the last error logged
by the server blade. Use the iLO 2 MP commands to view the SEL through the MP’s
text interface.
3. You should now have a good idea of which area of the server blade requires further analysis.
For example, if the symptom was “server blade won’t power-up”, the initial troubleshooting
procedure may have indicated that the DC power rail did not come up after the power was
turned on.
4. You have now reached the point where the failed Field Replaceable Unit or Units (FRU or FRUs)
have been identified and need to be replaced. You must now perform the specific removal and
replacement procedure and the verification steps (see Chapter 6: “Removing and Replacing
Components” (page 108)).
NOTE: If multiple FRUs are identified as part of the solution, replace all identified failed FRUs to
ensure a successful repair.
5. There may be specific recovery procedures you need to perform to finish the repair.
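
The example in step 2 (interpret the flashing health LED, then find the last error logged in the SEL) can be sketched in a few lines of Python. This is an illustrative sketch only. The LED color meanings come from the NOTE in step 2, but the SelEntry record, its fields, the severity labels, and the sample messages are all assumptions made for the example. They are not the real SEL format and not the output of any iLO 2 MP command.

```python
from dataclasses import dataclass
from typing import List, Optional

# From the NOTE in step 2: the health LED flashes amber for a warning
# and red for a fault.
LED_INDICATION = {"amber": "warning", "red": "fault"}


@dataclass
class SelEntry:
    """Hypothetical, simplified event record; not the real SEL format."""
    sequence: int   # assumed: a higher number means the entry was logged later
    severity: str   # assumed labels: "info", "warning", "fault"
    message: str    # free-text description (sample text is invented)


def led_indication(flash_color: str) -> Optional[str]:
    """Map a flashing health LED color to the indication it signals."""
    return LED_INDICATION.get(flash_color.lower())


def last_error(entries: List[SelEntry]) -> Optional[SelEntry]:
    """Return the most recently logged warning or fault, or None if there is none."""
    errors = [e for e in entries if e.severity in ("warning", "fault")]
    return max(errors, key=lambda e: e.sequence, default=None)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    log = [
        SelEntry(1, "info", "example: boot started"),
        SelEntry(2, "fault", "example: power rail event"),
        SelEntry(3, "info", "example: console attached"),
    ]
    print(led_indication("red"))
    print(last_error(log))
```

Filtering by severity before taking the latest entry reflects the intent of step 2: the most recent log entry is not necessarily the most recent error.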