Advanced System Diagnostics and Troubleshooting Guide
Introduction
Diagnostics: A Brief Historical Perspective
Diagnostic utility programs were created to aid in troubleshooting system problems by detecting and
reporting faults so that operators or administrators could then fix them. While this approach does
help, it has some key limitations:
• It is, at its base, reactive, meaning a failure must occur before the diagnostic test can be used to look
for its cause.
• It can be time consuming, because the ability to troubleshoot a failure successfully based on the
information provided by the diagnostic test depends greatly on the types of information reported
by the test and the level of detail in the information.
Because users of mission-critical networks and network applications are becoming increasingly
dependent on around-the-clock network access and the highest levels of performance, any downtime or
service degradation is disruptive and costly. Time lost to an unexpected failure, compounded by more
time lost while someone attempts to isolate and fix the failure, is increasingly unacceptable.
The process of improving diagnostic tests to minimize failures and their impact is a kind of feedback
system: What you learn through the use of the diagnostics improves your understanding of hardware
failure modes; what you learn from an improved understanding of hardware failure modes improves
your understanding of the diagnostics.
The goal of the current generation of ExtremeWare diagnostics is to help users achieve the highest
levels of network availability and performance by providing a suite of diagnostic tests that moves away
from a reactive stance—wherein a problem occurs and then you attempt to determine what caused the
problem—to a proactive state—wherein the system hardware, software, and diagnostics work together
to reduce the total number of failures and downtime through:
• More accurate reporting of errors (fewer false notifications; more information about actual errors)
• Early detection of conditions that lead to a failure (so that corrective action can be taken before the
failure occurs)
• Automatic detection and correction of packet memory errors in the system’s control and data planes
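To make the idea of early detection concrete, the following Python sketch flags a component when correctable errors begin arriving faster than a chosen rate, before they turn into an outright failure. It is an illustration under stated assumptions, not a description of how ExtremeWare implements this: the class name `ErrorRateMonitor`, the sliding-window policy, and the window and threshold values are all hypothetical.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical early-warning monitor: reports when the number of
    correctable errors seen within a trailing time window reaches a
    threshold, so action can be taken before a hard failure occurs."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = deque()  # timestamps of recent correctable errors

    def record(self, timestamp: float) -> bool:
        """Record one correctable error at `timestamp` (seconds, in
        non-decreasing order) and return True if the trailing window now
        holds at least `threshold` errors."""
        self._events.append(timestamp)
        # Drop errors that are older than the trailing window.
        while self._events and timestamp - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold


# Hypothetical policy: warn on 3 correctable errors within 60 seconds.
monitor = ErrorRateMonitor(window_seconds=60.0, threshold=3)
print([monitor.record(t) for t in (0.0, 10.0, 100.0, 110.0, 115.0)])
# [False, False, False, False, True]
```

Isolated errors spread out over time never trigger the warning; only a burst inside one window does, which is the kind of trend that tends to precede a failure.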
Administrators will now find a greatly reduced MTTR (mean time to repair) due to fast and accurate
fault identification. Multiple modules will no longer need to be removed and tested; faulty components
will usually be identified directly. Over time, there should be a significant reduction in the number of
problems found.
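The effect of a shorter MTTR on availability can be estimated with the standard steady-state formula A = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures. The sketch below applies that formula. The MTBF and MTTR figures are hypothetical values chosen for illustration, not measurements of any Extreme Networks product.

```python
HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected minutes of downtime per 365-day year at that availability."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR * 60


# Hypothetical figures: same MTBF, MTTR cut from 4 hours to 30 minutes.
for mttr in (4.0, 0.5):
    print(f"MTTR {mttr} h -> {annual_downtime_minutes(10_000, mttr):.1f} min/year")
# MTTR 4.0 h -> 210.2 min/year
# MTTR 0.5 h -> 26.3 min/year
```

Because MTTR is small compared with MTBF, cutting MTTR by a factor of eight cuts expected downtime by almost exactly the same factor.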
NOTE
In spite of the improved ExtremeWare hardware diagnostics, some network events might still occur,
because software is incapable of detecting and preventing every kind of failure.
Overview of the ExtremeWare Diagnostics Suite
The ExtremeWare diagnostic suite includes the following types of tools for use in detecting, isolating,
and treating faults in a switch. Each of these diagnostic types is summarized below, but is described in
greater detail in later sections of this guide.
• Power-on self test (POST): A sequence of hardware tests that run automatically each time the switch
is booted, to validate basic system integrity. The POST runs in either of two modes: normal (more
thorough, but longer-running test sequence) or FastPOST (faster-running basic test sequence).
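The difference between the two POST modes can be pictured as running either the full test sequence or a shorter subset of it. The sketch below is hypothetical. The test names, the `run_post` function, the mode strings, and the way FastPOST tests are marked are assumptions for illustration. They are not the actual ExtremeWare POST test list or CLI keywords.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test table: name -> (check function, included in FastPOST?).
PostTest = Callable[[], bool]


def run_post(tests: Dict[str, Tuple[PostTest, bool]], mode: str) -> List[str]:
    """Run the sequence for `mode` ("normal" or "fastpost") and return
    the names of failed tests in execution order."""
    if mode not in ("normal", "fastpost"):
        raise ValueError(f"unknown POST mode: {mode!r}")
    failures: List[str] = []
    for name, (check, in_fastpost) in tests.items():
        if mode == "fastpost" and not in_fastpost:
            continue  # FastPOST skips the longer-running, more thorough tests
        if not check():
            failures.append(name)
    return failures


# Simulated results: the fault is caught only by a test that FastPOST skips.
tests = {
    "cpu_registers": (lambda: True, True),
    "packet_memory_full_scan": (lambda: False, False),
    "backplane_links": (lambda: True, True),
}
print(run_post(tests, "fastpost"))  # []
print(run_post(tests, "normal"))    # ['packet_memory_full_scan']
```

This mirrors the trade-off in the POST description: FastPOST shortens boot time, at the cost of possibly missing faults that only the normal sequence would catch.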
Summary of Contents for ExtremeWare Version 7.8