Advanced System Diagnostics and Troubleshooting Guide
Diagnostics
Some diagnostic tests, such as the slot-based hardware diagnostics (including the packet memory scan),
can be run on demand through user CLI commands. Other tests can be run on demand through
user CLI commands and can also be configured to observe specific user-selected settings.
All of the ExtremeWare diagnostic tests can be coordinated under the umbrella of the ExtremeWare
system health check feature, which runs automatic background checks to detect packet memory errors
and take automatic action when errors are found. The system health check feature is enabled by a CLI
command on a switch-wide basis, and the operating parameters and failure responses of the various
diagnostic subsystems can be configured through CLI commands. As a diagnostic system, the system
health check tests try to detect and resolve possible problem situations before they become actual
problems, using the diagnostic subsystems in a manner that parallels operator-initiated tests in manual mode.
Operating in manual mode, when the system log reports errors or failures, you would run the
appropriate diagnostic test set to isolate the source of the problem. Because some diagnostic tests
take the module or switch offline while they run, you must be aware of the downtime impact before
you run a diagnostic test.
Operating in automatic mode, the proactive nature of the system health checker and its diagnostic test
subsystems means that a module or switch might be taken offline automatically when an error is
detected, possibly resulting in extensive network outages.
The key to effective diagnostic use in optimizing network availability lies in understanding what
happens in the switch when a given test is run.
How the Test Affects the Switch
The impact a diagnostic test has on switch operation is determined by the following characteristics:
• Invasive versus non-invasive
• Passive versus active
• Control path versus data path
Some diagnostic tests are classed as invasive diagnostics, meaning that running these diagnostics
requires the switch to be partly or entirely offline (system traffic is interrupted). Some invasive
diagnostics take down just the module in the targeted chassis slot for the duration of the test; some
diagnostics take down the entire switch. Some invasive diagnostics can be invoked manually through a
CLI command; other diagnostics can be configured as part of the system health checks to become active
only when certain kinds of packet memory errors occur within configured window periods or at
configured threshold levels.
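The window and threshold behavior described above can be sketched in a few lines of Python. This is only an illustration of the general idea (counting errors inside a sliding time window and reacting once a count is reached), not the ExtremeWare implementation. The class name, parameter names, and values below are all hypothetical.

```python
from collections import deque


class ErrorWindowTrigger:
    """Hypothetical model: fire once `threshold` errors fall within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of errors still inside the window

    def record_error(self, now: float) -> bool:
        """Record one error at time `now`; return True if the threshold is reached."""
        self._timestamps.append(now)
        # Discard errors that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Assumed settings for illustration: 3 errors within 60 seconds.
trigger = ErrorWindowTrigger(threshold=3, window_seconds=60.0)
results = [trigger.record_error(t) for t in (0.0, 10.0, 100.0, 110.0, 120.0)]
print(results)
```

In this sketch, two early errors age out before a third arrives, so only the later burst of three errors within the window would cause the invasive diagnostic to be invoked.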
Other diagnostic tests are classed as non-invasive, meaning that running them does not take the switch
offline. They can still affect overall performance, however, depending on whether the non-invasive
test is a passive test or an active test, and on whether it uses the management or control bus
(see Figure 10) or the data bus during the test:
• In a passive test, the test merely scans switch traffic (packet flow) for packet memory errors.
• In an active test, the test originates test messages (diagnostic packets) that it sends out and then
validates to verify correct operation.
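One way to keep these characteristics straight when planning a test run is to record them per test and derive the expected impact. The following Python sketch is purely illustrative: the type names, the `describe_impact` helper, and the example test names are hypothetical and do not correspond to actual ExtremeWare tests or commands.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PASSIVE = "passive"  # only scans existing traffic
    ACTIVE = "active"    # originates and validates diagnostic packets


class Path(Enum):
    CONTROL = "control path"  # management bus, or slow path
    DATA = "data path"        # data bus, or fast path


@dataclass(frozen=True)
class DiagnosticTest:
    name: str
    invasive: bool
    mode: Mode
    path: Path


def describe_impact(test: DiagnosticTest) -> str:
    """Summarize the operational impact implied by a test's characteristics."""
    if test.invasive:
        return f"{test.name}: takes the module or switch offline while it runs"
    if test.mode is Mode.PASSIVE:
        action = "scans existing traffic for errors"
    else:
        action = "sends and validates its own diagnostic packets"
    return f"{test.name}: stays online, {action} on the {test.path.value}"


# Hypothetical example tests, named only for illustration.
examples = [
    DiagnosticTest("example-offline-test", True, Mode.ACTIVE, Path.DATA),
    DiagnosticTest("example-scan", False, Mode.PASSIVE, Path.DATA),
    DiagnosticTest("example-probe", False, Mode.ACTIVE, Path.CONTROL),
]
for test in examples:
    print(describe_impact(test))
```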
Figure 9 and Figure 10 show the same simplified BlackDiamond architecture block diagram of an I/O
module, backplane links, and MSM, but each illustrates how different tests use the management bus
(also referred to as the CPU control bus or the slow path) and the data bus (or fast path).