Advanced System Diagnostics and Troubleshooting Guide
Diagnostics
Alpine or Summit “i” series Switches.
To configure the switch to respond to a failed health check by
attempting to perform auto-recovery (packet memory scanning and mapping), use this command:
configure sys-health-check alarm-level auto-recovery [offline | online]
When system health checks fail at a specified frequency, packet memory scanning and mapping is
invoked automatically. Once auto-recovery mode is configured, an automated background polling task
checks every 20 seconds to determine whether any fabric checksum errors have occurred. To invoke the
automatic memory scanning feature, three consecutive samples for a module must be corrupted. When
the automatic memory scanning feature has been invoked, regardless of platform type or number of
errors, the device is taken offline for an initial period so that the memory scan can be performed.
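The trigger rule described above (a poll every 20 seconds, with three consecutive corrupted samples for a module required before a scan is invoked) can be sketched as a small counter. This is only an illustrative model, not ExtremeWare code: the class name, the method name, and the choice to restart the count after a scan is triggered are assumptions made for the example.

```python
from collections import defaultdict

POLL_INTERVAL_SECONDS = 20   # from the text: the polling task checks every 20 seconds
CONSECUTIVE_THRESHOLD = 3    # from the text: three consecutive corrupted samples


class AutoRecoveryMonitor:
    """Hypothetical model of the polling rule; names are illustrative only."""

    def __init__(self, threshold=CONSECUTIVE_THRESHOLD):
        self.threshold = threshold
        # module name -> number of consecutive corrupted samples seen so far
        self._streak = defaultdict(int)

    def record_sample(self, module, corrupted):
        """Record one poll result; return True when a memory scan should start."""
        if not corrupted:
            self._streak[module] = 0   # a clean sample breaks the streak
            return False
        self._streak[module] += 1
        if self._streak[module] >= self.threshold:
            # Assumption: the count restarts once the scan has been invoked.
            self._streak[module] = 0
            return True
        return False


monitor = AutoRecoveryMonitor()
samples = [True, True, False, True, True, True]   # one sample per 20-second poll
results = [monitor.record_sample("Slot 3", s) for s in samples]
print(results)
```

In this sketch the clean third sample resets the count, so only the sixth sample (the third corrupted one in a row) signals that a scan should begin.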
Backplane Health Check
On BlackDiamond switches, the backplane health check routine in the system health checker tests the
communication path between the CPU subsystem on the MSM and all I/O modules in the chassis for
data-path packet errors. It does this by causing the CPU subsystem on the MSM to generate a backplane
health check packet that is sent across the chassis backplane to each backplane I/O module link.
Viewing Backplane Health Check Results—show log Command
The backplane health check uses the same system log reporting mechanism as checksum validation, so
you can use the show log command to view health check status information. Log messages take the
following form (date, time, and severity level have been omitted to focus on the key information):
Sys-health-check [type] checksum error on <slot> prev= <0xm> cur= <0xn>
where type indicates the health check test packet type (INT, EXT, CPU), and slot indicates the
probable location of the error, from among the following:
• M-BRD: The main board of a Summit system.
• BPLANE: The backplane of an Alpine system.
• MSM-A, MSM-B, MSM-C, or MSM-D: The MSM modules of a BlackDiamond system.
• Slot n: The slot number for an I/O module in a BlackDiamond system.
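When collecting these messages off the switch, it can help to pull out the packet type and the location automatically. The following is a minimal sketch that assumes the message body matches the template above; the function name is hypothetical, the sample line is constructed from the template rather than copied from a real log, and real output may differ in spacing or value format.

```python
import re

# Pattern built from the documented template:
#   Sys-health-check [type] checksum error on <slot> prev= <0xm> cur= <0xn>
# Assumption: prev/cur are hexadecimal values, with or without a 0x prefix.
LOG_PATTERN = re.compile(
    r"Sys-health-check\s*\[\s*(?P<type>INT|EXT|CPU)\s*\]\s*checksum error on\s+"
    r"(?P<slot>M-BRD|BPLANE|MSM-[A-D]|[Ss]lot\s*\d+)\s+"
    r"prev=\s*(?P<prev>(?:0x)?[0-9A-Fa-f]+)\s+"
    r"cur=\s*(?P<cur>(?:0x)?[0-9A-Fa-f]+)"
)


def parse_health_check_line(line):
    """Return the type, slot, prev, and cur fields as strings, or None if no match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


# Illustrative line constructed from the template, not taken from a real switch.
sample = "Sys-health-check [EXT] checksum error on Slot 3 prev= 0x1 cur= 0x2"
parsed = parse_health_check_line(sample)
print(parsed)
```

The values are left as strings so that the sketch makes no claim about how the prev and cur fields should be interpreted.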
When you have observed log messages indicating missed or corrupted health check packets, use the
show diagnostics command as the next source of information about health check failures.
Auto-recovery parameters:

number of tries   Specifies the number of times that the health checker attempts to auto-recover a
                  faulty module. The range is from 0 to 255 times. The default is 3 times.

offline           Specifies that a faulty module is to be taken offline and kept offline if one of
                  the following conditions is true:
                  • More than eight defects are detected.
                  • No new defects were found by the memory scanning and mapping process.
                  • The same checksum errors are again detected by the system health checker.

online            Specifies that a faulty module is to be kept online, regardless of memory
                  scanning or memory mapping errors.
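The offline and online behavior described above can be expressed as a small decision function. This is a sketch for reasoning about the rules, not the switch's implementation: the function and argument names are hypothetical, and treating "defects detected" and "new defects found" as two separate counts is an assumption.

```python
TRIES_MIN, TRIES_MAX = 0, 255   # from the text: range is 0 to 255
DEFAULT_TRIES = 3               # from the text: default is 3
DEFECT_LIMIT = 8                # from the text: "more than eight defects"


def validate_tries(tries=DEFAULT_TRIES):
    """Check that the number of auto-recovery attempts is within the documented range."""
    if not TRIES_MIN <= tries <= TRIES_MAX:
        raise ValueError(f"number of tries must be {TRIES_MIN}-{TRIES_MAX}, got {tries}")
    return tries


def stays_offline(mode, defects_detected, new_defects_found, same_errors_recurred):
    """Return True if a faulty module would be kept offline under the stated rules."""
    if mode == "online":
        return False   # kept online regardless of scanning or mapping errors
    if mode != "offline":
        raise ValueError("mode must be 'offline' or 'online'")
    return (
        defects_detected > DEFECT_LIMIT   # more than eight defects detected
        or new_defects_found == 0         # the scan found no new defects
        or same_errors_recurred           # same checksum errors seen again
    )


print(stays_offline("offline", defects_detected=9, new_defects_found=2,
                    same_errors_recurred=False))
print(stays_offline("online", defects_detected=9, new_defects_found=0,
                    same_errors_recurred=True))
```

The first call models a module in offline mode with nine defects, which exceeds the limit of eight; the second shows that online mode never keeps the module offline.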