APPENDIX B: TROUBLESHOOTING
Capability Scanning
The capability scanning service scans individual nodes to discover which services each node supports. It uses an intelligent service discovery mechanism and relies heavily on TCP communication (and sometimes UDP). In its initial state, the capability scanning service waits and listens for suspect node events. When a suspect node event occurs, it begins scanning the node. If it finds a new service, it generates an event to notify the other services that a new service has been discovered. These events are the first events you will see for any given node.
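The scan-on-event flow described above can be sketched as follows. This is a minimal illustration, not the CC-NOC implementation. It assumes a hypothetical `SERVICE_PORTS` table, detects a service only by whether a TCP connection succeeds, and uses a made-up event shape. The real scanner's service list, UDP checks and event format are not documented here.

```python
import socket
from typing import Callable, Dict, List, Set

# Hypothetical service-to-port table; the real scanner's list is not documented here.
SERVICE_PORTS: Dict[str, int] = {"FTP": 21, "SSH": 22, "SMTP": 25, "HTTP": 80, "HTTPS": 443}


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def on_suspect_node(
    host: str,
    known: Dict[str, Set[str]],
    emit: Callable[[dict], None],
    probe: Callable[[str, int], bool] = tcp_probe,
) -> List[str]:
    """Handle a suspect node event: scan the node, emit one event per new service."""
    services = known.setdefault(host, set())
    found = []
    for name, port in SERVICE_PORTS.items():
        if name not in services and probe(host, port):
            services.add(name)  # services are only ever added here, never removed
            found.append(name)
            # Hypothetical event shape, for illustration only.
            emit({"event": "new_service", "host": host, "service": name})
    return found
```

Passing `probe` as a parameter lets the scan logic be exercised without a live network. A second scan of the same node emits nothing, because services already known are skipped.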
In addition to responding to suspect node events, the capability scanning service runs once per day to check each existing node for new services. A validated user of the CC-NOC user interface can also request a rescan of a device. The capability scanning daemon adds any newly discovered services to the node, whether it finds them during initial population of the database, during a daily scan, or during a forced rescan. However, it does not remove services. Service removal is a function of the pollers.
Pollers
To monitor the availability of individual services, the CC-NOC maintains a number of pollers (polling services). When a poller runs, it performs an intelligent test against a service to confirm that the service is responsive. The actual test varies from service to service, but most pollers rely heavily on TCP communication.
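As a rough illustration of a TCP-based availability test, the sketch below treats a service as responsive if a TCP handshake completes within a timeout, with a number of retries. It is an assumption-level simplification. `tcp_poll` and its defaults are hypothetical, and a real service-specific poller would usually also exchange protocol data, such as reading a banner.

```python
import socket
from typing import Callable


def tcp_poll(
    host: str,
    port: int,
    timeout: float = 3.0,   # hypothetical default, not a CC-NOC setting
    retries: int = 1,       # hypothetical default, not a CC-NOC setting
    connect: Callable = socket.create_connection,
) -> bool:
    """Return True if any of (retries + 1) TCP connection attempts succeeds.

    Only checks that the TCP handshake completes; a service-specific poller
    would also exchange protocol data with the service.
    """
    for _ in range(retries + 1):
        try:
            # create_connection raises OSError (including timeouts) on failure.
            with connect((host, port), timeout=timeout):
                return True
        except OSError:
            continue  # try again until the retries are exhausted
    return False
```

With `retries=0`, a single dropped packet or slow reply is reported as a failure. This is why very low retry settings can lead to misdiagnosed outages.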
Pollers run on independent schedules, depending on the node and the service they monitor. By default, most pollers run every five minutes unless an outage occurs. You can adjust the polling interval from the Admin page, but consider the potential impact carefully before making such a change. The five-minute interval was chosen after extensive testing. Changing polling intervals, timeouts or retries without proper planning risks a) the poller(s) falling behind schedule, b) adding unreasonable amounts of network traffic to the environment, and c) misdiagnosing outages (when retries are set too low).
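The first two risks above can be estimated with simple arithmetic. The sketch below assumes a hypothetical model in which every attempt costs a full timeout and the services are spread evenly across a fixed number of parallel pollers. All figures other than the five-minute (300 s) interval are illustrative, not CC-NOC defaults.

```python
import math


def worst_case_cycle_seconds(services: int, timeout_s: float, retries: int, threads: int = 1) -> float:
    """Seconds to poll every service once if every attempt times out.

    Each service costs timeout_s * (retries + 1); services are assumed to be
    spread evenly across `threads` parallel pollers (a hypothetical model).
    """
    per_service = timeout_s * (retries + 1)
    return math.ceil(services / threads) * per_service


def falls_behind(services: int, timeout_s: float, retries: int, threads: int, interval_s: float) -> bool:
    """True if a worst-case cycle cannot finish within one polling interval."""
    return worst_case_cycle_seconds(services, timeout_s, retries, threads) > interval_s


def polls_per_hour(services: int, interval_s: float) -> float:
    """Steady-state polls per hour with no outages, a rough proxy for traffic."""
    return services * 3600 / interval_s


# 2,000 services, 3 s timeout, 1 retry, 50 parallel pollers (illustrative values):
print(worst_case_cycle_seconds(2000, 3.0, 1, 50))             # 240.0, fits in 300 s
print(falls_behind(2000, 3.0, 1, 50, interval_s=150))         # True
print(polls_per_hour(2000, 300), polls_per_hour(2000, 60))    # 24000.0 120000.0
```

In this example, halving the interval leaves too little time for a worst-case cycle. Dropping the interval to one minute would multiply the steady-state polling load by five.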
If an outage occurs, the poller adjusts its scheduling to check much more frequently at first and
then less often if the outage lasts for a long period of time.
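The adaptive schedule can be pictured as a table that maps how long a service has been down to how soon it is polled again. The thresholds and intervals below are hypothetical placeholders chosen only to show the shape: frequent checks early in an outage, then sparser checks as it drags on. They are not the CC-NOC's actual values.

```python
from typing import List, Optional, Tuple

NORMAL_INTERVAL = 300.0  # five minutes while the service is up

# Hypothetical downtime model: (outage age limit in seconds, poll interval in seconds).
# A limit of None means "for any longer outage". Values are illustrative only.
DOWNTIME_MODEL: List[Tuple[Optional[float], float]] = [
    (300, 30),       # first 5 minutes of an outage: every 30 s
    (43200, 300),    # up to 12 hours: every 5 minutes
    (None, 600),     # after that: every 10 minutes
]


def next_poll_interval(outage_age_s: Optional[float]) -> float:
    """Seconds until the next poll; pass None when the service is up."""
    if outage_age_s is None:
        return NORMAL_INTERVAL
    for limit, interval in DOWNTIME_MODEL:
        if limit is None or outage_age_s < limit:
            return interval
    return DOWNTIME_MODEL[-1][1]
```

Polling quickly at the start of an outage confirms it (or its recovery) fast. Backing off later keeps long outages from adding needless traffic.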
In addition to the pollers that are already scheduled, a service runs continuously, listening for newly discovered services and scheduling them for polling. Whenever a poller discovers an outage, it generates an event to notify the other services (and the concerned users) that the outage occurred.
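One way to picture the listener and the outage events together is a time-ordered queue. New-service events enqueue a first poll, and a poller emits an event only when a service goes from up to down. This is a hypothetical sketch. `PollScheduler`, the event shape and the fixed interval (which ignores the faster outage schedule for brevity) are all illustrative assumptions.

```python
import heapq
from typing import Callable, Dict, List, Tuple


class PollScheduler:
    """Hypothetical sketch: schedule newly discovered services and report outages."""

    def __init__(self, poll: Callable[[str], bool], emit: Callable[[dict], None], interval: float = 300.0):
        self.poll = poll          # returns True if the service responds
        self.emit = emit          # publishes events to other services
        self.interval = interval  # fixed here for brevity
        self.queue: List[Tuple[float, str]] = []  # (due time, service)
        self.up: Dict[str, bool] = {}

    def on_new_service(self, service: str, now: float) -> None:
        # Listener for new-service events: schedule an immediate first poll.
        self.up[service] = True
        heapq.heappush(self.queue, (now, service))

    def run_due(self, now: float) -> None:
        # Poll every service whose scheduled time has arrived, then reschedule it.
        while self.queue and self.queue[0][0] <= now:
            _, service = heapq.heappop(self.queue)
            ok = self.poll(service)
            if self.up[service] and not ok:
                # Emit only on the up-to-down transition, not on every failed poll.
                self.emit({"event": "outage", "service": service, "time": now})
            self.up[service] = ok
            heapq.heappush(self.queue, (now + self.interval, service))
```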
Notifications
The notifications service listens to every event generated and, depending on the configuration, notifies the concerned users by email or through a paging server. This is one of the most critical services to maintain on the CC-NOC, because you will not be aware of outages unless the notifications reach you. The notifications service evaluates each event against the notification rules configured in the administrative interface. If an event matches one or more rules, the service sends a notification and then schedules the next step in the escalation path. If nobody has confirmed the notification before the scheduled time, it notifies the next person in the escalation path.
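The rule matching and escalation behavior described above can be sketched as follows. The sketch is hypothetical and assumption-laden. `Rule`, `Notice`, `Notifier`, matching on an event-type string and a fixed per-step delay are all illustrative. The `send` callable stands in for email or a paging server. Real CC-NOC rules are configured in the administrative interface and may match on more than the event type.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical notification rule: event type plus an escalation path."""
    event_type: str
    escalation: List[str]     # recipients, in escalation order
    delay_s: float = 900.0    # wait this long for confirmation before escalating


@dataclass
class Notice:
    rule: Rule
    event: dict
    step: int = 0             # position in the escalation path
    due: float = 0.0          # time at which to escalate if still unconfirmed
    confirmed: bool = False


class Notifier:
    def __init__(self, rules: List[Rule], send: Callable[[str, dict], None]):
        self.rules = rules
        self.send = send      # stands in for email or a paging server
        self.pending: List[Notice] = []

    def on_event(self, event: dict, now: float) -> List[Notice]:
        """Evaluate an event against every rule; notify the first recipient of each match."""
        created = []
        for rule in self.rules:
            if rule.event_type == event.get("event"):
                self.send(rule.escalation[0], event)
                notice = Notice(rule, event, due=now + rule.delay_s)
                self.pending.append(notice)
                created.append(notice)
        return created

    def confirm(self, notice: Notice) -> None:
        notice.confirmed = True

    def tick(self, now: float) -> None:
        """Escalate every unconfirmed notice whose deadline has passed."""
        for n in self.pending:
            if n.confirmed or now < n.due:
                continue
            n.step += 1
            if n.step < len(n.rule.escalation):
                self.send(n.rule.escalation[n.step], n.event)
                n.due = now + n.rule.delay_s
        # Drop confirmed notices and those whose escalation path is exhausted.
        self.pending = [n for n in self.pending
                        if not n.confirmed and n.step < len(n.rule.escalation)]
```

Confirming a notice stops further escalation. An unconfirmed notice walks the whole path and is then dropped.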