Initialization and Recovery
555-233-143
3-6
Issue 1 May 2002
■ syslogd – Linux system log daemon (manages logging from Linux services)
■ xntpd – Network Time Protocol daemon (manages clock synchronizations
  across the network)
Watchdog’s HiMonitor
The Watchdog’s HiMonitor checks for runaway processes and terminates them.
HiMonitor deals with an infinitely looping process that is preventing lower-priority
processes from running. More specifically, the high-priority HiMonitor process
periodically (at an interval set in watchd.conf) looks for responses from the
low-priority LoMonitor process. If a response is present, HiMonitor resets
Watchdog’s timer. If not, HiMonitor runs a “top” command and logs its output to
determine which processes are taking up CPU resources. HiMonitor then takes one
of three recovery actions, in this order:
1. If a process within Watchdog’s or the Process Manager’s Linux process
group is consuming too high a percentage of CPU occupancy (percentage set in
/etc/opt/ecs/watchd.conf), HiMonitor kills the process.
2. If no process is using too high a percentage, but more than 100 instances
of the same monitored process are running, HiMonitor reboots Linux.
3. Otherwise, HiMonitor does nothing and waits for the system to recover on
its own.
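The ordering of the three recovery actions above can be sketched as a small decision function. This is only an illustration, not the Watchdog implementation: the Proc record, its "monitored" flag, the choose_recovery name, the strict "greater than" comparison, and the choice to kill the busiest offender when several exceed the threshold are all assumptions made for the example. The guard for a threshold of 0 follows the manual's statement that a 0 threshold disables recovery.

```python
from collections import Counter
from dataclasses import dataclass

MAX_INSTANCES = 100  # from the text: "more than 100 instances"


@dataclass
class Proc:
    name: str
    cpu_percent: float
    monitored: bool  # hypothetical flag: in Watchdog's or Process Manager's group


def choose_recovery(procs, occupancy_threshold):
    """Return ("kill", proc), ("reboot", name), or ("wait", None)."""
    # A threshold of 0 means "log only, no recovery" per the manual's NOTE.
    if occupancy_threshold == 0:
        return ("wait", None)

    monitored = [p for p in procs if p.monitored]

    # Action 1: a monitored process exceeds the CPU-occupancy threshold.
    hogs = [p for p in monitored if p.cpu_percent > occupancy_threshold]
    if hogs:
        # Assumption: pick the busiest offender if there are several.
        return ("kill", max(hogs, key=lambda p: p.cpu_percent))

    # Action 2: more than 100 instances of the same monitored process.
    counts = Counter(p.name for p in monitored)
    for name, count in counts.items():
        if count > MAX_INSTANCES:
            return ("reboot", name)

    # Action 3: do nothing and wait for the system to recover.
    return ("wait", None)
```

For example, choose_recovery([Proc("a", 95.0, True)], 70) selects the "kill" action for process "a", while 101 low-usage instances of one monitored process select "reboot".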
If LoMonitor fails to respond to a preset threshold number of HiMonitor checks
(currently 5 of 7), then HiMonitor reboots Linux as a final recovery action.
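One way to picture the "5 of 7" threshold is as a sliding window over the most recent HiMonitor checks. The sketch below assumes that reading (5 missed responses within the last 7 checks); the manual does not spell out how the count is kept, and the LoMonitorTracker class and its method names are hypothetical.

```python
from collections import deque


class LoMonitorTracker:
    """Track recent LoMonitor responses over a fixed window of checks."""

    def __init__(self, misses_needed=5, window=7):
        self.misses_needed = misses_needed
        # deque with maxlen discards the oldest check automatically.
        self.history = deque(maxlen=window)

    def record(self, responded):
        """Record one check; return True if the reboot threshold is reached."""
        self.history.append(bool(responded))
        misses = sum(1 for ok in self.history if not ok)
        return misses >= self.misses_needed
```

Calling record(False) five times within any seven consecutive checks makes it return True, which in this sketch corresponds to the final recovery action.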
CAUTION:
Escalate to an Avaya engineer for explicit guidance with this recovery, since
it is potentially disruptive. A process can legitimately occupy abnormally
high amounts of processor time due to server load, and killing it could make
the server totally unavailable.
However, with an engineer’s guidance, recovery can be disabled by setting
the sampling-interval or occupancy-threshold values to “0.” More likely, the
sampling-interval and CPU-occupancy thresholds will need to be fine-tuned
to values that don’t cause erroneous recovery attempts.
NOTE:
The value of the sampling interval must be greater than or equal to “0.” If set
to “0,” then the “top” command is not run, and no recovery is performed.
Also, the threshold CPU-occupancy percentage must be between “0” and
“100.” If set to “0,” then no recovery is performed, but the “top” command’s
output is logged. Setting these values to “0” may help achieve stability by
obtaining useful data without disrupting the processes.
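The rules in this note can be summarized in a short validation sketch. It does not read watchd.conf (the file's key names and syntax are not given here), and the describe_settings function and its parameter names are hypothetical; it only encodes the value ranges and the effect of “0” as stated above.

```python
def describe_settings(sampling_interval, occupancy_threshold):
    """Validate the two values and report what HiMonitor would do with them."""
    if sampling_interval < 0:
        raise ValueError("sampling interval must be greater than or equal to 0")
    if not 0 <= occupancy_threshold <= 100:
        raise ValueError("CPU-occupancy threshold must be between 0 and 100")

    # Interval 0: "top" is not run, and no recovery is performed.
    run_top = sampling_interval > 0
    # Threshold 0: "top" output is still logged, but no recovery is performed.
    recover = run_top and occupancy_threshold > 0
    return {"run_top": run_top, "recover": recover}
```

With these rules, describe_settings(10, 0) reports that “top” runs but recovery is off, which matches the data-gathering use described in the note.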