Initialization and Recovery
555-233-143
3-6
Issue 1 May 2002
■ syslogd – Linux system log daemon (manages logging from Linux services)
■ xntpd – Network Time Protocol daemon (manages clock synchronization
across the network)
Watchdog’s HiMonitor
The Watchdog’s HiMonitor checks for runaway processes and terminates them.
HiMonitor handles the case where an infinitely looping process prevents
lower-priority processes from running. Specifically, the high-priority HiMonitor
process periodically (at an interval set in watchd.conf) looks for responses from
the low-priority LoMonitor process. If a response is present, HiMonitor resets
Watchdog’s timer. If not, HiMonitor runs a “top” command and logs its output to
determine which processes are consuming CPU resources. HiMonitor then takes
one of three recovery actions, in this order:
1. If a process within Watchdog’s or the Process Manager’s Linux process
group is consuming too high a percentage of CPU occupancy (the percentage
is set in /etc/opt/ecs/watchd.conf), HiMonitor kills the process.
2. If no process is using too high a percentage, but more than 100 instances
of the same monitored process are running, HiMonitor reboots Linux.
3. Otherwise, HiMonitor does nothing and waits for the system to recover
on its own.
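The ordered choice among the three actions can be sketched in Python as follows. This is only an illustration of the decision order described above, not HiMonitor's actual code: the names ProcSample and choose_recovery_action are hypothetical, "too high" is assumed to mean strictly greater than the configured percentage, and when several processes exceed it the sketch assumes the busiest one is chosen. The disabled case (a threshold of 0) is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass

MAX_INSTANCES = 100  # from the text: "more than 100 instances"


@dataclass
class ProcSample:
    """One row of sampled process data (hypothetical structure)."""
    name: str
    pid: int
    cpu_percent: float
    monitored: bool  # True if in Watchdog's or Process Manager's process group


def choose_recovery_action(samples, occupancy_threshold):
    """Return ("kill", pid), ("reboot", None), or ("wait", None)."""
    monitored = [s for s in samples if s.monitored]

    # Action 1: a monitored process exceeds the CPU-occupancy threshold.
    hogs = [s for s in monitored if s.cpu_percent > occupancy_threshold]
    if hogs:
        # Assumption: pick the busiest offender.
        worst = max(hogs, key=lambda s: s.cpu_percent)
        return ("kill", worst.pid)

    # Action 2: more than 100 instances of the same monitored process.
    counts = Counter(s.name for s in monitored)
    if any(n > MAX_INSTANCES for n in counts.values()):
        return ("reboot", None)

    # Action 3: do nothing and let the system recover on its own.
    return ("wait", None)
```

Note that the checks are evaluated strictly in order, so a reboot is considered only when no monitored process is over the occupancy threshold.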
If LoMonitor fails to respond to a preset threshold of HiMonitor checks
(currently 5 of 7), then HiMonitor reboots Linux as a final recovery action.
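One way to picture the "5 of 7" threshold is as a sliding window over the most recent checks. The sketch below assumes that reading (5 missed responses among the last 7 checks); the text does not say exactly how the checks are counted, and the class name HeartbeatWindow is hypothetical.

```python
from collections import deque


class HeartbeatWindow:
    """Tracks the most recent LoMonitor check results (illustrative only)."""

    def __init__(self, window=7, max_missed=5):
        # deque with maxlen discards the oldest result automatically.
        self.results = deque(maxlen=window)
        self.max_missed = max_missed

    def record(self, responded):
        """Record one check; return True if the reboot threshold is reached."""
        self.results.append(bool(responded))
        missed = sum(1 for r in self.results if not r)
        return missed >= self.max_missed
```

Under this interpretation, a few isolated missed responses do not trigger a reboot, because older results drop out of the window as new checks arrive.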
CAUTION:
Escalate to an Avaya engineer for explicit guidance with this recovery, since
it is potentially disruptive. A process can legitimately occupy abnormally
high amounts of processor time due to server load, and killing it could make
the server totally unavailable.
However, with an engineer’s guidance, recovery can be disabled by setting
the sampling-interval or occupancy-threshold values to “0.” More likely, the
sampling-interval and CPU-occupancy thresholds will need to be fine-tuned
to values that don’t cause erroneous recovery attempts.
NOTE:
The value of the sampling interval must be greater than or equal to “0.” If set
to “0,” then the “top” command is not run, and no recovery is performed.
Also, the threshold CPU-occupancy percentage must be between “0” and
“100.” If set to “0,” then no recovery is performed, but the “top” command’s
output is logged. Setting these values to “0” may help achieve stability by
obtaining useful data without disrupting the processes.
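The constraints in this note can be summarized in a small Python check. It is a sketch only: the function and parameter names are hypothetical, the actual key names and file format of watchd.conf are not given here, and "between 0 and 100" is assumed to include both endpoints.

```python
def interpret_watchdog_settings(sampling_interval, occupancy_threshold):
    """Validate the two values and report what HiMonitor would do with them."""
    if sampling_interval < 0:
        raise ValueError("sampling interval must be >= 0")
    if not 0 <= occupancy_threshold <= 100:
        raise ValueError("CPU-occupancy threshold must be between 0 and 100")

    # Interval 0: "top" is not run, so no recovery is possible.
    run_top = sampling_interval > 0
    # Threshold 0: "top" output is still logged, but no recovery is performed.
    recover = run_top and occupancy_threshold > 0
    return {"run_top": run_top, "recover": recover}
```

For example, a non-zero interval with a threshold of 0 yields run_top=True and recover=False, which corresponds to the data-gathering mode the note describes.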