Avaya CMC1 Maintenance Procedures

Server initialization, recovery, and resets

S8100 Initialization

Maintenance Procedures

December 2003

The Watchdog tries to recreate the application a specified number of times. If unsuccessful after that
number of tries within the specified retry interval, the Watchdog runs the application’s “total failure”
script.
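
The retry rule above can be sketched in a few lines. This is only an illustration, not the Watchdog's actual code: the class name, the parameter names, and the sliding-window interpretation of "within the specified retry interval" are all assumptions.

```python
import time
from collections import deque


class RestartPolicy:
    """Hypothetical helper: decides between a restart and total failure."""

    def __init__(self, max_tries, retry_interval_s, clock=time.monotonic):
        self.max_tries = max_tries                # assumed configuration value
        self.retry_interval_s = retry_interval_s  # assumed configuration value
        self.clock = clock
        self.attempts = deque()                   # timestamps of recent restarts

    def on_failure(self):
        """Return "restart" or "total_failure" for a newly detected failure."""
        now = self.clock()
        # Forget restart attempts that are older than the retry interval.
        while self.attempts and now - self.attempts[0] > self.retry_interval_s:
            self.attempts.popleft()
        if len(self.attempts) >= self.max_tries:
            return "total_failure"   # caller would run the total-failure script
        self.attempts.append(now)
        return "restart"             # caller would recreate the application
```

A caller would invoke `on_failure()` each time the application dies and run either the recovery path or the "total failure" script depending on the result.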

For Communication Manager, the recovery script kills every Communication Manager process. Its
total-failure script also kills the Communication Manager processes and then causes a Linux reboot.

Watchdog and Linux

The Watchdog monitors several Linux services/daemons. Because the Linux init process originally started
these processes, Watchdog cannot use the SIGCHLD signal to monitor them. Instead, Watchdog uses a
thread to periodically check the validity of the process identifier for each monitored process. If an
identifier is invalid, the Watchdog calls a Linux script to stop and then restart that particular service. The
Linux services monitored by Watchdog are:

• atd – at daemon (runs programs at specific times)

• crond – cron daemon (runs programs periodically)

• dbgserv – provides debugging services

• httpd – Apache hypertext transfer protocol server (provides Web service)

• inetd – Internet server daemon (provides telnet/rlogin/etc. connectivity)

• klogd – Linux kernel log daemon (manages logging from Linux kernel/drivers)

• prune – monitors and cleans up partitions

• syslogd – Linux system log daemon (manages logging from Linux services and applications)

• xntpd – network time protocol daemon (manages clock synchronizations across the network)
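
One common way to test whether a process identifier is still valid on Linux is to send it signal 0, which performs the existence check without delivering a signal. The sketch below shows a single pass of such a check. The function names and the `restart` callback are hypothetical, and the manual does not say which mechanism the Watchdog itself uses.

```python
import os


def pid_is_valid(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but is owned by another user
    return True


def check_services(pids, restart):
    """Call restart(name) for each service whose recorded PID is invalid.

    pids maps a service name (for example "crond") to its recorded PID;
    restart stands in for the stop-and-restart script.
    """
    restarted = []
    for name, pid in pids.items():
        if not pid_is_valid(pid):
            restart(name)
            restarted.append(name)
    return restarted
```

In a monitor of the kind described above, `check_services` would be called repeatedly from a background thread, with a sleep between passes.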

Watchdog’s HiMonitor

The Watchdog’s HiMonitor checks for runaway processes and terminates them. Its purpose is to handle an
infinitely looping process that prevents lower-priority processes from running. More specifically, the
high-priority HiMonitor process periodically looks for responses from the low-priority LoMonitor
process. If a response is present, HiMonitor resets Watchdog’s timer. If not, HiMonitor issues and logs a
top command to determine which processes are taking up CPU resources. HiMonitor then takes one of
three recovery actions, in this order:

1. If a process within Watchdog’s or the Process Manager’s Linux process group is consuming too
high a percentage of CPU occupancy (the percentage is set in watchd.conf), HiMonitor kills the
process.

2. If no process is using too high a percentage, but more than 100 instances of the same monitored
process are running, HiMonitor reboots Linux.

3. Otherwise, HiMonitor does nothing and waits for the system to recover on its own.
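
The ordering of the three recovery actions can be illustrated with a small decision function. This is a sketch under stated assumptions, not HiMonitor's implementation: the input format, the `in_watchdog_group` callback, the parameter names, and the choice to kill the highest CPU consumer when several processes exceed the limit are all hypothetical.

```python
from collections import Counter


def choose_action(samples, cpu_limit_pct, in_watchdog_group, max_instances=100):
    """Pick a recovery action from a top-like snapshot.

    samples: list of (pid, name, cpu_pct) tuples for monitored processes.
    cpu_limit_pct: stand-in for the percentage configured in watchd.conf.
    in_watchdog_group: callable(pid) -> bool, True if the process belongs to
        Watchdog's or the Process Manager's process group.
    """
    # Action 1: kill a process in the group that exceeds the CPU limit.
    offenders = [(cpu, pid) for pid, _name, cpu in samples
                 if in_watchdog_group(pid) and cpu > cpu_limit_pct]
    if offenders:
        return ("kill", max(offenders)[1])  # highest CPU consumer (assumption)

    # Action 2: reboot if too many instances of one process are running.
    counts = Counter(name for _pid, name, _cpu in samples)
    for name, count in counts.items():
        if count > max_instances:
            return ("reboot", name)

    # Action 3: do nothing and let the system recover on its own.
    return ("wait", None)
```

For example, a snapshot in which one group member uses 95% of the CPU against a 90% limit yields `("kill", pid)`, while 101 copies of one low-usage process yield a reboot decision.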

If LoMonitor does not respond to a preset threshold of HiMonitor checks, then, as a final recovery action,
HiMonitor reboots Linux.

CAUTION:

Escalate to an Avaya engineer for guidance with this recovery, because it is potentially
disruptive. A process can legitimately occupy abnormally high amounts of processor time
due to server load, and killing it could make the server totally unavailable.
