Recovery
Issue 1 May 2002
3-5
555-233-143
Watchdog and Applications
The Watchdog monitors the sanity of the various applications it initially created. It
does this using two mechanisms. The Watchdog receives:
■ A SIGCHLD signal if any process it created dies
■ Periodic heartbeat messages from those processes
If no heartbeat message arrives from an application within the time specified in
the Watchdog’s configuration file, the Watchdog kills the application. When an
application terminates (whether it died unexpectedly or was killed intentionally),
the Watchdog runs an application-specific “recovery” script that:
■ May try to kill every process in the application
■ Checks for and corrects any resource problems
■ Tries to recreate the application
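The heartbeat rule described above can be illustrated with a minimal sketch. This is not the Watchdog's actual code: the class name HeartbeatMonitor, its methods, and the timeout_s parameter are hypothetical, and the real timeout comes from the Watchdog's configuration file, whose format is not shown here.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: remembers when each application last sent a heartbeat."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s   # assumed to come from a configuration file
        self.clock = clock           # injectable so the logic can be exercised without waiting
        self.last_seen = {}

    def register(self, app):
        # Start the timer when the application is created.
        self.last_seen[app] = self.clock()

    def heartbeat(self, app):
        # Record the arrival of a heartbeat message from the application.
        self.last_seen[app] = self.clock()

    def overdue(self):
        # Applications whose last heartbeat is older than the timeout;
        # in the scheme described above, these would be killed.
        now = self.clock()
        return sorted(app for app, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)
```

A monitoring loop would call overdue() periodically and hand each returned application to the kill-and-recover path.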
The Watchdog tries to recreate the application a specified number of times. If the
application still cannot be recreated after that number of tries within the
specified retry interval, the Watchdog runs the application’s “total failure” script.
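The retry limit can be sketched as follows. The text does not say how the Watchdog counts tries against the retry interval, so this sketch assumes a sliding window over recent attempts. The names RestartPolicy, max_tries, interval_s, and next_action are hypothetical.

```python
import time
from collections import deque


class RestartPolicy:
    """Hypothetical sketch: at most max_tries recovery attempts per interval_s seconds."""

    def __init__(self, max_tries, interval_s, clock=time.monotonic):
        self.max_tries = max_tries
        self.interval_s = interval_s
        self.clock = clock
        self.attempts = deque()      # timestamps of recent recovery attempts

    def next_action(self):
        """Call each time the application terminates; returns which script to run."""
        now = self.clock()
        # Forget attempts that fall outside the retry interval (assumed sliding window).
        while self.attempts and now - self.attempts[0] > self.interval_s:
            self.attempts.popleft()
        if len(self.attempts) >= self.max_tries:
            return "total_failure"
        self.attempts.append(now)
        return "recover"
```

With max_tries=3 and interval_s=60, three terminations in quick succession each return "recover", and a fourth within the same minute returns "total_failure".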
For MultiVantage, the recovery script kills every MultiVantage process. Its total
failure script kills off the MultiVantage processes and causes a Linux reboot.
Watchdog and Linux
The Watchdog also monitors several Linux services/daemons. Because the Linux
“init” process, not the Watchdog, originally started these processes, the Watchdog
cannot use the SIGCHLD signal to monitor them. Instead, the Watchdog uses a thread
to periodically check the validity of the process identifier for each monitored
process. If an identifier is invalid, the Watchdog calls a Linux script to stop and
then restart that service. The Linux services monitored by Watchdog are:
■ atd – at daemon (runs programs at specific times)
■ crond – cron daemon (runs programs periodically)
■ dbgserv – provides debugging services
■ httpd – Apache hypertext transfer protocol server
■ inetd – Internet server daemon (provides telnet/rlogin/etc. connectivity)
■ klogd – Linux kernel log daemon (manages logging from Linux kernel/drivers)
■ prune – monitors and cleans up partitions
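One common way to check whether a process identifier is still valid on Linux is to send it signal 0, which performs the existence and permission checks without delivering a signal. The sketch below assumes that technique. The text does not state how the Watchdog performs its check, and the function names and the service-to-PID mapping are hypothetical.

```python
import os


def pid_is_valid(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False                 # 0 and negative values address process groups
    try:
        os.kill(pid, 0)              # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False                 # no such process
    except PermissionError:
        return True                  # process exists but belongs to another user
    return True


def services_needing_restart(pids_by_service):
    """Given a hypothetical {service name: pid} mapping, list services whose PID is invalid."""
    return sorted(name for name, pid in pids_by_service.items()
                  if not pid_is_valid(pid))
```

A monitoring thread could call services_needing_restart() on a timer and run the stop-and-restart script for each name returned. A PID check alone cannot detect a PID that has been reused by an unrelated process.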