Cluster Troubleshooting
hang. After a certain amount of time, by default 360 seconds, the cluster
manager will issue a config_too_long message into the /tmp/hacmp.out
file.
The message issued looks like this:
The cluster has been in reconfiguration too long;Something may be wrong.
In most cases, this is because an event script has failed. You can find out
more by analyzing the /tmp/hacmp.out file. The error messages in the
/var/adm/cluster.log file may also be helpful. You can then fix the problem
identified in the log file and run the clruncmd command, either on the
command line or from the SMIT Cluster Recovery Aids screen. The clruncmd
command signals the Cluster Manager to resume cluster processing.
Note, however, that sometimes scripts simply take too long, so the message
showing up isn’t always an error, but sometimes a warning. If the message is
issued, that doesn’t necessarily mean that the script failed or never finished.
A script running for more than 360 seconds can still be working on something
and eventually get the job done. Therefore, it is essential to look at the
/tmp/hacmp.out file to find out what is actually happening.
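As a starting point for that analysis, the following is a minimal sketch
that lists the lines of a log where the message appears, so the surrounding
event script output can be inspected. The function name and the marker
strings are assumptions made for illustration: the markers are taken from
the message text quoted above, and the exact layout of /tmp/hacmp.out may
differ between releases.

```python
from typing import Iterable, List, Tuple

# Marker strings taken from the message quoted in the text. How the message
# is laid out inside /tmp/hacmp.out on a given release is an assumption.
MARKERS = ("config_too_long", "reconfiguration too long")


def find_config_too_long(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every line containing a marker.

    Line numbers start at 1, so they can be used directly to jump to the
    matching place in the log with an editor or pager.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

The function accepts any iterable of lines, so it can be passed an open
file object for /tmp/hacmp.out or a list of strings. A hit only shows where
the message was logged; whether the event script failed or is simply slow
still has to be judged from the lines around it.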
7.3 Deadman Switch
The term “deadman switch” describes the AIX kernel extension that causes a
system panic and dump under certain cluster conditions if it is not reset. The
deadman switch halts a node when it enters a hung state that extends
beyond a certain time limit. This enables another node in the cluster to
acquire the hung node’s resources in an orderly fashion, avoiding possible
contention problems.
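The principle can be illustrated with a small conceptual model. This is a
sketch only, not the AIX kernel extension: the class name, the timeout
value, and the explicit "now" argument are all assumptions chosen to make
the behavior easy to follow. A healthy process keeps resetting the timer;
if it is prevented from doing so for longer than the limit, the switch is
considered expired.

```python
class DeadmanSwitch:
    """Conceptual model of a timer that must be reset periodically.

    Hypothetical illustration only; the real deadman switch lives in the
    kernel and halts the node instead of returning a flag.
    """

    def __init__(self, timeout_seconds: float) -> None:
        if timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        self.timeout_seconds = timeout_seconds
        self.last_reset = 0.0

    def reset(self, now: float) -> None:
        # Called by the monitored process to show that it is still running.
        self.last_reset = now

    def expired(self, now: float) -> bool:
        # True once more than timeout_seconds have passed since the last reset.
        return now - self.last_reset > self.timeout_seconds
```

Passing the current time in explicitly keeps the model deterministic. With a
10 second limit, a reset at time 8 followed by a check at time 15 is still
within the limit, while a check at time 18.5 with no further reset is not.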
If the deadman switch is triggering, and it isn’t obvious why the cluster
manager was kept from resetting the timer (an obvious cause would be an
application running at a higher priority than the clstrmgr process),
customizations related to performance problems should be performed in the
following order:
1. Tune the system using I/O pacing.
2. Increase the syncd frequency.
3. If needed, increase the amount of memory available for the
communications subsystem.
4. Change the Failure Detection Rate.
Each of these options is described in the following sections.
IBM Certification Study Guide AIX HACMP, SG24-5131-00. Printed in the U.S.A.