This soft copy for use by IBM employees only.
/usr/lpp/ssp/bin/hrd
The inittab entry:
hr:2:once:/usr/bin/startsrc -g hr >/dev/null 2>/dev/console
If it has died or did not start on boot, run:
# startsrc -g hr
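The check that precedes this restart can be sketched as a small shell helper. This is only an illustration: the function name hr_needs_start is hypothetical, and it assumes that lssrc -g hr prints one line per subsystem with the status ("active" or "inoperative") as the last field.

```shell
# Sketch: decide from lssrc-style output whether the hr group needs starting.
# Reads the output on stdin and succeeds (exit 0) when no line ends in the
# status "active". The column layout is an assumption, not taken from the guide.
hr_needs_start() {
    awk '$NF == "active" { found = 1 } END { exit(found ? 1 : 0) }'
}

# Intended use on the Control Workstation (not executed here):
#   if lssrc -g hr | hr_needs_start; then startsrc -g hr; fi
```

The helper only interprets text, so it can be tried against saved lssrc output before it is used to drive startsrc.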
The host_responds daemon (like the SDR daemon) runs only on the Control
Workstation. Remember that if the problematic node is running AIX Version
3.2.5, then check for the ccst process instead of the hbd process.
If all the daemons are running, it is still worth recycling (stopping and
restarting) them all, in case any of them is no longer functioning as it should.
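Recycling one SRC subsystem group might look like the sketch below. The function name recycle_group and the DRY_RUN switch are hypothetical. Only the hr group name comes from this section, so the group names of the other daemons must be supplied by the administrator.

```shell
# Sketch: stop and then restart one SRC subsystem group.
# With DRY_RUN=1 the commands are printed instead of executed.
# Note: stopsrc may return before the subsystem has fully stopped, so a real
# script may need to wait before calling startsrc.
recycle_group() {
    group=$1
    for cmd in stopsrc startsrc; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd -g $group"
        else
            "$cmd" -g "$group" || return 1
        fi
    done
}

# Example (prints the two commands without running them):
#   DRY_RUN=1 recycle_group hr
```

Running it first with DRY_RUN=1 shows exactly what would be issued before anything is stopped.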
If there is no connectivity across en0, then test the connectivity across the
other network interfaces to the problem node. Test to see if a console can be
opened across the serial link. If there is connectivity across other interfaces,
and the node is up and working, then resolve the network issues with the en0
interface.
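Testing the other interfaces in turn can be sketched as follows. The function name first_reachable, the PROBE variable, and the host names in the example are all hypothetical. The default probe, ping -c 1, is an assumption about what is appropriate on the Control Workstation.

```shell
# Sketch: print the first address that answers a probe, or fail if none do.
# PROBE holds the command used to test one address; it is split into words,
# so it can carry options. The default is an assumption.
first_reachable() {
    for addr in "$@"; do
        if ${PROBE:-ping -c 1} "$addr" >/dev/null 2>&1; then
            echo "$addr"
            return 0
        fi
    done
    return 1
}

# Example with hypothetical host names for the node's other interfaces:
#   first_reachable node5-tr0 node5-css0 || echo "no interface answers"
```

If an address other than the en0 one is printed, the node itself is reachable and the problem is confined to the en0 network.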
Here are some possible scenarios:
• An application or process is generating a large amount of traffic across en0
  so that the heartbeat times out
  − The internal Ethernet should ideally be dedicated to SP-related traffic for
    this reason.
• The interface has gone down or been corrupted
  − Check with the ifconfig command and remove and rebuild the interface
    with the rmdev and mkdev commands if necessary.
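The ifconfig check in the second scenario can be sketched as a helper that reads the flags line. The function name iface_is_up is hypothetical, and the sketch assumes that ifconfig en0 prints a line containing flags=...<UP,...,RUNNING,...> when the interface is healthy.

```shell
# Sketch: succeed only if ifconfig-style output on stdin lists both the UP and
# RUNNING flags. The exact flag names and layout are assumptions.
iface_is_up() {
    # Keep only the comma-separated list between < and > on the flags line.
    flags=$(sed -n 's/.*flags=[^<]*<\([^>]*\)>.*/\1/p' | head -n 1)
    case ",$flags," in
        *,UP,*) ;;
        *) return 1 ;;
    esac
    case ",$flags," in
        *,RUNNING,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Intended use on the node (not executed here):
#   ifconfig en0 | iface_is_up || echo "en0 is down; consider rmdev and mkdev"
```

The helper deliberately does not run rmdev or mkdev itself, because rebuilding an interface is disruptive and should remain a deliberate manual step.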
If there is no connectivity across any of the interfaces, then check the status
of any users that are already logged in. If their sessions are hung (that is, if
there is no response to typing anything on the keyboard), then check the LED
display for the node using the spmon GUI.
If the node has crashed, then there will be either an 888 LED displayed, or an
associated crash code (such as 0c9). In this case, reboot the node after
ensuring that it has finished taking a system dump, and contact IBM support
for assistance in analyzing the dump data.
If the node shows no sign of having crashed but there is absolutely no response
at all to any commands, then force a system dump (according to the instructions
given in Chapter 8, “Producing a System Dump” on page 201), reboot the node,
and contact IBM support.
If the node responds to commands, but more slowly than normal, then it may be
possible to analyze the performance at this point to identify the cause of the
problem.
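The three cases above can be summarized as a small decision helper. This is a sketch only: the function name triage_node and its argument values are hypothetical, and treating every 0c0 to 0c9 code as a crash code is an assumption, since the text names only 888 and 0c9.

```shell
# Sketch: map the observed LED value and responsiveness to the action described
# in the text. $1 is the LED value; $2 is one of: none, slow, normal.
triage_node() {
    led=$1
    response=$2
    case "$led" in
        # Assumption: any 0c0-0c9 code is handled like the 0c9 example.
        888|0c[0-9])
            echo "wait for dump to finish, reboot, contact IBM support"
            return 0 ;;
    esac
    case "$response" in
        none) echo "force system dump, reboot, contact IBM support" ;;
        slow) echo "analyze performance" ;;
        *)    echo "not covered by this procedure"; return 1 ;;
    esac
}

# Example:
#   triage_node 888 none
```

The LED value is checked first because a crashed node needs its dump to finish before anything else is attempted.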
5.5.2 The Heartbeat after System Partitioning
When the partitions are applied, multiple heartbeat and host_responds daemons
are created and started. The spapply_config command uses the information in the
Syspar_map object class to identify which subsystems to build.
Each partition has one heartbeat and one host_responds daemon. Unlike the
SDR daemons, it is not possible to identify which daemon services which
Printed in U.S.A. SG24-4778-00