
# iscb status
iSCB Status
Node: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Total
Power: * * * * * * * * * * * * * *
ID-LED: * * * * * * * * * * * * * * * *
Console:
BMC: * * * * * * * * * * * * * * * *
Temp 'C: -65 -65 -64 -61 -63 -64 -61 -62 -63 -64 -65 -62 -64 -63
Watt W: 94 86 102 98 78 83 97 87 96 107 92 86 107 87 1300
PSU:     00         01         02         03         04         05         Total
Power:   on         on         on         on         on         on
Status:  ok         ok         ok         ok         ok         ok
Temp:    22'C       24'C       23'C       23'C       23'C       23'C
Fan:     6208,5280  6208,5280  6144,5280  6176,5248  6144,5248  6176,5312
12V:     22A        21A        21A        21A        21A        21A        127A
AC In:   211V       210V       210V       208V       207V       207V
Watt:    300W       314W       308W       310W       310W       312W       1854W
CFU:     00         01         02         03         04         05
Status:  ok         ok         ok         ok         ok         ok
Fan1:    3522rpm    3513rpm    3529rpm    3472rpm    3506rpm    3483rpm
Fan2:    3540rpm    3513rpm    3538rpm    3486rpm    3529rpm    3492rpm
Duty:    60%        60%        60%        60%        60%        60%
ok
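In the PSU section of the output above, the Total column appears to be the sum of the per-unit readings (for example, 300W + 314W + 308W + 310W + 310W + 312W = 1854W). As a minimal sketch, assuming the status output has been captured as text in the format shown, the per-PSU wattage can be re-added with a small shell function. The name sum_psu_watts is hypothetical and is not part of the iscb tool.

```shell
#!/bin/sh
# sum_psu_watts: hypothetical helper, not part of the iscb tool.
# Reads `iscb status`-style text on stdin and adds up the per-PSU
# readings on the line whose first field is "Watt:". The last field
# on that line is the reported total, so it is left out of the sum.
# The per-node line ("Watt W:") has a different first field and is ignored.
sum_psu_watts() {
    awk '$1 == "Watt:" {
        sum = 0
        for (i = 2; i < NF; i++) {   # fields 2..NF-1 are per-PSU values
            v = $i
            sub(/W$/, "", v)         # strip the trailing unit
            sum += v
        }
        print sum "W"
    }'
}

# Example using the PSU line from the output above:
printf '%s\n' 'Watt: 300W 314W 308W 310W 310W 312W 1854W' | sum_psu_watts
```

For the line shown, the function prints 1854W, which matches the reported total.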
4.13 Analyze Node Memory Dump Using the kdump and crash Utilities on a Node
The kdump and crash utilities can be used to analyze the memory of any Urika®-GX compute node. kdump, the Linux kernel's built-in crash dump mechanism, dumps node memory to a file. In the event of a kernel crash, kdump creates a memory image (also known as a vmcore) that can be analyzed to debug the crash and determine its cause. The dumped image of main memory, exported as an Executable and Linkable Format (ELF) object, can be accessed directly while the kernel crash is being handled (through /proc/vmcore), or it can be saved automatically to a locally accessible file system, to a raw device, or to a remote system accessible over the network.
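Because the memory image is described above as an ELF object, one quick sanity check is to look for the 4-byte ELF magic number (0x7f followed by "ELF") at the start of the file. The following is a minimal sketch only. The function name is_elf is hypothetical, and a dump that was saved in a compressed, non-ELF format would fail this check without being corrupt.

```shell
#!/bin/sh
# is_elf: hypothetical helper for illustration.
# Succeeds (exit status 0) if the first four bytes of the given file
# are the ELF magic number 7f 45 4c 46, and fails otherwise,
# including when the file is missing or unreadable.
is_elf() {
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# /proc/vmcore only exists while a kernel crash is being handled,
# so on a normally running system this prints nothing.
if is_elf /proc/vmcore; then
    echo "/proc/vmcore starts with the ELF magic number"
fi
```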
kdump is configured to automatically generate vmcore crash dumps when a node crashes. These dumps are stored on the node's crash partition, at nid000XX:/mnt/crash/var/crash/datestamp/*, where XX ranges from 00-15 for a rack containing a single sub-rack, 00-31 for a rack containing 2 sub-racks, and 00-47 for a rack containing 3 sub-racks. After kdump completes, the crash utility can be used to analyze the dump file that kdump generated. The xtdumpsys SMW utility can also be used to extract vmcores from the cluster and store them on the SMW for crash analysis.
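The node numbering described above can be sketched as a short shell function. This is an illustration only: list_crash_paths is a hypothetical name, and the function assumes 16 nodes per sub-rack, as implied by the ranges 00-15, 00-31, and 00-47. It only prints the crash directory location for each node and does not contact any node.

```shell
#!/bin/sh
# list_crash_paths: hypothetical helper for illustration only.
# Prints the crash dump directory for every node in a rack with the
# given number of sub-racks (1, 2, or 3), assuming 16 nodes per sub-rack.
list_crash_paths() {
    subracks=$1
    case "$subracks" in
        1|2|3) ;;
        *) echo "usage: list_crash_paths 1|2|3" >&2; return 1 ;;
    esac
    last=$(( subracks * 16 - 1 ))
    i=0
    while [ "$i" -le "$last" ]; do
        # Node names are nid000XX, with XX zero-padded to two digits.
        printf 'nid000%02d:/mnt/crash/var/crash/\n' "$i"
        i=$(( i + 1 ))
    done
}

# Single sub-rack: prints nid00000 through nid00015.
list_crash_paths 1
```

Each printed location contains one directory per datestamp. The crash utility is normally started with a kernel image that includes debug information, followed by the dump file. Where a matching debug kernel image resides on a Urika-GX system is not stated here.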
NOTE: Cray recommends executing the kdump utility only if a node has panicked or is hung, or if a dump is requested by Cray.
On Urika-GX compute nodes, kdump's system-facing configuration files are set to store the kdump file on a local hard drive partition mounted as /mnt/crash, so kernel crash dumps are stored in /mnt/crash/var/crash. Urika-GX has two local HDDs; kdump stores the vmcore collections on one of these drives. Modifying the /etc/kdump.conf or /etc/sysconfig/kdump configuration files is not advised.
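Since these files should not be modified, any inspection of them is best done read-only. The following is a minimal sketch, assuming the files use the common convention of # for comments. The function name show_active_settings is hypothetical and is not a Cray-provided tool.

```shell
#!/bin/sh
# show_active_settings: hypothetical read-only helper, not a Cray tool.
# Prints the lines of a configuration file that are neither blank nor
# comments, so the effective settings can be reviewed without opening
# the file in an editor.
show_active_settings() {
    awk '!/^[ \t]*(#|$)/' "$1"
}

# Inspect the two kdump configuration files named above, if present.
for f in /etc/kdump.conf /etc/sysconfig/kdump; do
    if [ -r "$f" ]; then
        echo "== $f"
        show_active_settings "$f"
    fi
done
```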