● Set the default ANCC image location/path, or the iSCB image
● Reboot specified device (ANCC0, ANCC1, iSCB0 or iSCB1)
dANC Scrub Devices
The Urika-GX system supports scrubbing of non-volatile devices on the dANC. This includes the DANFPGA and dANC flash devices. The DANFPGA FPGA image and flash tool are included in the dANC ARM image, and the FPGA image is flashed after the ARM has booted Linux.
dANC Monitoring
The dANC monitoring daemon, anccmond, executes on the Dual Aries Network Card Controller (dANCC) and performs the following functions:
● Monitors Aries, AOC and board temperatures.
● Monitors the AOC and board power.
● Sends events to HSS ERD.
● Provides threshold warnings to the iSCB via an I²C bus.
● Exposes temperature and voltage readings through a polling mechanism, so that the iSCB can poll for the data.
The HSS thresholds can be modified on the SMW using the HSS xtdaemonconfig command.
3.6 Analyze Node Memory Dump Using the kdump and crash Utilities on a Node
The kdump and crash utilities may be used to analyze the memory on any Urika®-GX compute node. The kdump command is used to dump node memory to a file. kdump is the Linux kernel's built-in crash dump mechanism. In the event of a kernel crash, kdump creates a memory image (also known as vmcore) that can be analyzed to debug the crash and determine its cause. The dumped image of main memory, exported as an Executable and Linkable Format (ELF) object, can be accessed directly during the handling of a kernel crash (through /proc/vmcore), or it can be automatically saved to a locally accessible file system, to a raw device, or to a remote system accessible over the network.
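Because the memory image is described as an ELF object, a quick sanity check on a saved dump is to look at its ELF identification bytes. The following is a minimal sketch, not a Cray-provided tool: the function names are hypothetical, the field offsets come from the generic ELF specification, and a dump that has been compressed or filtered on its way to disk may not be in ELF format at all, in which case the check simply returns None.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4  # e_type value for core files in the ELF specification


def describe_elf_header(header: bytes):
    """Return basic ELF identification fields, or None if the bytes are not ELF."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return None
    ei_class = header[4]  # 1 = 32-bit, 2 = 64-bit
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    byte_order = "<" if ei_data == 1 else ">"
    # e_type is the 2-byte field that follows the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "little_endian": ei_data == 1,
        "is_core": e_type == ET_CORE,
    }


def describe_dump_file(path: str):
    """Read the first 18 bytes of a dump file (path supplied by the caller)."""
    with open(path, "rb") as handle:
        return describe_elf_header(handle.read(18))
```

This only inspects the header; the crash utility remains the tool for actual analysis.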
kdump is configured to automatically generate vmcore crash dumps on node crashes. These dumps can be found on the node in the crash partition, mounted to nid000XX:/mnt/crash/var/crash/datestamp/*, where XX ranges from 00-15 for a rack containing a single sub-rack, 00-31 for a rack containing 2 sub-racks, and 00-47 for a rack containing 3 sub-racks. After kdump completes, the crash utility can be used on the dump file generated by kdump. The xtdumpsys SMW utility can also be used to extract vmcores from the cluster and store them on the SMW for crash analysis.
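The node numbering above implies 16 nodes per sub-rack. As an illustration only, the sketch below enumerates the crash dump locations for a rack with a given number of sub-racks. The function names are hypothetical, the limit of three sub-racks is taken from the ranges listed above, and datestamp is left as a placeholder because its exact format is not given here.

```python
NODES_PER_SUB_RACK = 16  # implied by the ranges 00-15, 00-31 and 00-47


def node_names(sub_racks: int):
    """Return node names nid00000..nid000XX for a rack with 1 to 3 sub-racks."""
    if sub_racks not in (1, 2, 3):
        raise ValueError("expected 1, 2 or 3 sub-racks")
    return [f"nid{n:05d}" for n in range(sub_racks * NODES_PER_SUB_RACK)]


def crash_locations(sub_racks: int, datestamp: str = "datestamp"):
    """Return the per-node crash dump directory in node:path form."""
    return [
        f"{name}:/mnt/crash/var/crash/{datestamp}/"
        for name in node_names(sub_racks)
    ]


if __name__ == "__main__":
    for location in crash_locations(1):
        print(location)
```

For a rack with three sub-racks, the last entry is for node nid00047, matching the 00-47 range.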
NOTE: Cray recommends executing the kdump utility only if a node has panicked or is hung, or if a dump is requested by Cray.
On the Urika-GX compute nodes, kdump's system-facing configuration files are set to store the kdump file on a local hard drive partition that is mounted as /mnt/crash, so the kernel crash dumps are stored in /mnt/crash/var/crash. Urika-GX has two local HDDs. kdump stores the vmcore collections on one of these drives. It is advised not to modify the /etc/kdump.conf or /etc/sysconfig/kdump configuration files.
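Given that layout, locating the dumps on a node and preparing a crash invocation might look like the sketch below. It rests on several assumptions that are not confirmed by this manual: the dump file inside each datestamp directory is named vmcore, sorting directory names alphabetically also sorts them by time, and the vmlinux path (a kernel image with debug symbols matching the crashed kernel) is supplied by the administrator. The function names are hypothetical.

```python
import os


def find_vmcores(crash_root="/mnt/crash/var/crash"):
    """Return (datestamp, path) pairs for each vmcore found, sorted by directory name."""
    found = []
    if not os.path.isdir(crash_root):
        return found
    for datestamp in sorted(os.listdir(crash_root)):
        # Assumption: the dump file in each datestamp directory is called "vmcore".
        candidate = os.path.join(crash_root, datestamp, "vmcore")
        if os.path.isfile(candidate):
            found.append((datestamp, candidate))
    return found


def crash_command(vmlinux, vmcore):
    """Build the argument list for crash: kernel image first, then the dump file."""
    return ["crash", vmlinux, vmcore]


if __name__ == "__main__":
    dumps = find_vmcores()
    if not dumps:
        print("no vmcore files found")
    else:
        datestamp, path = dumps[-1]
        # Placeholder path: substitute the debug vmlinux for the crashed kernel.
        print(" ".join(crash_command("/path/to/vmlinux", path)))
```

The script only reads the crash directory and prints a command line; it does not modify /etc/kdump.conf or /etc/sysconfig/kdump.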
System Management
S3016