
● Generating and applying HSN routes
● Verifying that routes can be generated for the current configuration
● Verifying that generated routes are free of cyclic dependencies
● Dumping out a variety of routing-related and link-related information
HSS Database
The HSS database is a MariaDB relational database that contains the state of all the physical system
components, including the System Management Workstation (SMW), Rack Controller (RC), Intelligent Subrack
Control Board (iSCB), nodes, and the Aries Network Card Controller (ANCC). The state manager reads the
system state from, and writes it to, the HSS database. It keeps the database up to date with the current
state of components and retrieves component information from the database when needed.
Log Files
Event Logs: The event router records events to the event log in the /var/opt/cray/log/event-yyyymmdd file.
Log rotation takes place at specific time intervals. By default, one file is generated per day.
Dump Logs: The xtdumpsys utility writes logs into the /var/opt/cray/dump directory by default.
SMW Logs: SMW logs are stored in /var/opt/cray/log/p0-default on the SMW, and include logs for xtconsole,
xtconsumer, xtnlrd, etc.
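The event log naming scheme described above can be illustrated with a short sketch. It assumes only the /var/opt/cray/log/event-yyyymmdd pattern and the default of one file per day; the function name event_log_path is hypothetical and is not part of the HSS software.

```python
from datetime import date
from pathlib import PurePosixPath

# Directory that holds the event logs, as given in the text above.
EVENT_LOG_DIR = PurePosixPath("/var/opt/cray/log")


def event_log_path(day: date) -> PurePosixPath:
    """Return the expected event log path for one calendar day.

    Hypothetical helper: assumes the default rotation of one file per day
    and the event-yyyymmdd naming pattern.
    """
    return EVENT_LOG_DIR / f"event-{day:%Y%m%d}"


if __name__ == "__main__":
    print(event_log_path(date(2017, 3, 9)))
```

If log rotation has been configured to a different interval, the one-file-per-day assumption no longer holds and the helper would need to be adjusted.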
3.4.1 Hardware Supervisory System (HSS) Architecture Overview
HSS hardware on the Urika-GX system consists of a System Management Workstation (SMW), which is a
rack-mounted Intel-based server running CentOS, and an Ethernet network that connects the SMW to a rack
controller (RC) via a switch. The RC is a mini PC running Linux and connects to one Aries Network Card
Controller (ANCC) on each dual Aries Network Card (dANC). The ANCC has a 32-bit processor. Each hardware
component in the HSS system runs a version of Linux with the relevant HSS software installed. The RC
routes data downstream from the SMW to the ANCCs and Intelligent Subrack Control Boards (iSCBs), and
upstream from the ANCCs and iSCBs to the SMW.
HSS control and monitoring is performed by the SMW over the HSS Ethernet via a stacked managed switch,
which uses VLANs to connect the SMW to the ANCCs, RC, and iSCBs.
The Urika-GX system can consist of 1, 2, or 3 sub-racks per rack, with 2 dANCs per sub-rack, resulting in a
maximum of 6 dANCs per rack. Each dANC has 2 Aries ASICs, each of which has 4 NICs. Each NIC supports a
single node, connected by PCIe Gen 3.
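The per-rack counts follow from multiplying these figures. The sketch below works through that arithmetic. The function name rack_counts is hypothetical, and the node count is an upper bound that assumes every NIC has a node attached.

```python
# Figures taken from the paragraph above.
DANCS_PER_SUBRACK = 2
ASICS_PER_DANC = 2
NICS_PER_ASIC = 4  # each NIC supports a single node


def rack_counts(subracks: int) -> dict:
    """Return dANC, Aries ASIC, and NIC/node counts for one rack.

    Hypothetical helper for illustration; a rack holds 1, 2, or 3 sub-racks.
    """
    if subracks not in (1, 2, 3):
        raise ValueError("a rack holds 1, 2, or 3 sub-racks")
    dancs = subracks * DANCS_PER_SUBRACK
    asics = dancs * ASICS_PER_DANC
    nics = asics * NICS_PER_ASIC
    # One node per NIC, so the NIC count is also the maximum node count.
    return {"dancs": dancs, "asics": asics, "nics": nics, "max_nodes": nics}


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, rack_counts(n))
```

For a fully populated rack of 3 sub-racks this gives 6 dANCs, 12 Aries ASICs, and 48 NICs, and therefore at most 48 nodes.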
The HSS infrastructure software stack executes on the RC, SMW, and the ANCC to control and monitor the Aries
ASIC.
Resiliency Communication Agent (RCA)
RCA is a messaging service that connects compute nodes to the HSS event and messaging system, and allows
compute nodes to subscribe to and inject HSS events and messages.