3.4.4 Hardware Supervisory System (HSS) Daemons
HSS daemons and applications exchange information with the event router. They are located at /opt/cray/hss/default/bin and are started when the System Management Workstation (SMW) boots. They can be managed via systemd and can be stopped and started via systemctl stop hss and systemctl start hss, respectively. HSS daemons are configured dynamically by executing the xtdaemonconfig command.
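The stop and start commands quoted above can be wrapped in a small helper for scripted use. The following is a minimal sketch, not part of HSS: the function names and the dry_run parameter are hypothetical, and only the systemctl invocations themselves come from the text.

```python
import subprocess

# Unit name taken from the systemctl commands quoted in the text.
HSS_UNIT = "hss"


def hss_systemctl_command(action):
    """Return the argument list for `systemctl <action> hss`.

    Only the two actions named in the text (start, stop) are accepted.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["systemctl", action, HSS_UNIT]


def control_hss(action, dry_run=True):
    """Build the command and, unless dry_run is True, execute it.

    dry_run is a hypothetical safety switch added for this sketch; with the
    default value nothing is executed and the command is only returned.
    """
    cmd = hss_systemctl_command(action)
    if not dry_run:
        # Raises CalledProcessError if systemctl exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling control_hss("stop") returns the command that would be run without touching the system; passing dry_run=False on the SMW would execute it.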
Key HSS daemons include:
● State manager daemon (state_manager) - Performs HSS system hardware state management.
● Event router daemons (erd and erdh) - Perform HSS message routing.
● Node ID manager daemon (nid_mgr) - Manages node IDs and NIC addresses for every node in the system.
State Manager
HSS maintains the state of all components that it manages. The state manager, state_manager, runs on the SMW and uses a relational database (also referred to as the HSS database) to store the system state. The state manager keeps the database up to date with the current state of components and retrieves component information from the database when needed. Thus, the dynamic system state persists between SMW boots. The state manager uses the Lightweight Log Manager (LLM), and LLM logging is enabled by default. The log data from the state manager is written to /var/opt/cray/log/sm-yyyymmdd. The state manager performs the following functions:
● Updates and maintains component state information
● Monitors events to update component states
● Detects and handles state notification upon failure
● Provides state and configuration information to HSS applications
The state manager performs the aforementioned tasks on behalf of:
● System nodes
● Aries chips
● Aries HSN links
● Dual Aries Network Card (dANC)
● Rack controller (RC)
● Intelligent Subrack Control Board (iSCB)
In summary, the state manager subscribes to and listens for HSS events, records changes of states, and shares
those states with other daemons.
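The state manager log location given earlier in this section follows a dated naming pattern. The sketch below shows how a script might build that name for a given day. It assumes that the yyyymmdd placeholder expands to a four-digit year followed by a two-digit month and day; the function name is hypothetical, and the text does not say whether the resulting path is a file or a directory.

```python
from datetime import date

# Log directory taken from the text.
LOG_DIR = "/var/opt/cray/log"


def state_manager_log_path(day):
    """Return the state manager log path for the given date.

    Assumption: yyyymmdd means zero-padded year, month, and day.
    """
    return f"{LOG_DIR}/sm-{day.strftime('%Y%m%d')}"


# Example: build the path for an arbitrary, illustrative date.
example_path = state_manager_log_path(date(2017, 3, 5))
```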
The Event Router (erd)
HSS functions are event-driven. The event router daemon, erd, runs on the SMW, rack controllers, and dANC controllers. HSS commands and daemons subscribe to events and inject events into the HSS system by using the services of erd. The event router starts as each of these devices (SMW, rack controller, dANC controller) is started.
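The subscribe-and-inject pattern described above can be illustrated with a toy in-memory model. This is only a conceptual sketch: it does not reflect the actual erd protocol, API, or event names. Every class name, the "state_change" event type, and the "node-0" and "ready" values are hypothetical.

```python
class ToyEventRouter:
    """Toy stand-in for an event router: delivers events to subscribers."""

    def __init__(self):
        # Maps an event type to the list of callbacks subscribed to it.
        self._subscribers = {}

    def subscribe(self, event_type, callback):
        """Register a callback for one event type."""
        self._subscribers.setdefault(event_type, []).append(callback)

    def inject(self, event_type, payload):
        """Deliver an event to every subscriber; return the delivery count."""
        callbacks = self._subscribers.get(event_type, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


class ToyStateTracker:
    """Toy stand-in for a daemon that records component state changes."""

    def __init__(self, router):
        self.states = {}
        # Hypothetical event type, chosen for this sketch only.
        router.subscribe("state_change", self._on_state_change)

    def _on_state_change(self, payload):
        # Record the latest reported state for the component.
        self.states[payload["component"]] = payload["state"]


router = ToyEventRouter()
tracker = ToyStateTracker(router)
# Another component injects an event; the tracker receives and records it.
router.inject("state_change", {"component": "node-0", "state": "ready"})
```

The design point is that the injector and the subscriber never reference each other directly; both depend only on the router, which is what lets commands and daemons be added or removed independently.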