
● Rack controllers
● HSS daemons
● HSS database
● Various logs
HSS performs a number of administrative tasks, such as:
● Monitoring certain hardware system components
● Managing hardware and software failures
● Starting up and shutting down nodes
● Managing the High Speed Network (HSN)
● Maintaining system component states
● Managing the hardware inventory
HSS Command Line Interface
HSS has a command-line interface to manage and view the system from the SMW. For complete usage
information, see the xtcli(8) man page.
Dual Aries Network Card (dANC) Controllers and Rack Controllers
A dANC control processor is hierarchically the lowest component of the monitoring system. The Cray dANC
network card contains two Aries ASICs and an ANCC. Each sub-rack has two dANC cards, and hence four Aries
ASICs, which together support 16 nodes. The dANC monitors the general health of components, including items
such as voltages, temperature, and various failure indicators. A version of Linux optimized for embedded
controllers runs on each dANC controller.
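The per-sub-rack counts above can be worked through as a short Python sketch. The constants come from the text; the number of sub-racks per rack is not stated here, so `rack_totals` takes it as a hypothetical parameter, and the even split of nodes across Aries ASICs is an inference from the totals rather than a documented figure.

```python
# Figures stated in the text, per sub-rack.
DANC_CARDS_PER_SUBRACK = 2
ARIES_PER_DANC = 2
NODES_PER_SUBRACK = 16

# Derived values: 2 cards x 2 ASICs = 4 Aries ASICs per sub-rack.
aries_per_subrack = DANC_CARDS_PER_SUBRACK * ARIES_PER_DANC
# Assumption: nodes are spread evenly across the Aries ASICs.
nodes_per_aries = NODES_PER_SUBRACK // aries_per_subrack


def rack_totals(subracks):
    """Scale the per-sub-rack counts to a rack.

    `subracks` is a hypothetical input; the source does not say how
    many sub-racks a rack holds.
    """
    return {
        "danc_cards": subracks * DANC_CARDS_PER_SUBRACK,
        "aries_asics": subracks * aries_per_subrack,
        "nodes": subracks * NODES_PER_SUBRACK,
    }
```

For example, `rack_totals(3)` gives the counts for an illustrative rack with three sub-racks.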
Each rack has a rack control processor (rack controller) that monitors and controls the rack power and
communicates with all dANC controllers in the rack. It sends a periodic heartbeat to the SMW to indicate rack
health.
The rack controller connects to the dANC controllers through the Ethernet switch on each blade, using an
Ethernet cable, and routes HSS data to and from the SMW. The rack controller runs the same version of
embedded Linux as the dANCs. The SMW, rack controllers, iSCBs, and ANCCs are all interconnected via Ethernet.
The monitoring system uses periodic heartbeats. Each monitored process must send a heartbeat within a set time
interval. If the interval is exceeded, the system monitor generates a fault event and sends it to the state
manager. The fault is recorded in the event log, and the state manager sets an alert flag for the component
(dANC controller or rack controller) that spawned it.
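The heartbeat mechanism described above can be illustrated with a minimal Python sketch. This is not the HSS implementation: the class name `HeartbeatMonitor`, its methods, the component names, and the 10-second interval are all hypothetical, and the system monitor, state manager, and event log are collapsed into a single object for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy model of heartbeat tracking (hypothetical, not HSS code)."""

    interval_s: float                              # assumed maximum gap between heartbeats
    last_seen: dict = field(default_factory=dict)  # component -> time of last heartbeat
    alerts: set = field(default_factory=set)       # components with the alert flag set
    event_log: list = field(default_factory=list)  # recorded fault events

    def heartbeat(self, component, now):
        """Record a heartbeat from a component at time `now` (seconds)."""
        self.last_seen[component] = now

    def check(self, now):
        """Return components whose interval was newly exceeded at `now`."""
        faults = []
        for component, seen in self.last_seen.items():
            overdue = now - seen > self.interval_s
            if overdue and component not in self.alerts:
                # Record the fault and set the alert flag for the component.
                self.event_log.append((now, component, "heartbeat fault"))
                self.alerts.add(component)
                faults.append(component)
        return faults


monitor = HeartbeatMonitor(interval_s=10.0)
monitor.heartbeat("rack-controller", 0.0)
monitor.heartbeat("danc-controller", 0.0)
monitor.heartbeat("rack-controller", 8.0)  # only the rack controller reports again
new_faults = monitor.check(12.0)           # the dANC controller is now overdue
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic. A component that is already flagged is not reported again on later checks.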
The rack and dANC controllers use NTP to keep accurate time with the SMW.
HSS Daemons
HSS daemons on the SMW and the controllers act to monitor and control the state of the system, and to respond
to incidents such as hardware failures. The data path between the HSS CLI and the various daemons is via HSS
events.
Cray System Network Routing Utility
The rtr command performs a variety of routing-related tasks for the High Speed Network (HSN). Tasks include: