8 Monitoring cluster operations
This chapter describes how to monitor the operational state and health of the cluster.
Monitoring the status of file serving nodes
The dashboard on the management console GUI displays information about the operational status
of file serving nodes, including CPU, I/O, and network performance information.
To view status from the CLI, use the ibrix_server -l command. This command provides CPU,
I/O, and network performance information and indicates the operational state of the nodes, as
shown in the following sample output:
<installdirectory>/bin/ibrix_server -l
SERVER_NAME STATE CPU(%) NET_IO(MB/s) DISK_IO(MB/s) BACKUP HA
----------- ------------ ------ ------------ ------------- ------ --
node1 Up, HBAsDown 0 0.00 0.00 off
node2 Up, HBAsDown 0 0.00 0.00 off
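For scripted monitoring, the listing can be split into per-node records. The following is a minimal sketch, not part of the product: it assumes the output has exactly the layout shown in the sample above (one header line, one separator line, then one row per node), and the function name parse_server_list is hypothetical. Because the STATE value can contain a space (as in "Up, HBAsDown"), the sketch locates the three numeric columns with a regular expression instead of splitting on whitespace. Any text after DISK_IO (the BACKUP and HA columns) is kept as raw tokens, because the sample does not show which of those columns a lone value belongs to.

```python
import re

# Sample text copied from the output above; a real script would capture
# the output of "ibrix_server -l" instead (assumption: same layout).
SAMPLE = """\
SERVER_NAME STATE CPU(%) NET_IO(MB/s) DISK_IO(MB/s) BACKUP HA
----------- ------------ ------ ------------ ------------- ------ --
node1 Up, HBAsDown 0 0.00 0.00 off
node2 Up, HBAsDown 0 0.00 0.00 off
"""

# name, then a state that may contain spaces, then three numeric columns.
ROW = re.compile(
    r"^(?P<name>\S+)\s+"
    r"(?P<state>.+?)\s+"
    r"(?P<cpu>\d+(?:\.\d+)?)\s+"
    r"(?P<net>\d+\.\d+)\s+"
    r"(?P<disk>\d+\.\d+)"
    r"(?P<rest>.*)$"
)


def parse_server_list(text):
    """Return one dict per node row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the header line, and the dashed separator line.
        if not line or line.startswith("SERVER_NAME") or set(line) <= set("- "):
            continue
        match = ROW.match(line)
        if match is None:
            continue  # layout differs from the assumed one; ignore the line
        rows.append({
            "name": match.group("name"),
            "state": match.group("state"),
            "cpu_percent": float(match.group("cpu")),
            "net_io_mb_s": float(match.group("net")),
            "disk_io_mb_s": float(match.group("disk")),
            # BACKUP/HA values, left unassigned because the mapping is ambiguous
            "trailing": match.group("rest").split(),
        })
    return rows


if __name__ == "__main__":
    for row in parse_server_list(SAMPLE):
        print(row["name"], "|", row["state"], "|", row["cpu_percent"])
```

If the real output differs from the sample (for example, additional columns), the regular expression would need to be adjusted accordingly.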
File serving nodes can be in one of three operational states: Normal, Alert, or Error. These states
are further broken down into categories that are mostly related to the failover status of the node.
The following table describes the states.
State   Description
------  -----------
Normal  Up: Operational.
Alert   Up-Alert: Server has encountered a condition that has been logged. An
        event will appear in the Status tab of the management console GUI, and
        an email notification may be sent.
        Up-InFailover: Server is powered on and visible to the management
        console, and the management console is failing over the server's
        segments to a standby server.
        Up-FailedOver: Server is powered on and visible to the management
        console, and failover is complete.
Error   Down-InFailover: Server is powered down or inaccessible to the
        management console, and the management console is failing over the
        server's segments to a standby server.
        Down-FailedOver: Server is powered down or inaccessible to the
        management console, and failover is complete.
        Down: Server is powered down or inaccessible to the management
        console, and no standby server is providing access to the server's
        segments.
The STATE field also reports the status of monitored NICs and HBAs. If you have multiple HBAs
and NICs and some of them are down, HBAsDown or NicsDown is reported alongside the node
state (for example, Up, HBAsDown in the sample output above).
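The mapping from a STATE value to its category can be expressed as a small lookup. The sketch below is illustrative only: the function name interpret_state is hypothetical, and it assumes the STATE field is a comma-separated list whose first item is the node state and whose remaining items are device flags, as in "Up, HBAsDown" from the sample output. The state names and categories are taken from the table above.

```python
# Categories for each node state, as listed in the state table.
CATEGORY_BY_STATE = {
    "Up": "Normal",
    "Up-Alert": "Alert",
    "Up-InFailover": "Alert",
    "Up-FailedOver": "Alert",
    "Down-InFailover": "Error",
    "Down-FailedOver": "Error",
    "Down": "Error",
}

# Device flags that the STATE field can also report.
DEVICE_FLAGS = {"HBAsDown", "NicsDown"}


def interpret_state(state_field):
    """Split a STATE value into node state, category, and device flags.

    Assumes a comma-separated layout such as "Up, HBAsDown".
    """
    parts = [p.strip() for p in state_field.split(",") if p.strip()]
    base = parts[0] if parts else ""
    flags = [p for p in parts[1:] if p in DEVICE_FLAGS]
    return {
        "state": base,
        # States not listed in the table are reported as "Unknown".
        "category": CATEGORY_BY_STATE.get(base, "Unknown"),
        "device_flags": flags,
    }


if __name__ == "__main__":
    print(interpret_state("Up, HBAsDown"))
    print(interpret_state("Down-FailedOver"))
```

A monitoring script could, for instance, treat any node whose category is not Normal, or whose device_flags list is not empty, as needing attention.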
Monitoring cluster events
X9000 Software events are assigned to one of the following categories, based on the level of
severity:
• Alerts. A disruptive event that can result in loss of access to file system data. For example, a
  segment is unavailable or a server is unreachable.
• Warnings. A potentially disruptive condition where file system access is not lost, but if the
  situation is not addressed, it can escalate to an alert condition. Some examples are reaching
  a very high server CPU utilization or nearing a quota limit.
• Information. An event that changes the cluster (such as creating a segment or mounting a file
  system) but occurs under normal or nonthreatening conditions.