920100006
A sensor on a node indicates an elevated temperature. Drives are overheating. The node will reboot immediately. Drive power
will discontinue in five minutes.
Description
The node will reboot but will not rejoin the cluster until temperatures are within acceptable thresholds. If not cooled within five
minutes, the drives will stay powered down until the inlet temperature is at an acceptable level to restart.
Ambient temperature is only measured by front panel sensors. If you receive an event that indicates that the front panel is out
of specification, the temperature in your data center might need to be adjusted.
If a node is subjected to high temperatures for an extended period of time, the CPU is throttled and the node goes into
read-only mode to help prevent potential data loss due to component failure. If the node temperature reaches critical levels,
the node might shut down entirely.
Administrator action
Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent
steps.
● (HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully.
● Review the temperature statistics for the affected sensor, which are included in the event. If the temperature is consistently
elevated, the problem is likely a high ambient temperature in the data center. Address any changes in the cluster
environment such as air conditioning outages.
● Verify that air flow within the rack, and through the front and rear panel vents of the node, is not obstructed in any way.
● Make sure that the faceplate on the affected node is installed, properly seated, and undamaged. In some cases, removing
and re-seating the faceplate will resolve this issue.
● Run the isi_hw_status command. Review the output to determine whether there is a slow or failed fan that was not
otherwise reported.
● Check for high CPU and disk usage in the node. High usage can contribute to high temperatures within the node.
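The review of temperature statistics in the steps above can be sketched as a simple check. This is a minimal illustration and not part of OneFS: the function name, the sample readings, the 35 degree threshold, and the 80 percent cutoff are all hypothetical. Use the readings and thresholds reported in the event for the affected sensor.

```python
from typing import Sequence


def classify_readings(readings_c: Sequence[float], threshold_c: float,
                      sustained_fraction: float = 0.8) -> str:
    """Classify a series of sensor readings (degrees Celsius).

    Hypothetical helper: threshold_c and sustained_fraction are
    illustrative assumptions, not values defined by OneFS.
    """
    if not readings_c:
        raise ValueError("no readings supplied")
    # Count how many readings exceed the chosen threshold.
    over = sum(1 for r in readings_c if r > threshold_c)
    fraction_over = over / len(readings_c)
    if fraction_over >= sustained_fraction:
        return "consistently elevated"
    if over > 0:
        return "intermittent"
    return "normal"


# Made-up sample data for illustration only.
samples = [41.0, 42.5, 43.0, 42.0, 41.5]
print(classify_readings(samples, threshold_c=35.0))
```

Under these assumptions, a "consistently elevated" result suggests looking at ambient conditions in the data center first, while an "intermittent" result suggests checking air flow, fans, and node load.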
If the steps above do not clear this event, the subsystem that monitors the health of the hardware (such as
the temperature and fan speeds) might have encountered a problem. This event can occur intermittently without harm to the
system, and you can safely quiet the event unless the issue persists.
If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to
gather cluster logs, see
.
920100007
All drives in the node are powering down.
Description
This event occurs if the five-minute warning has not been cleared from events 920100003, 920100004, or 920100006.
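The relationship between this event and its precursor events can be sketched as follows. This is only an illustration of the rule stated above, not the logic OneFS actually runs: the function name, the (event ID, raised-at time) record layout, and the non-precursor event ID in the sample data are hypothetical.

```python
# Precursor event IDs and the warning window, as named in the description.
PRECURSOR_EVENT_IDS = {"920100003", "920100004", "920100006"}
WARNING_WINDOW_SECONDS = 5 * 60


def uncleared_precursors(active_events, now):
    """Return precursor events still active after the five-minute window.

    active_events: iterable of (event_id, raised_at_seconds) tuples for
    events that have not been cleared. This layout is a hypothetical
    assumption made for the example.
    """
    return [
        (event_id, raised_at)
        for event_id, raised_at in active_events
        if event_id in PRECURSOR_EVENT_IDS
        and now - raised_at >= WARNING_WINDOW_SECONDS
    ]


# Made-up sample data: times are seconds on an arbitrary clock.
events = [("920100006", 0), ("920100003", 200), ("100010001", 0)]
print(uncleared_precursors(events, now=300))
```

Any event returned by such a check is one to address first, because resolving the precursor condition is what allows the drives to be powered back on.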
Administrator action
Address the events that occurred before the drives were powered down.
If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to
gather cluster logs, see
.