background image

900060010

A voltage component is out of specification.

Administrator action

Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs,
see 

Gathering cluster logs

.

900060011

A voltage component is out of specification.

Administrator action

Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs,
see 

Gathering cluster logs

.

900060012

A voltage component is out of specification.

Administrator action

Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs,
see 

Gathering cluster logs

.

900060013

A voltage component is out of specification.

Administrator action

Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs,
see 

Gathering cluster logs

.

900060014

A voltage component is out of specification.

Administrator action

Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs,
see 

Gathering cluster logs

.

Hardware events

137

Summary of Contents for PowerScale OneFS

Page 1: ...PowerScale OneFS Event Reference Guide April 2022 ...

Page 2: ...mage to hardware or loss of data and tells you how to avoid the problem WARNING A WARNING indicates a potential for property damage personal injury or death 2017 2022 Dell Inc or its subsidiaries All rights reserved Dell Technologies Dell and other trademarks are trademarks of Dell Inc or its subsidiaries Other trademarks may be trademarks of their respective owners ...

Page 3: ...e id 19 Modify alerts by event type id 20 View alerting rules 20 Create an alerting rule 20 Modify an alerting rule 20 Delete an alerting rule 21 Managing channels 21 View alert channels 21 Create a channel 21 Modify a channel 22 Delete a channel 23 Managing event thresholds 23 View events with configurable thresholds and adjust the threshold values 23 Maintenance and testing 24 Event data retenti...

Page 4: ...00010009 37 100010010 37 100010011 38 100010012 38 100010013 38 100010014 38 100010015 39 100010016 39 100010017 39 100010018 40 100010019 40 100010020 40 100010021 41 100010022 41 100010023 41 100010024 42 100010025 42 100010026 42 100010027 42 100010028 43 100010029 43 100010030 43 100010031 43 100010032 44 100010033 44 100010034 44 100010035 45 100010036 45 100010037 45 100010038 46 100010039 4...

Page 5: ...3 53 100030001 53 200010001 53 200010002 54 200010003 54 200010006 55 200010007 55 200010008 55 200010009 56 200020001 56 200020002 57 200020003 57 200020004 58 200020005 58 200020006 59 200020007 59 200020008 59 200020009 60 200020010 60 200020011 60 200020012 61 200020013 61 200020014 62 200020015 62 200020020 62 200020021 63 200020022 63 200020023 63 200020024 64 200020025 64 200020026 64 20003...

Page 6: ...2 70 400040014 70 400040015 71 400040017 71 400040018 72 400040019 72 400040020 73 400040021 73 400040022 73 400040023 74 400040024 74 400040025 74 400040026 74 400050001 75 400050002 75 400050004 75 400060001 76 400060002 76 400060004 76 400060101 76 400060102 77 400060103 77 400060104 77 400060105 77 400060106 78 400060107 78 400060108 78 400060109 78 400060110 79 400060111 79 400060112 79 40006...

Page 7: ...0 84 400100011 84 400110001 84 400120001 85 400130001 85 400130002 85 400140001 85 400140002 86 400140003 86 400150001 86 400150002 87 400150003 87 400150004 87 400150005 87 400150006 88 400150007 88 400150008 88 400150009 89 400150010 89 400150011 89 400150012 90 400151001 90 400160001 90 400160002 91 400160005 91 400170001 91 400170002 91 400180001 92 400180002 92 400180003 92 400180004 92 40018...

Page 8: ...00 98 400260000 98 500010001 98 500010002 98 500010003 99 500010004 99 500010005 99 600010001 99 600010002 100 600010003 101 600010004 101 600010005 101 700010001 102 700010003 102 700010004 102 700010005 103 700020001 103 700020002 103 700020003 104 700030001 104 700030002 104 700030003 105 700030004 105 700030005 105 700030006 106 700040001 106 700050001 106 700100001 107 800010002 107 800010003...

Page 9: ...010001 117 900010002 118 900010003 118 900010004 118 900010005 118 900010006 119 900010007 119 900010008 119 900010009 120 900010010 120 900010011 120 900010012 120 900010013 121 900020001 121 900020002 122 900020003 122 900020004 122 900020005 123 900020006 123 900020007 123 900020008 124 900020009 124 900020010 124 900020011 124 900020012 125 900020013 125 900020014 125 900020015 125 900020016 1...

Page 10: ...060004 135 900060005 135 900060006 136 900060007 136 900060008 136 900060009 136 900060010 137 900060011 137 900060012 137 900060013 137 900060014 137 900060015 138 900060016 138 900060017 138 900060018 138 900060019 138 900060020 139 900060021 139 900060022 139 900060023 140 900060024 141 900060025 141 900060026 142 900060027 143 900060028 144 900060029 144 900060030 144 900060031 144 900060032 1...

Page 11: ...080014 153 900080015 153 900080016 153 900080017 153 900080018 154 900080019 154 900080020 154 900080021 154 900080022 154 900080023 155 900080024 156 900080025 157 900080026 158 900080027 159 900080028 159 900080029 160 900080030 161 900080031 161 900080032 162 900080033 162 900080034 163 900080035 164 900080036 164 900080037 164 900100001 165 900100004 165 900100018 165 900100019 165 900100020 1...

Page 12: ...120005 175 900130001 176 900130002 176 900130003 177 900130004 177 900130005 177 900130006 178 900130007 178 900130008 178 900130009 178 900130010 178 900130011 179 900130013 179 900130014 180 900130015 181 900140001 182 900140002 182 900140003 182 900140004 182 900140005 183 900150001 183 900160001 183 900160002 184 900160003 184 900160004 184 900160005 185 900160006 185 900160007 185 900160008 1...

Page 13: ...180001 193 900180002 193 900180003 193 900180004 194 900180005 194 900180006 194 900180007 194 900180008 195 900180009 195 900180010 195 900180011 196 900180012 196 900180013 196 900180014 197 900180015 197 900180016 197 900180028 198 900180029 198 900180030 198 900180031 199 900180032 199 910100001 199 910100002 199 910100003 200 910100004 200 910100005 200 910100006 201 910100007 201 920100000 2...

Page 14: ...203 920100005 203 920100006 204 920100007 204 920100008 205 920100009 205 930100000 205 930100001 205 930100002 206 930100003 206 930100004 206 930100005 207 930100006 207 940100001 207 940100002 208 14 Contents ...

Page 15: ...zed web based and command line administration to manage the following features A cluster that runs a distributed file system Scale out nodes that add capacity and performance Storage options that manage files and tiering Flexible data protection and high availability Software modules that control costs and optimize resources Where to get help The Dell Technologies Support site https www dell com s...

Page 16: ... dell com support incidents online en us contactus product isilon onefs PowerScale OneFS Documentation Info Hubs https www dell com support kbdoc en us 000152189 powerscale onefs info hubs Dell Community Board for self help https www dell com community 16 Introduction to this guide ...

Page 17: ...d alerts at the event group level Event groups overview Event groups are collections of individual events that are related symptoms of a single situation on your cluster Event groups provide a single point of management for multiple event instances that are generated in response to a situation on your cluster For example if a chassis fan fails in a node OneFS might capture multiple events related ...

Page 18: ...ring in the search box 2 In the Actions column of the event group you want to view click View event details You can view details of each event group in a separate window Change the status of an event group You can ignore or resolve an event group After you resolve an event group you cannot reverse that action Any new events that would have been added to the resolved event group will be added to a ...

Page 19: ... maintenance mode You can disable the CELOG maintenance mode While disabling you can view all the events that have occurred during the maintenance mode and clear the details if needed 1 Click Cluster Management Events and Alerts Alert Management 2 Click Disable CELOG maintenance mode The Disable CELOG maintenance mode dialog box with the following details appear CELOG maintenance window start date...

Page 20: ...ist Send an alert only if the event lasts longer than Enter the numerical value in the text box and select the unit of time from the drop down list Applies to Select the check box next to the relevant alert category Add event group ID Click Add event group ID to add an event group Select alert channel for this rule Select the check box next to the relevant channel name 5 Click Create rule Modify a...

Page 21: ...e channel from the Channel type list NOTE Depending on the delivery mechanism you select different settings appear 7 If you are creating an SMTP channel you can configure the following settings a In the Send to field enter an email address that you want to receive alerts on this channel To add another email address to the channel click Add another email address b To manually configure the SMTP ser...

Page 22: ...this channel To add another excluded node to the channel click Exclude another node 10 Click Create channel Modify a channel You can modify a channel that you have created 1 Click Cluster Management Events and Alerts Alert Management 2 In the CELOG alerting area click the Alert channel tab 3 In the Actions column of the channel you want to modify click Edit channel The Edit alert channel window ap...

Page 23: ...e number of a node in the cluster that is allowed to send alerts through this channel To add another allowed node to the channel click Add another node If you do not specify any nodes all nodes in the cluster are considered as allowed nodes d In the Excluded Nodes field type the node number of a node in the cluster that is not allowed to send alerts through this channel To add another excluded nod...

Page 24: ...ntenance settings 1 Click Cluster Management Events and Alerts Settings 2 In the Retain event group and maintenance window history field enter the number of days you want resolved event groups and maintenance window history to be stored before they are deleted 3 In the Event log storage limit field enter the limit for the amount of storage you want to set aside for event data The value in this fie...

Page 25: ...must have root access to run log gathering commands To gather the log files in OneFS 9 0 0 0 and earlier run the following command isi_gather_info To gather the log files in OneFS 9 1 0 0 and later run the following command isi diagnostics gather start The files generated during the log gathering process are stored on the cluster in the ifs data Isilon_Support pkg directory Manually send cluster l...

Page 26: ...Enter the following settings to connect to the FTP server Host ftp emc com User name anonymous Password your email address 5 Change the destination directory to incoming 6 Upload the log file 26 Introduction to system events ...

Page 27: ...send you an email if a critical file system event occurs that might potentially cause the cluster to be unwritable Event notification methods You can define the method by which OneFS delivers notifications Email You can send email messages to distribution lists and apply email templates to notifications You can also specify SMTP authorization and security settings SNMP trap You can send SNMP traps...

Page 28: ...nt notification rules and details about specific rules 1 Click Cluster management Events and alerts Alert management 2 On the Alerting rule tab in the Actions column click Edit rule of the rule whose settings you want to view 3 When you have finished viewing the rule details click Cancel Modify an event notification rule You can modify event notification rules that you created System event notific...

Page 29: ...mplate Sends the email notifications in the format that you defined in your custom template file View event notification settings You can view email and contact information for event notifications Click Cluster management General settings Email settings Modify event notification settings You can modify email and contact settings for event notifications 1 Click Cluster management General settings E...

Page 30: ... 100010010 100010011 100010012 100010013 100010014 100010015 100010016 100010017 100010018 100010019 100010020 100010021 100010022 100010023 100010024 100010025 100010026 100010027 100010028 100010029 100010030 100010031 100010032 100010033 100010034 100010035 100010036 100010037 100010038 100010039 100010041 100010042 100010043 100010044 100010045 100010046 4 30 Software events ...

Page 31: ...1 200010002 200010003 200010006 200010007 200010008 200010009 200020001 200020002 200020003 200020004 200020005 200020006 200020007 200020008 200020009 200020010 200020011 200020012 200020013 200020014 200020015 200020020 200020021 200020022 200020023 200020024 200020025 200020026 200030001 200030002 300010001 300010002 300010003 300020001 300020002 300020003 400020001 Software events 31 ...

Page 32: ...2 400040023 400040024 400040025 400040026 400050001 400050002 400050004 400060001 400060002 400060004 400060101 400060102 400060103 400060104 400060105 400060106 400060107 400060108 400060109 400060110 400060111 400060112 400060113 400070004 400070005 400070006 400070007 400080001 400090001 400090002 400090003 400090004 400100001 400100002 400100003 400100004 400100005 32 Software events ...

Page 33: ...6 400150007 400150008 400150009 400150010 400150011 400150012 400151001 400160001 400160002 400160005 400170001 400170002 400180001 400180002 400180003 400180004 400180005 400190001 400200001 400200002 400210001 400210002 400210003 400210004 400210005 400210006 400210007 400210008 400220000 400230001 400240000 400240001 400240002 400240003 400240004 400240005 400250000 Software events 33 ...

Page 34: ...04 700010005 700020001 700020002 700020003 700030001 700030002 700030003 700030004 700030005 700030006 700040001 700050001 700100001 800010002 800010003 800010004 800010005 800010006 800010007 800010008 800010009 800010010 1100000001 1100000002 1100000003 1100000004 1100000005 1100000006 1100000007 1100000008 1100000009 34 Software events ...

Page 35: ...00010002 The var crash partition on a node is at or near capacity Description The purpose of the var crash partition is to preserve data about failed processes and unplanned restarts to enable analysis of those events This event is usually the result of a process or service stopping unexpectedly and producing a core file which is a type of log file Core files record all of the system events when t...

Page 36: ...first appears when the amount of data on the partition reaches the warning threshold of 88 of the partition capacity The message appears again when the amount of data reaches the critical threshold of 95 of the partition capacity Administrator action Reduce the amount of data that is stored on the cluster or contact your sales representative to discuss your capacity needs 100010005 The serially at...

Page 37: ...upport for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 100010008 The serially attached SCSI SAS PHY monitor detected an excessive bit error rate and disabled traffic on the SAS cables Administrator action If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather clus...

Page 38: ...failed contact Dell EMC PowerScale Technical Support for additional troubleshooting 100010012 The disk has stalled and the disk health is being evaluated Administrator action This message is informational No action is required 100010013 There is an error in a disk sector Description If enough disk errors occur the drive is automatically smartfailed If a disk smartfails and should be replaced anoth...

Page 39: ...ersists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 100010016 The diskpool metadata has been written to more SSDs than is allotted in the layout preferences Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cl...

Page 40: ... Dell EMC Online Support site If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 100010020 A serially attached SCSI SAS controller logged an error or a change in the disk subsystem Description When disk drives go into error recovery they hold the PHY connection open The LSI 20...

Page 41: ...mum Bit Error Rate BER Description When disk drives go into error recovery they hold the PHY connection open The LSI 2008 SAS controllers can time out on that open connection When the LSI controller reaches its timeout threshold the SAS connection is reset which causes the SAS BER error messages and event notifications Administrator action Verify that the node firmware and drive firmware are up to...

Page 42: ...on 1 Smartfail the specified drive Do not remove the drive from the node 2 Reinstall the smartfailed drive by running the following command isi devices drive format bay node lnn integer If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 100010027 The drive subsystem determined...

Page 43: ... logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 100010030 An unsupported drive was installed in the node Description This event indicates that an unsupported drive was installed in the node Administrator action Remove the unsupported drive from the affected node and contact Dell EMC PowerScale Technic...

Page 44: ... drive from another node in the cluster was inserted as a replacement Description A drive that was installed as a replacement was used previously in the cluster The event message provides you with the current node bay location drive type and Logical Number LNUM of the drive Administrator action Take one of the following actions to resolve the issue If the drive was inserted in error replace the dr...

Page 45: ...urrent node bay location drive type and Logical Number LNUM of the drive Administrator action Replace the drive with a new drive according to the instructions in the PowerScale Drive Replacement Guide for your platform 100010037 A new drive was not formatted correctly and as a result the drive was not added to the filesystem Description A drive that was installed in a node was not formatted correc...

Page 46: ...rive in the drive bay and wait for the drive to finish Smartfailing or leave the bay empty until the FlexProtect job finishes 100010039 Unprovisionable drive s unprovisionable Description One or more drives in the cluster cannot be provisioned Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gather...

Page 47: ...onal troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 100010044 A node boot flash drive must be replaced Description Boot flash drives are not customer replaceable parts Replacement of this device will require turning off power to the node Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on ...

Page 48: ...100010050 The smartfail process completed on a drive Description A drive has been smartfailed from the cluster The event message provides you with the chassis serial number node sled the drive slot number within the sled drive type and Logical Number LNUM of the drive Administrator action Replace the smartfailed drive with a new drive according to the instructions in the PowerScale Drive Replaceme...

Page 49: ...age provides you with the chassis serial number node sled the drive slot number within the sled drive type and Logical Number LNUM of the drive Administrator action Replace the drive with a compatible drive according to the instructions in the PowerScale Drive Replacement Guide for your platform 100010054 A drive was inserted in a bay that is disabled Description A drive was installed in a sled th...

Page 50: ...le Drive Replacement Guide for your platform 100010056 The write cache is enabled for a drive in a Generation 6 platform Write cache enabled drives are not compatible with Generation 6 nodes Description A write cache enabled drive was installed in a 6th Generation node and is not compatible with the node The event message provides you with the chassis serial number node sled the drive slot number ...

Page 51: ...s that the Large File feature is enabled and a node pool has passed the space in use in threshold System performance can be affected Administrator action If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 100010060 Error detected in PCI drive Location location Type media_type ...

Page 52: ...nd wasn t replaced before the sled service timeout limit was exceeded As a result the drive sled was smartfailed The event message provides you with the node sled and bay as well as the drive type and Logical Numbers LNUMs of the drives in the sled Administrator action Replace the drive sled 100020061 A drive sled was unexpectedly removed from a chassis All drives in the sled were suspended Descri...

Page 53: ...t was exceeded As a result the drive sled was smartfailed The event message provides you with the node and drive sled that was removed Administrator action Replace the drive sled 100030001 Drives have been marked as draining but usage is low Description Drives have been marked as draining but usage is low Please remove the draining flag from the drive Administrator action Remove the draining flag ...

Page 54: ...uired 200010003 One or more nodes are offline One or more nodes are offline due to one of the following conditions A node was intentionally shut down for maintenance A node lacks internal network connectivity Internal connectivity is how a node communicates with other nodes on the cluster A node cannot join the group Administrator action If the Cluster Status page in the OneFS web administration i...

Page 55: ... pool received the same upgrade and that the nodes are still equivalent 2 Attempt to provision the node into a node pool by running the following command isi_evaluate_provision_drive 3 If the issue persists contact Support Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 200010008 The identified ...

Page 56: ...s in the order listed If the issue resolves after a step there is no need to complete the subsequent steps 1 Verify the following connections All of the cables in the rack and in adjacent racks are connected securely to the node and neither the cable nor the connector is damaged The cable is rated for the appropriate Ethernet speed The switch port speed is set to the same or higher speed as the Et...

Page 57: ... instructions on how to gather cluster logs see Gathering cluster logs 200020003 Multiple internal network issues were detected Administrator action Perform the following steps in the order listed If the issue resolves after a step there is no need to complete the subsequent steps 1 If internal network issues are not ongoing and this issue is not a recurring problem attempt the following steps Mak...

Page 58: ...troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 200020005 One of the nodes in your cluster has lost network connectivity on one or both of its external interfaces Administrator action Determine whether the issue is related to the cable or the node Perform the following steps in the order listed If the issue resolves after a step there is no need to complet...

Page 59: ...a different node replace the cable If the issue persists after replacing the cable move the cable to another port on the switch Confirm that multiple clusters are not connected to the same IB switch This configuration is unsupported and can cause this issue Each cluster must be connected to a dedicated IB switch If the above steps do not resolve the issue gather logs and then contact Technical Sup...

Page 60: ...e the switch The event message provides you with the switch serial number and the network fabric supported by the switch Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 200020011 A 40 Gigabit Ethernet link is not operating at maximum throughput Administrator action Determine...

Page 61: ...ntical network configuration When you plug the cable into the other node leave the other end of the cable plugged into the same switch port If the issue persists after plugging the cable into a different node replace the cable If the issue persists after replacing the cable move the cable to another port on the switch If the issue persists after moving the cable to another port on the switch revie...

Page 62: ...operation at the expected speed 2 Validate that cables and modules used support the expected speed For details on cables and modules see FAQ Optics and cables article 000134129 200020020 One of the nodes in your cluster has lost external network connectivity Administrator action Determine whether the issue is related to the cable or the node Perform the following steps in the order listed If the i...

Page 63: ...rect the cabling issue The alert automatically ends when the cabling issue is corrected 200020022 The back end fabric is unable to contact the back end Dell master switch Description The management service is down and is critical for Leaf spine failover operation to function correctly Dell Switch operating system communications between OneFS and Dell Ethernet switches SNMP in Arista switches The e...

Page 64: ...nistrator action Correct wiring to ensure primary and secondary fabric are in the same configuration so their bandwidths are identical 200020025 Back end network non connectivity Description Back end network non connectivity detected No connectivity between nodes 215 and 252 on the Int a network Administrator action Obtain a full list of nodes with no connectivity between them by running the isi_c...

Page 65: ...ownload and install the latest node firmware and drive support packages To install the drive firmware download the latest Drive Support Package and run the isi_dsp_install command To install the node firmware download the latest Node Firmware Package and run the isi upgrade firmware assess fw pkg path to NFP command 300010001 The node is being rebooted for maintenance purposes Administrator action...

Page 66: ...lve the power supply issue If the node is receiving power contact Technical Support 3 If the node is on but did not rejoin the cluster attempt to establish remote access through a secure shell SSH session If the SSH session fails attempt to establish remote access through the serial console 4 If neither the SSH session nor the serial console is responsive press CTRL T in the SSH session or in the ...

Page 67: ... being denied Administrator action Examine SMB usage on the affected node particularly any active sessions or open files If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 400030001 A process failed to restart despite several attempts to start it Administrator action Gather lo...

Page 68: ...structions on how to gather cluster logs see Gathering cluster logs 400040004 The target cluster for a SyncIQ job cannot create a requested snapshot Description If the cluster is split this event might appear for a node in the minority group the group that has fewer than half of the nodes If the event is for a node on the minority group you can safely ignore and quiet the event This message might ...

Page 69: ... cluster are overwritten and are not preserved or replicated on the SyncIQ source cluster SyncIQ is not designed to support bidirectional synchronization of data This event occurs if files were modified on the target cluster and SyncIQ is overwriting those modified files Administrator action Make sure that your workflow does not require files that are written manually to a SyncIQ target be preserv...

Page 70: ...instructions on how to gather cluster logs see Gathering cluster logs 400040011 SyncIQ is attempting to sync to an incompatible target version Administrator action SyncIQ only supports syncing to the same or newer target version Consider upgrading the target cluster to the same version of OneFS that is on the source cluster 400040012 A SyncIQ configuration error occurred Description An error occur...

Page 71: ..._migr The output should display isi_migr_bandwidth isi_migr_pworker isi_migr_sworker isi_migr_sched If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 400040015 SyncIQ failed to take a snapshot for a policy Description This event occurs under the following conditions SyncIQ fa...

Page 72: ...troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 400040018 The SyncIQ policy failed to upgrade Description This event occurs whenever the upgrade sync run for a policy has failed while in progress or when the source record for the policy is missing Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instruc...

Page 73: ...l troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 400040021 A SyncIQ SnapRevert job resolved conflicts between WORM committed files Description If WORM committed file conflicts are resolved as part of a SnapRevert job OneFS will generate a report that provides details about the file conflicts This event will specify the replication policy associated with t...

Page 74: ...ing for the Cloudpools preparation of a stubbed LIN Administrator action The event will not resolve automatically If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 400040026 Maximum file name length support differs between SyncIQ source and target cluster Description When syn...

Page 75: ...rough the OneFS web administration interface or by running the following command isi event test create Administrator action This message is informational No action is required 400050004 This is a heartbeat event that confirms that the event system is healthy Description In order to confirm that the system is operating correctly test events are automatically sent every day one event from each node ...

Page 76: ...P server is unresponsive Administrator action 1 Attempt the following possible solutions in the order listed If a solution resolves the issue there is no need to perform the subsequent solutions Make sure the cluster can reach the ICAP server through the network connection If the network connection is working as expected determine if the ICAP server is responsive 2 Cancel the existing event If the...

Page 77: ...not connect to the anti virus service because the maximum number of available connections has been exceeded Anti virus servers can support a maximum of 20 connections Administrator action Add additional anti virus servers 400060104 All CEE CAVA servers are offline All anti virus servers on a node are currently reporting as offline Administrator action Verify that the node is connected to the anti ...

Page 78: ...r Administrator action This event is informational and does not require any action 400060108 The anti virus service found an infected file Description A file was found to be infected by the anti virus server Administrator action This event is informational and does not require any action 400060109 The anti virus access zone is missing Description The required AvVendor access zone is missing Anti v...

Page 79: ... that is installed on the Windows server is the wrong version Administrator action Update the CEE or CAVA software on the anti virus server to the correct version 400060112 The required SMB service is unavailable on a node Administrator action Check the SMB service on the node and resolve any issues 400060113 The CAVA Filter Driver is offline Description The CAVA Filter Driver on a node is not res...

Page 80: ...lete the license activation process For more information on how to obtain a signed license file from Software Licensing Central SLC refer to the Licensing section of the OneFS Web Administration Guide or OneFS CLI Administration Guide 400070007 The cluster is using software that is not licensed Description The capacity of the cluster was recently upgraded An updated license file is required Admini...

Page 81: ...ix of encrypting nodes and non encrypting nodes Description You are currently migrating from a non encrypted cluster to an encrypted cluster If data is written to the cluster while this event is active the new data might not be encrypted Administrator action Confirm that the event has cleared after Professional Services finishes the cluster migration 400090003 Secure Remote Support SRS is not conf...

Page 82: ... the OneFS Web Administration Guide or the OneFS CLI Administration Guide Confirm that your SRS gateway server is powered up and connected to the cluster s external network If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 400100001 The job state changed Administrator action ...

Page 83: ...y 400100007 A job engine event occurred The cluster is full and data can no longer be written Administrator action If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 400100008 A job engine event occurred A file write operation is stalled or writing very slowly to the cluster A...

Page 84: ...e failed devices and a FlexProtect operation needs to be run FlexProtect will start automatically in the case of drive failures however in the case of node failures FlexProtect requires user intervention in order to start Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 40011...

Page 85: ...pt to look up the DNS name for the specified host failed Administrator action 1 Make sure that the DNS server includes an entry for the reported host name and that non PowerScale hosts can resolve the host name 2 Check for misspellings and typographical errors in the cluster NFS configuration Where possible use a fully qualified domain name FQDN for every host in the NFS export rules If you are no...

Page 86: ...n how to gather cluster logs see Gathering cluster logs 400140002 NFS could not translate a 64 bit cookie to a 32 bit cookie Description OneFS readdir cannot translate 64 bit cookies to 32 bit for a directory Administrator action Reduce the number of entries in the directory that the readdir is targeting This event might be the result of a directory with a large number of entries If the event pers...

Page 87: ... action Confirm that the upgrade is making progress If you feel that the upgrade process stopped contact Dell EMC PowerScale Technical Support 400150004 A step in the OneFS upgrade process is taking longer than expected Administrator action Confirm that the upgrade is still making progress If you feel that the upgrade process has discontinued contact Dell EMC PowerScale Technical Support 400150005...

Page 88: ...tion By design the upgrade framework design prevents any hooks or commands from starting if there are unresponsive nodes Specifically if the Agent on any node does not reply to status commands from the Supervisor As long as this condition persists all upgrade process stops Administrator action If the event persists gather logs and then contact Technical Support for additional troubleshooting For i...

Page 89: ...lity DU Description There are node s or drive s in a degraded or down state that are preventing reboot Administrator action Identify which nodes or drives are down or degraded by running isi_group_info and either repair or smartfail the down or degraded nodes out of the cluster If temporary DU is permissible run isi upgrade unblock to allow nodes to reboot when ready 400150011 Upgrade Drain Alert ...

Page 90: ...cluster cannot reach an external Common Event Enabler CEE server or the CEE server is unresponsive Administrator action 1 Ping the CEE server If the ping operation fails confirm that network connectivity exists between the cluster and the CEE server If you can establish contact between the cluster and the CEE server attempt to ping the CEE server again to see if the issue has resolved If you canno...

Page 91: ... in Isilon OneFS How to replace or renew the SSL certificate used for the Isilon web administration interface article 000157711 2 Cancel the existing event If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 400170002 A periodic check against the store finds expired certificate...

Page 92: ... enabled is pending allocation of the index Administrator action Wait until the event clears The dedupe process succeeds or fails within a few minutes 400180003 Inline dedupe allocation not supported on node lnn occurrence occurrence Description The node must be an F810 is not permitted to enable inline dedupe Administrator action This is an informational event only and no action is required 40018...

Page 93: ...afely ignored Inline dedupe is enabled and fully effective but the index is fragmented The issue might be resolved by completing the following steps 1 Disable inline dedupe 2 Free up memory by running the isi_flush command 3 Enable inline dedupe 4 If the issue does not resolve restart the node 400190001 Invalid dedupe directory path Description SmartDedupe has been configured with an invalid path ...

Page 94: ...r Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 400210002 The encryption key manager for Cloudpools is unable to start on the indicated node Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gat...

Page 95: ...a KMIP server is about to expire Administrator action Renew or replace the expired certificate 400210008 A certificate for a KMIP server has expired Administrator action Renew or replace the expired certificate 400220000 PDM degraded too many operations Description Multiple Policy Domain Manager PDM operations are pending PDM operations run in the background These are completed by the DomainTag jo...

Page 96: ...ig line 30 Directive Subsystem is not allowed within a Match block Administrator action Correct the configuration changes as per the reported error 400240000 S3 Service failed to start Administrator action Check cluster status If the cluster is healthy but S3 is failing contact Dell EMC technical support 400240001 Identity query failed user 1000 to name status STATUS_ACCESS_DENIED Description S3 f...

Page 97: ...pen a Service Request SR 400240004 S3 key in SBT is invalid SBT may be broken Current Basekey a b c Description S3 key in SBT is invalid SBT may be broken Administrator action Contact Dell EMC technical support to open a Service Request SR 400240005 S3 key in SBT has maxed out SBT may be full for bucket 123456 Description S3 key in SBT has maxed out SBT may be full for bucket Administrator action ...

Page 98: ...ated Administrator action This event is informational no action is required 500010001 The SmartQuotas module has notified a user of a quota violation Description You can disable notifications for this event or modify the SmartQuotas rules For information about configuring SmartQuotas rules see the OneFS Web Administration Guide Administrator action This message is informational No action is requir...

Page 99: ... is corrupt or invalid Administrator action If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 500010005 The SmartQuotas module failed to generate a requested quota report Administrator action Review your SmartQuotas report settings and make sure that the settings are configur...

Page 100: ...alf of the nodes In this case the message persists until the cluster is healthy and you can safely ignore the error Administrator action 1 Determine whether a node is in the minority or majority group by running the following command from the node that is reporting the error sysctl efs gmp has_quorum If the command returns 0 the error occurred on the minority group The message might continue until...

Page 101: ...05 The amount of data stored on the cluster is approaching or has exceeded the snapshot reserve space Description Exceeding the snapshot reserve space does not result in a failure to write snapshots to the cluster The system can write snapshots to any available disk space and snapshots can exceed the snapshot reserve space However problems occur when the available space in the cluster is less than...

Page 102: ...through the OneFS web administration interface 2 If enabled disable the Network Time Protocol NTP You cannot synchronize time through both an Active Directory server and NTP on the same cluster If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 700010003 The Windows time serve...

Page 103: ... OneFS Web Administration Guide If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 700020002 The Windows GID map range is full Authentication might fail until the range is increased Description The group ID GID range for mapping Microsoft Active Directory groups has run out of...

Page 104: ...h the domain If the node successfully connects to the domain the event clears itself Administrator action Confirm that TCP port 389 is open on your network If the event does not clear itself within five minutes or if the event recurs perform the following steps on the node on which the issue occurred 1 Ping the authentication server If the ping operation fails confirm that network connectivity exi...

Page 105: ... the node successfully connects to the domain the event clears itself Administrator action If the event does not clear itself within five minutes or if it recurs perform the following steps on the node on which the issue occurred 1 Ping the authentication server If the ping operation fails confirm that network connectivity exists between the cluster and the authentication server If you can establi...

Page 106: ...nt does not clear itself within five minutes or if it recurs perform the following steps on the node on which the issue occurred 1 Ping the authentication server If the ping operation fails confirm that network connectivity exists between the cluster and the authentication server If you can establish contact between the cluster and the authentication server attempt to ping the authentication serve...

Page 107: ...ected a metadata referential integrity error that requires manual intervention to resolve Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 800010003 An Isilon Data Integrity IDI failure was detected Description The system cannot verify data integrity The system will attempt t...

Page 108: ...ual intervention is required Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 800010006 A node has reported that the number of available file descriptors is approaching the maximum limit Administrator action 1 Identify a list of the process types with the largest number of fi...

Page 109: ...his message is informational No action is required 800010010 A node was unable to verify the backup copy of its journal on its peer node Description There are two possible issues that might result in this event The local copy of a node s journal is not valid There was an error when a node tried to verify the local copy of its journal against the mirror copy of its journal on its peer node Administ...

Page 110: ...need to complete the subsequent steps If the password for the cloud account was changed recently confirm that the change is reflected in the local CloudPools file Confirm that the cloud account is not attempting to log in with an incorrect username or password Confirm that the cloud account s username or password has not been removed from the system If the event persists gather logs and then conta...

Page 111: ...f the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 1100000005 A Cloudpool capacity threshold was exceeded Descriptions By default this event first appears when the amount of data on the cloud provider reaches 70 The event will notify you when the following capacity thresholds a...

Page 112: ...d isi cloud accounts view id If the account is disabled enable the account by running the following command isi cloud accounts modify id enabled yes 1100000008 CloudPools could not verify a provider certificate Description The certificate for the specified cloud provider is not properly installed or is not valid Administrator action Contact the cloud provider to obtain a valid certificate 11000000...

Page 113: ... 900010010 900010011 900010012 900010013 900020001 900020002 900020003 900020004 900020005 900020006 900020007 900020008 900020009 900020010 900020011 900020012 900020013 900020014 900020015 900020016 900020017 900020018 900020019 900020020 900020021 900020022 900020023 900020024 900020025 900020026 900020027 900020028 900020029 900020030 900020031 900020032 5 Hardware events 113 ...

Page 114: ...6 900060017 900060018 900060019 900060020 900060021 900060022 900060023 900060024 900060025 900060026 900060027 900060028 900060029 900060030 900060031 900060032 900060033 900060034 900060035 900060036 900060037 900060038 900060039 900060040 900080001 900080002 900080003 900080004 900080005 900080006 900080007 900080008 900080009 900080010 900080011 900080012 900080013 114 Hardware events ...

Page 115: ...2 900080033 900080034 900080035 900080036 900080037 900100001 900100004 900100018 900100019 900100020 900100021 900100022 900100023 900100024 900100025 900100026 900100027 900100028 900100029 900100030 900100031 900100032 900110001 900110002 900110003 900110004 900110005 900120001 900120002 900120003 900120004 900120005 900130001 900130002 900130003 900130004 900130005 Hardware events 115 ...

Page 116: ...4 900160005 900160006 900160007 900160008 900160009 900160010 900160011 900160012 900160013 900160014 900160015 900160016 900160017 900160018 900160019 900160020 900160021 900160022 900160023 900160024 900160100 900160102 900160101 900170001 900170002 900180001 900180002 900180003 900180004 900180005 900180006 900180007 900180008 900180009 900180010 900180011 900180012 116 Hardware events ...

Page 117: ...0100002 930100003 930100004 930100005 930100006 940100001 940100002 Hardware events overview Hardware events provide information about hardware specific status such as voltage power supply and fan speed issues 900010001 There is an error on the node motherboard such as a faulty clock battery Administrator action Gather logs and then contact Technical Support for additional troubleshooting For inst...

Page 118: ...o gather cluster logs see Gathering cluster logs 900010004 A sensor has detected that the node chassis is open Description This event typically appears when maintenance is being performed on the inside of the node while the node is powered on Or this event might appear if one of the NVRAM battery trays were pulled out Administrator action 1 Make sure that the battery tray is properly inserted 2 If...

Page 119: ...t shutdown the node disconnect the power cables and then press the power button on the node to discharge any remaining stored power in the node It is not critical to complete these steps immediately but this event will continue to appear until the issue has been addressed While this event is active the node will not report correct values for the temperature fans or power supply health on the node ...

Page 120: ...ect or if you need assistance obtaining the correct drive type contact Technical Support 900010010 The node has a 812 3 4 chassis SKU Description This event is generated if the model number of the chassis is not properly updated after replacing a node chassis Administrator action Contact Dell EMC PowerScale Technical Support to update the model number of the node 900010011 The Baseboard Management...

Page 121: ...cted node through SSH or serial cable Shut down the node by running the following command shutdown p now Wait for the node to shut down and then disconnect both power supply cables Press the power button on the node to discharge any remaining stored power Reconnect the power cables and then start the node 3 HD400 only Re seat the front panel connector by checking that the ribbon cable is properly ...

Page 122: ...e for less than a minute or so and the event does not repeat this event might be a false alarm Administrator action Follow the instructions in Event notification Fan speed out of spec article 000083406 to determine if this event is a false alarm If this event is not a false alarm contact Technical Support 900020004 A power supply fan in the node might have failed Description If the fan speed tempo...

Page 123: ...e for less than a minute or so and the event does not repeat this event might be a false alarm Administrator action Follow the instructions in Event notification Fan speed out of spec article 000083406 to determine if this event is a false alarm If this event is not a false alarm contact Technical Support 900020007 A chassis fan in the node might have failed Description If the fan speed temporaril...

Page 124: ... of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900020010 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathe...

Page 125: ...mponent is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900020015 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster ...

Page 126: ...mponent is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900020020 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster ...

Page 127: ...r logs 900020025 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900020026 The internal or ambient temperature around a node has exceeded the allowable threshold for the CPU Description Ambient temperature is only measured by front...

Page 128: ...that the front panel is out of specification the temperature in your data center might need to be adjusted If a node is subjected to high temperatures for an extended period of time the CPU is throttled and the node goes into read only mode to help prevent potential data loss due to component failure If the node temperature reaches critical levels it is possible that the node will shut down entire...

Page 129: ...rted Check for high CPU and disk usage in the node High usage can contribute to high temperatures within the node If the steps above were unsuccessful in clearing this event the subsystem that monitors the health of the hardware such as the temperature and fan speeds might have encountered a problem This event can occur intermittently without harm to the system and you can safely quiet the event u...

Page 130: ...ps in the order listed If the issue resolves after a step there is no need to complete the subsequent steps HD400 only Make sure that the drive drawer is properly shut by sliding it out and re closing it firmly but carefully Review the temperature statistics for the affected sensor which are included in the event If the temperature is consistently elevated the problem is likely a high ambient temp...

Page 131: ...luster logs 900020032 The internal or ambient temperature around a node has exceeded the allowable thresholds for the chassis Description Ambient temperature is only measured by front panel sensors If you receive an event that indicates that the front panel is out of specification the temperature in your data center might need to be adjusted If a node is subjected to high temperatures for an exten...

Page 132: ...se power Locate the electrical outlet to which the problematic power supply is connected and then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet If the issue is not resolved by using a different electrical outlet move the power cable from the power supply that reports the failure to the power supply of a node that does not report a fa...

Page 133: ... and re closing it firmly but carefully Review the temperature statistics for the affected sensor which are included in the event If the temperature is consistently elevated the problem is likely a high ambient temperature in the data center Address any changes in the cluster environment such as air conditioning outages Verify that air flow within the rack and through the front and rear panel vent...

Page 134: ...t panel 5 Move the front panel from a functioning node to the affected node and see if the event clears 6 Install the front panel from the affected node on another node to determine if the problem is with the front panel or with the node If the problem follows the front panel contact Technical Support to request a new front panel If the above steps do not resolve the issue gather logs and then con...

Page 135: ...e for less than a minute or so and the event does not repeat this event might be a false alarm Administrator action Follow the instructions in Event notification Fan speed out of spec article 000083406 to determine if this event is a false alarm If this event is not a false alarm contact Technical Support 900060005 A chassis fan in the node might have failed Description If the fan speed temporaril...

Page 136: ... of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900060008 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathe...

Page 137: ...mponent is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900060013 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster ...

Page 138: ...mponent is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900060018 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster ...

Page 139: ...gh ambient temperature in the data center Address any changes in the cluster environment such as air conditioning outages Verify that air flow within the rack and through the front and rear panel vents of the node is not obstructed in any way Make sure that the faceplate on the affected node is installed properly seated and undamaged In some cases removing and re seating the faceplate will resolve...

Page 140: ...thering cluster logs 900060023 The internal or ambient temperature around a node has exceeded the allowable threshold Description Ambient temperature is only measured by front panel sensors If you receive an event that indicates that the front panel is out of specification the temperature in your data center might need to be adjusted If a node is subjected to high temperatures for an extended peri...

Page 141: ...e that the faceplate on the affected node is installed properly seated and undamaged In some cases removing and re seating the faceplate will resolve this issue Run the isi_hw_status command Review the output to determine whether there is a slow or failed fan that was not otherwise reported Check for high CPU and disk usage in the node High usage can contribute to high temperatures within the node...

Page 142: ...ter might need to be adjusted If a node is subjected to high temperatures for an extended period of time the CPU is throttled and the node goes into read only mode to help prevent potential data loss due to component failure If the node temperature reaches critical levels it is possible that the node will shut down entirely Administrator action Perform the following steps in the order listed If th...

Page 143: ...d and then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet If the issue is not resolved by using a different electrical outlet move the power cable from the power supply that reports the failure to the power supply of a node that does not report a failure If the cable is the issue replace the cable 4 If the issue persists take one powe...

Page 144: ...g cluster logs 900060030 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900060031 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For i...

Page 145: ...ptimal range for less than a minute or so and the event does not repeat this event might be a false alarm Administrator action Follow the instructions in Event notification Fan speed out of spec article 000083406 to determine if this event is a false alarm If this event is not a false alarm contact Technical Support 900060036 A power supply fan in the node might have failed Description If the fan ...

Page 146: ...d and then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet If the issue is not resolved by using a different electrical outlet move the power cable from the power supply that reports the failure to the power supply of a node that does not report a failure If the cable is the issue replace the cable 4 If the issue persists take one powe...

Page 147: ...d and then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet If the issue is not resolved by using a different electrical outlet move the power cable from the power supply that reports the failure to the power supply of a node that does not report a failure If the cable is the issue replace the cable 4 If the issue persists take one powe...

Page 148: ...rted Check for high CPU and disk usage in the node High usage can contribute to high temperatures within the node If the steps above were unsuccessful in clearing this event the subsystem that monitors the health of the hardware such as the temperature and fan speeds might have encountered a problem This event can occur intermittently without harm to the system and you can safely quiet the event u...

Page 149: ...inistrator action 1 Cancel or quiet the event 2 If the event recurs shutdown and restart the node by completing the following steps Connect to the affected node through SSH or serial cable Shut down the node by running the following command shutdown p now Wait for the node to shut down and then disconnect both power supply cables Press the power button on the node to discharge any remaining stored...

Page 150: ...is event is a false alarm If this event is not a false alarm contact Technical Support 900080004 A power supply fan in the node might have failed Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat this event might be a false alarm Administrator action Follow the instructions in Event notification Fan speed out of spec arti...

Page 151: ... this event is a false alarm If this event is not a false alarm contact Technical Support 900080007 A chassis fan in the node might have failed Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat this event might be a false alarm Administrator action Follow the instructions in Event notification Fan speed out of spec articl...

Page 152: ...is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900080011 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see...

Page 153: ...mponent is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900080016 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster ...

Page 154: ...ponent is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900080021 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster l...

Page 155: ...issue replace the cable 4 If the issue persists take one power supply out of a different working node and attach the power supply to the affected node CAUTION Do not switch power supplies in the same node as this will cause the node to lose power If the issue follows the power supply the power supply must be replaced 5 If multiple nodes report power supply issues it is likely that the issue is env...

Page 156: ...plies in the same node as this will cause the node to lose power If the issue follows the power supply the power supply must be replaced 5 If multiple nodes report power supply issues it is likely that the issue is environmental Check each of the following items to confirm the health of the power subsystem Power Distribution Unit PDU functionality and status of any circuit breakers in the power pa...

Page 157: ...If multiple nodes report power supply issues it is likely that the issue is environmental Check each of the following items to confirm the health of the power subsystem Power Distribution Unit PDU functionality and status of any circuit breakers in the power path Power quality such as voltage frequency values and stability Uninterruptible Power Supply UPS health 6 If the issue is not constant and ...

Page 158: ...and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900080026 The internal or ambient temperature around a node has exceeded the allowable threshold for the CPU Description Ambient temperature is only measured by front panel sensors If you receive an event that indicates that the front panel is out of specifica...

Page 159: ...in the data center Address any changes in the cluster environment such as air conditioning outages Verify that air flow within the rack and through the front and rear panel vents of the node is not obstructed in any way Make sure that the faceplate on the affected node is installed properly seated and undamaged In some cases removing and re seating the faceplate will resolve this issue Run the isi...

Page 160: ...tes that the front panel is out of specification the temperature in your data center might need to be adjusted If a node is subjected to high temperatures for an extended period of time the CPU is throttled and the node goes into read only mode to help prevent potential data loss due to component failure If the node temperature reaches critical levels it is possible that the node will shut down en...

Page 161: ... for high CPU and disk usage in the node High usage can contribute to high temperatures within the node If the steps above were unsuccessful in clearing this event the subsystem that monitors the health of the hardware such as the temperature and fan speeds might have encountered a problem This event can occur intermittently without harm to the system and you can safely quiet the event unless the ...

Page 162: ... down entirely Administrator action Perform the following steps in the order listed If the issue resolves after a step there is no need to complete the subsequent steps HD400 only Make sure that the drive drawer is properly shut by sliding it out and re closing it firmly but carefully Review the temperature statistics for the affected sensor which are included in the event If the temperature is co...

Page 163: ...power supply of a node that does not report a failure If the cable is the issue replace the cable 4 If the issue persists take one power supply out of a different working node and attach the power supply to the affected node CAUTION Do not switch power supplies in the same node as this will cause the node to lose power If the issue follows the power supply the power supply must be replaced 5 If mu...

Page 164: ...ont and rear panel vents of the node is not obstructed in any way Make sure that the faceplate on the affected node is installed properly seated and undamaged In some cases removing and re seating the faceplate will resolve this issue Run the isi_hw_status command Review the output to determine whether there is a slow or failed fan that was not otherwise reported Check for high CPU and disk usage ...

Page 165: ...of I O inactivity an Identify Controller command is issued to make sure that the NVRAM card is still healthy If the NVRAM card does not respond the card might have failed and the system will force the node to reboot Administrator action Reboot the node If the event clears and does not recur no other action is required If the event persists gather logs and then contact Technical Support for additio...

Page 166: ...equired If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900100022 The NVRAM card did not respond to an NVRAM command and the node has been set to read only Administrator action Reboot the node If the event clears and does not recur no other action is required If the event p...

Page 167: ... logs 900100026 The NVRAM card did not obtain necessary msi x resources Description The NVRAM card will continue to function in this state but performance might be affected Administrator action Reboot the node If the event clears and does not recur no other action is required If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how...

Page 168: ...gained persistence in the chassis chassis The node will reboot to re arm NVDIMM Description The node lost persistence but has recovered The node requires a reboot to re arm the NVDIMM Administrator action Wait for the node restart this should happen 60 seconds from when the event occurs or manually restart the node if the message continues 900100030 NVDIMM in the DIMM slot has failed in the chassi...

Page 169: ... Currently it is set to 100 seconds This is subject to change as per the monitoring subsystem configuration of this node Administrator action Check services if issues persist contact Technical Support 900110001 The internal or ambient temperature around a node has exceeded the allowable threshold for the CPU Description Ambient temperature is only measured by front panel sensors If you receive an ...

Page 170: ...er cluster logs see Gathering cluster logs 900110002 A sensor in the front panel of a node has exceeded the specified threshold Description This event can occur intermittently without harm to the system Administrator action 1 Cancel or quiet the event 2 If the event recurs shutdown and restart the node by completing the following steps Connect to the affected node through SSH or serial cable Shut ...

Page 171: ... which the problematic power supply is connected and then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet If the issue is not resolved by using a different electrical outlet move the power cable from the power supply that reports the failure to the power supply of a node that does not report a failure If the cable is the issue replace ...

Page 172: ...d and then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet If the issue is not resolved by using a different electrical outlet move the power cable from the power supply that reports the failure to the power supply of a node that does not report a failure If the cable is the issue replace the cable 4 If the issue persists take one powe...

Page 173: ...and undamaged In some cases removing and re seating the faceplate will resolve this issue Run the isi_hw_status command Review the output to determine whether there is a slow or failed fan that was not otherwise reported Check for high CPU and disk usage in the node High usage can contribute to high temperatures within the node If the steps above were unsuccessful in clearing this event the subsys...

Page 174: ...r supplies in a node has failed or lost power Description It is possible that a power cable was unplugged during recent maintenance or the circuit supplying power to the affected power supply has failed Administrator action Perform the following steps in the order listed If the issue resolves after a step there is no need to complete the subsequent steps 1 Confirm that both power cables are proper...

Page 175: ...wer supplies in a node has failed or lost power Description It is possible that a power cable was unplugged during recent maintenance or the circuit supplying power to the affected power supply has failed Administrator action Perform the following steps in the order listed If the issue resolves after a step there is no need to complete the subsequent steps 1 Confirm that both power cables are prop...

Page 176: ... the following steps in the order listed If the issue resolves after a step there is no need to complete the subsequent steps HD400 only Make sure that the drive drawer is properly shut by sliding it out and re closing it firmly but carefully Review the temperature statistics for the affected sensor which are included in the event If the temperature is consistently elevated the problem is likely a...

Page 177: ...he problem follows the front panel contact Technical Support to request a new front panel If the above steps do not resolve the issue gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900130003 The node is reporting less than the expected amount of physical memory Description This event typically ...

Page 178: ...age component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900130009 A voltage component is out of specification Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cl...

Page 179: ...node will shut down entirely Administrator action Perform the following steps in the order listed If the issue resolves after a step there is no need to complete the subsequent steps HD400 only Make sure that the drive drawer is properly shut by sliding it out and re closing it firmly but carefully Review the temperature statistics for the affected sensor which are included in the event If the tem...

Page 180: ...se power Locate the electrical outlet to which the problematic power supply is connected and then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet If the issue is not resolved by using a different electrical outlet move the power cable from the power supply that reports the failure to the power supply of a node that does not report a fa...

Page 181: ...d and then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet If the issue is not resolved by using a different electrical outlet move the power cable from the power supply that reports the failure to the power supply of a node that does not report a failure If the cable is the issue replace the cable 4 If the issue persists take one powe...

Page 182: ...quired If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900140004 The internal or ambient temperature around a node has exceeded the allowable threshold Description Ambient temperature is only measured by front panel sensors If you receive an event that indicates that the fr...

Page 183: ...steps do not resolve the issue gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900140005 There are multiple power supply issues that might cause this event to occur Administrator action 1 If the event message specifies an issue with the power supply temperature verify that the ambient temperatur...

Page 184: ...r additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900160003 A compute node has failed and might need to be replaced Description The event message provides you with the chassis and node slot where the failed node is located The compute node is not a customer replaceable part Administrator action Contact Dell EMC PowerScale Technical Support to de...

Page 185: ...e cluster environment such as air conditioning outages Verify that air flow within the rack and through the front and rear panel vents of the chassis is not obstructed in any way If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900160006 Fan fault detected Description A node...

Page 186: ...ult detected Description An internal M 2 drive has failed The M 2 drive must be replaced The M 2 drive is not a customer replaceable part The event message provides you with the chassis and node slot of the affected node Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900160...

Page 187: ... replaced The event message provides you with the chassis and node slot of the affected node Administrator action Gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 900160013 Non transparent bridge fault detected Description A non transparent bridge fault is an indicator that a compute node has fai...

Page 188: ...w to gather cluster logs see Gathering cluster logs 900160015 Drive interface board fault detected Description A drive interface board fault is an indicator that a compute node has failed The compute node must be replaced The compute node is not a customer replaceable part The event message provides you with the chassis and node slot of the affected node Administrator action Gather logs and then c...

Page 189: ...wer supply is connected then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet If the issue is not resolved by using a different electrical outlet move the power cable from the power supply that reports the failure to the power supply of a node that does not report a failure If the cable is the issue replace the cable CAUTION Do not move...

Page 190: ...ter logs see Gathering cluster logs 900160020 A hardware error was corrected Description This event can be an indicator of issues with a compute node The compute node might need to be replaced The compute node is not a customer replaceable part The event message provides you with the chassis and node slot of the affected node Administrator action Gather logs and then contact Technical Support for ...

Page 191: ...nal is in an unprotected state Description When a node is disconnected from its peer node the node journals are not mirrored and data is at risk The event message provides you with the chassis and node slot where the disconnected node is located Administrator action 1 Confirm that both nodes are cabled correctly and powered up 2 If the issue persists contact Dell EMC PowerScale Technical Support t...

Page 192: ...terface card NIC reset occurred in the specified node Description The NIC reset is an information only event In the rare instance that a compression NIC resets this event provides a tracking marker for Dell EMC PowerScale Technical Support Administrator action No action is required 900160101 A network interface card NIC is not operating correctly in the specified node Administrator action Troubles...

Page 193: ...nistrator action The event may clear up automatically However if the connectivity problem persists check the services and contact Dell EMC technical support 900180002 Failed to communicate with the Internal Dual SD Module IDSDM Description The Internal Dual SD Module IDSDM located in the chassis is not populated has failed or is experiencing connectivity failures Contact Dell EMC Support to diagno...

Page 194: ...arge is low and is below an acceptable threshold and that a vault might fail Description There is a low charge in the NVDIMM battery When the event is identified data loss is prevented by placing the node into read only mode Administrator action If the battery does not service or charge on its own replace the battery 900180007 NVDIMM battery charge has been low for the exceeded time threshold Desc...

Page 195: ...umber and location by running using any of the following methods Run the isi_hwmon b DIMMHealthMonitoring command Run si_hwmon log In the iDRAC UI view the unhealthy DIMM slots 900180009 A physical security sensor has detected that an intrusion error has occurred Description PowerEdge servers contain physical security sensors to detect a chassis that has been left open or tampered with Administrat...

Page 196: ...r sensor_name is unhealthy and requires maintenance Description There is a bad damaged or degraded temperature sensor Administrator action Replace the temperature sensor 900180013 A power supply is unhealthy and may require maintenance Description The power supply unit is not correctly seated The power cable may be disconnected or improperly connected The power supply unit has failed There is an i...

Page 197: ...y components that may require maintenance Sensors sensor_list Description The event is generic for groups of sensor types It is useful for voltage and amperage sensors Only one event of this type is created per node Only the most current list of specific sensors that are affected is listed to reduce the number of events per cluster Administrator action For the affected sensors see the Event and Er...

Page 198: ...f to rearm NVDIMM Description The node lost persistence but has recovered The node requires a reboot to rearm the NVDIMM Administrator action Wait for the node restart 60 s from when the event occurs or manually restart the OneFS node if the message continues 900180030 NVDIMM has failed Node transitions to read only mode until the NVDIMM has been replaced Description NVDIMM experiences similar wea...

Page 199: ...DRACServices command 910100001 A fan in the node might have failed Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat this event might be a false alarm Administrator action Follow the instructions in Event notification Fan speed out of spec article 000083406 to determine if this event is a false alarm If this event is not ...

Page 200: ...d rear panel vents of the node is not obstructed in any way Make sure that the faceplate on the affected node is installed properly seated and undamaged In some cases removing and re seating the faceplate will resolve this issue Run the isi_hw_status command Review the output to determine whether there is a slow or failed fan that was not otherwise reported Check for high CPU and disk usage in the...

Page 201: ...ront panel from a functioning node to the affected node and see if the event clears 6 Install the front panel from the affected node on another node to determine if the problem is with the front panel or with the node If the problem follows the front panel contact Technical Support to request a new front panel If the above steps do not resolve the issue gather logs and then contact Technical Suppo...

Page 202: ...nd fan speeds might have encountered a problem This event can occur intermittently without harm to the system and you can safely quiet the event unless the issue persists If the above steps do not resolve the issue gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 920100001 There are multiple batt...

Page 203: ... power down warning Administrator action When multiple fans fail or a fan module is removed for more than two minutes the node will reboot and the drives will power down within five minutes to prevent the drives from overheating The drives will remain powered down until the failed fan modules are replaced Replace a fan module if the fan has failed or re insert a fan module if it has been pulled fo...

Page 204: ...in the data center Address any changes in the cluster environment such as air conditioning outages Verify that air flow within the rack and through the front and rear panel vents of the node is not obstructed in any way Make sure that the faceplate on the affected node is installed properly seated and undamaged In some cases removing and re seating the faceplate will resolve this issue Run the isi...

Page 205: ...ues that are outside expected specifications Description The event message provides you with the chassis and node slot of the affected node Administrator action Monitor your cluster for other events that might be related to this event If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering clus...

Page 206: ...th the chassis and node slot of the affected node Administrator action Monitor your cluster for other events that might be related to this event If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions on how to gather cluster logs see Gathering cluster logs 930100004 A sensor is reporting electrical values that are outside expected speci...

Page 207: ...ted specifications Description This event will tell you which sensor is reporting the unexpected values The event message provides you with the chassis and node slot of the affected node Administrator action Monitor your cluster for other events that might be related to this event If the event persists gather logs and then contact Technical Support for additional troubleshooting For instructions o...

Page 208: ... unsupported nodes devid s devids msg Description OneFS version is currently running on the specified nodes in this cluster Administrator action Contact Technical Support to obtain the supported software version for this hardware 208 Hardware events ...

Reviews: