
2000 Series 
Troubleshooting Guide

P/N 83-00004287-12

Revision A

May 2008

Summary of Contents for 2000 Series


Page 2: ...nd registered trademarks are proprietary to their respective owners. The material in this document is for information only and is subject to change without notice. While reasonable efforts have been made in the preparation of this document to assure its accuracy, changes in the product design can be made without reservation and without notification to its users. ...

Page 3: ...sure ID Display 13 Drive Modules 14 Disk Drives 14 Controller Modules 15 Drive Expansion Module 15 Power and Cooling Modules 15 Power Supply Unit 16 Cooling Fans 16 Airflow 17 2 Fault Isolation Methodology 19 Gather Fault Information 19 Determine Where the Fault Is Occurring 19 Review the Event Logs 20 Isolate the Fault 20 ...

Page 4: ...roblems Using RAIDar to Access a Storage System 36 Determining Storage System Status and Verifying Faults 37 Stopping I O 38 Clearing Metadata From Leftover Disk Drives 39 Isolating Faulty Disk Drives 40 Identifying a Faulty Disk Drive 40 Reviewing Disk Drive Error Statistics 41 Reviewing the Event Logs 43 Reconstructing a Virtual Disk 43 Isolating Data Path Faults 45 Isolating Internal Data Path ...

Page 5: ... with Scheduling Tasks 60 Selecting Individual Events for Notification 61 Selecting or Clearing All Events for Notification 62 Correcting Enclosure IDs 63 Problems After Power On or Restart 63 5 Troubleshooting Using Event Logs 65 Event Severities 65 Viewing the Event Log in RAIDar 66 Viewing an Event Log Saved From RAIDar 68 Reviewing Event Logs 69 Saving Log Information to a File 70 Configuring ...

Page 6: ... 85 Replacing a Controller Module or Expansion Module 87 Moving a Set of Expansion Modules 89 Updating Firmware 90 Updating Firmware During Controller Replacement 90 Updating Firmware Using RAIDar 91 Identifying SFP Module Faults 92 Removing and Replacing an SFP Module 93 Removing an SFP Module 93 Installing an SFP Module 94 Identifying Cable Faults 95 Identifying Cable Faults on the Host Side 95 ...

Page 7: ...Identifying Virtual Disk Faults 110 Clearing Metadata From a Disk Drive 112 Identifying Power and Cooling Module Faults 112 Removing and Replacing a Power and Cooling Module 114 Removing a Power and Cooling Module 114 Installing a Power and Cooling Module 115 Replacing an Enclosure 116 A Troubleshooting Using the CLI 117 Viewing Command Help 118 clear cache 118 clear expander status 118 ping 119 r...

Page 8: ... parameters 122 show enclosure status 122 show events 123 show expander status 123 show frus 123 show protocols 123 show redundancy mode 124 trust 124 Problems Scheduling Tasks 125 Create the Task 125 Schedule the Task 125 Errors Associated with Scheduling Tasks 126 Missing Parameter Data Error 126 Index 127 ...

Page 9: ... Controller Enclosure 2330 iSCSI Controller Enclosure SAS Expansion Enclosure This book is written for system administrators and service personnel who are familiar with Fibre Channel FC Internet SCSI iSCSI and Serial Attached SCSI SAS configurations network administration and RAID technology Before You Read This Book Before you begin to follow procedures in this book you must have already installe...

Page 10: ...Title Part Number Site planning information R Evolution Storage System Site Planning Guide 83 00004283 Late breaking information not included in the documentation set R Evolution 2730 Release Notes R Evolution 2530 Release Notes R Evolution 2330 Release Notes 83 00004282 83 00004396 83 00005032 Installing and configuring hardware R Evolution 2730 Getting Started Guide R Evolution 2530 Getting Star...

Page 11: ...ystem Topics covered in this chapter include Architecture Overview on page 11 Enclosure Chassis and Midplane on page 12 Drive Modules on page 14 Controller Modules on page 15 Drive Expansion Module on page 15 Power and Cooling Modules on page 15 Architecture Overview The following figure shows how field replaceable units FRUs connect within a storage system enclosure Figure 1 1 R Evolution Storage...

Page 12: ...tail Note Do not remove a FRU until the replacement is on hand Removing a FRU without a replacement will disrupt the system airflow and cause an over temperature condition Enclosure Chassis and Midplane An enclosure s metal chassis is 2U in height The front of the enclosure has two rackmount flanges called ears The left ear has the enclosure ID display The right ear has enclosure status LEDs The c...

Page 13: ...ve enclosures are attached to a host. A drive enclosure's EID can be zero or nonzero. Each drive enclosure in a storage system must have a unique EID. EIDs are persistent, so will not change during simple reconfigurations. EIDs can be used to correlate physical enclosures with logical views of the storage system provided by system interfaces. When installing a system with drive enclosures attached, the e...
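The uniqueness requirement for enclosure IDs can be checked with a few lines of code. The following is a minimal sketch, assuming you have collected the EIDs by hand (for example, by reading each enclosure ID display); the function name and the sample values are hypothetical and are not part of RAIDar or the CLI.

```python
from collections import Counter


def duplicate_eids(eids):
    """Return the sorted list of enclosure IDs that appear more than once."""
    counts = Counter(eids)
    return sorted(eid for eid, n in counts.items() if n > 1)


# Hypothetical EIDs read from four drive enclosures.
observed = [1, 2, 2, 3]
clashes = duplicate_eids(observed)
if clashes:
    print("Duplicate EIDs found:", clashes)
else:
    print("All EIDs are unique")
```

An empty result means every enclosure in the list has a unique EID.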

Page 14: ...er off Once power is applied the RAID controllers use the metadata held on each disk to locate each member of a virtual disk Disk Drives Each RAID controller has single port access from the local SAS expander to internal and drive enclosure drives Alternate path dual port access to all internal drives is accomplished through the expander inter controller wide lane connection Dual port access assum...

Page 15: ...s in a controller module processor or a bus fault occurs that is related to the controller module the entire controller module FRU is replaced Drive Expansion Module Expansion module architecture is a simplified version of controller module architecture Like a controller module an expansion module has an Expander Controller and uses the SAS protocol Each module has a SAS In port and a SAS Out port...

Page 16: ...ther through a calibrated cavity. Should a power and cooling module be turned off or unplugged, the fans inside the module continue to operate at normal capacity. This is accomplished by powering each fan from a power bus on the midplane. The fans' variable speed is controlled by the controller modules through an I2C interface. The fans also provide tachometer speed information through the I2C interface...

Page 17: ...ovide low noise and high mass flow rates. Airflow is from front to back. Each drive slot draws ambient air in at the front of the drive, sending air over the drive surfaces and then through tuned apertures in the chassis midplane. Note that the airflow washes over the top and bottom surface of the disk drive at high mass flow and velocity flow rates, so both sides of the drive are used for cooling. The ...

Page 18: ...18 R Evolution 2000 Series Troubleshooting Guide May 2008 ...

Page 19: ...s the fault related to an internal data path or an external data path Is the fault related to a hardware component such as a drive module controller module or power and cooling module By isolating the fault to one of the components within the storage system you will be able to determine the necessary action more rapidly Determine Where the Fault Is Occurring Once you have an understanding of the r...

Page 20: ...ault but also to search for events that might have caused the fault to occur For example a host could lose connectivity to a virtual disk if a user changes channel settings without taking the storage resources assigned to it into consideration In addition the type of fault can help you isolate the problem to hardware or software For more information about event logs see Troubleshooting Using Event...

Page 21: ...wing topics LED Names and Locations on page 21 Using LEDs to Check System Status on page 23 LED Names and Locations This section identifies the LEDs in each FRU Figure 3 1 Enclosure and Drive Module LEDs Drive status LEDs top to bottom Enclosure ID Enclosure status LEDs top to bottom Unit Locator Fault Service Required FRU OK Temperature Fault OK to Remove Power Activity Fault Drive modules are nu...

Page 22: ...thernet link status Ethernet activity port status 10 100 BASE T STATUS ACTIVITY DIRTY CLEAN CACHE CLI MUI LINK ACT iSCSI Port 0 iSCSI Port 1 LINK ACT Expansion Cache status Unit Locator OK to Remove FRU OK Fault Service Required port status Host link activity Host link status Host activity Ethernet link status Ethernet activity Expansion Cache status Unit Locator OK to Remove FRU OK Fault Service ...

Page 23: ...his functionality can help you determine the cause of a fault in a FRU The following topics describe what to do when an LED indicates a fault condition For descriptions of all LED statuses see the getting started guide for your enclosure model Using Enclosure Status LEDs on page 24 Using Drive Module LEDs on page 24 Using Controller Module Host Port LEDs on page 25 Using the Controller Module Expa...

Page 24: ... green steady or blinking If the Power Activity Fault LED is off the drive is not powered on If the drive should be powered on check that it is fully inserted and latched in place and that the enclosure is powered on If the Power Activity Fault LED is steady yellow either The drive has experienced a fault or has failed The associated virtual disk is critical and no spare is available This LED is l...

Page 25: ... in a host data path component If you cannot locate a specific fault or cannot access the event logs use the procedure for your storage system model to isolate the fault Isolating a Host Side Connection Fault on a Fibre Channel Storage System on page 25 Isolating a Host Side Connection Fault on an iSCSI Storage System on page 29 Isolating a Host Side Connection Fault on a Fibre Channel Storage Sys...

Page 26: ...atus LED on Yes You have isolated the fault to the SFP Replace the SFP No Proceed to the next step 6 Re insert the original SFP and swap the cable with a known good one Is the host link status LED on Yes You have isolated the fault to the cable Replace the cable No Proceed to the next step 7 Replace the HBA with a known good HBA or move the host side cable and SFP to a known good HBA Is the host l...

Page 27: ...D If there is activity halt all applications that access the storage system 3 Reseat the SAS cable Is the host link status LED on Yes Monitor the status to ensure that there is no intermittent error present If the fault occurs again clean the connections to ensure that a dirty connector is not interfering with the data path No Proceed to the next step 4 Move the SAS cable to a port with a known go...

Page 28: ...e HBA. No: It is likely that the controller module needs to be replaced. 6. Move the cable back to its original port. Is the host link status LED on? No: The controller module's port has failed. Replace the controller module. Yes: Monitor the connection for a period of time. It may be an intermittent problem, which can occur with damaged cables and HBAs. ...

Page 29: ...ccurs again clean the connections to ensure that a dirty connector is not interfering with the data path No Proceed to the next step 4 Move the cable to a port with a known good link status This step isolates the problem to the external data path host cable and host side devices or to the controller module port Is the host link status LED on Yes You now know that the host cable and host side devic...

Page 30: ...osure, the expansion port status LED is green. If the connected port's expansion port LED is off, the link is down. In RAIDar, review the event logs for indicators of a specific fault. If you cannot locate a specific fault or cannot access the event logs, use the following procedure to isolate the fault. This procedure requires scheduled downtime. Note: Do no...

Page 31: ... enclosure Is the expansion port status LED on Yes You have isolated the problem to the drive enclosure s port Replace the expansion module No Proceed to Step 7 7 Replace the cable with a known good cable ensuring the cable is attached to the original ports used by the previous cable Is the host link status LED on Yes Replace the original cable The fault has been isolated No It is likely that the ...

Page 32: ...rom write cache to Compact Flash memory. When cache flush is complete, the cache transitions into self-refresh mode. If the LED is blinking slowly, a cache flush is in progress. In self-refresh mode, if primary power is restored before the backup power is depleted (3 to 30 minutes, depending on various factors), the system boots, finds data preserved in cache, and writes it to disk. This means the system can be o...

Page 33: ...the expansion module is connected to a controller module or a host the SAS In port status LED is green If the SAS Out port is connected to another expansion module the SAS Out port status LED is also green The other LEDs are off If a connected port s status LED is off the link is down In RAIDar review the event logs for indicators of a specific fault in a host data path component If the FRU OK LED...


Page 35: ...ftover Disk Drives on page 39 Isolating Faulty Disk Drives on page 40 Isolating Data Path Faults on page 45 Changing PHY Fault Isolation Settings on page 54 Using Recovery Utilities on page 56 Problems Scheduling Tasks on page 59 Selecting Individual Events for Notification on page 61 Selecting or Clearing All Events for Notification on page 62 Correcting Enclosure IDs on page 63 Problems After Po...

Page 36: ...the CLI The other person s changes might not display in RAIDar until you refresh the RAIDar page If you are using Internet Explorer clear the following option Tools Internet Options Accessibility Ignore Colors Specified On Webpages Prevent RAIDar pages from being cached by disabling web page caching in your browser Menu options are not available User configuration affects the RAIDar menu For examp...

Page 37: ...ummary 2 Check the status icon at the upper left corner of each panel A green icon indicates that components associated with that panel are operating normally A red icon with an exclamation point indicates that at least one component associated with that panel has a fault and is operating in a degraded state or is offline Figure 4 1 Status Summary Page with a Fault Identified by Status Icons 3 Rev...

Page 38: ...lts ensure you have a current full backup As an additional data protection precaution stop all I O to the affected virtual disks When on site you can verify that there is no I O activity by briefly monitoring the system LEDs however when accessing the storage system remotely this is not possible To check the I O status of a remote system use the Monitor Statistics Overall Rate Stats page The Overa...

Page 39: ...many errors. RAIDar reports that the leftover drive is part of virtual disk Leftover and shows the drive as follows in enclosure view. Before you can use the drive in a different virtual disk or as a spare, you must clear the metadata. To clear metadata from drives: 1. Select Manage > Utilities > Disk Drive Utilities > Clear Metadata. An enclosure view is displayed in which only Leftover and Available drives are ...

Page 40: ...g Storage System Status and Verifying Faults on page 37 You can also navigate to the Monitor Status Show Notification page and look for any notifications pertaining to a disk drive fault When you have confirmed a drive fault record the drive s enclosure number and slot number To identify the physical location of a faulty drive 1 Select Manage Utilities Disk Drive Utilities Locate Disk Drive 2 Sele...

Page 41: ...ption SMART Event Count The number of SMART Self Monitoring Analysis and Reporting Technology events that the drive recorded These events are often used by the vendor to determine the root cause of a drive failure Some SMART events may indicate imminent electromechanical failure I O Timeout Count The number of times the drive accepted an I O request but did not complete it in the required amount o...

Page 42: ... intermittent errors you might have to monitor the storage system for more than 24 hours 3 To view the error statistics select the suspected drive and click Show Disk Drive Error Statistics 4 Review the Disk Drive Error Statistics panel for drive errors The Disk Drive Error Statistics panel enables you to review errors from each of the two ports Non Media Errors The number of soft recoverable erro...

Page 43: ...matically uses the spares to reconstruct the virtual disk. Virtual disk reconstruction does not require I/O to be quiesced, so the virtual disk can continue to be used while the Reconstruct utility runs. A properly sized spare is one whose capacity is equal to or greater than the smallest drive in the virtual disk. If no properly sized spares are available, reconstruction does not start automatically. T...
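The sizing rule for spares stated above can be expressed as a short check. This is a minimal sketch for illustration only, not the controller firmware's logic; the function name and the capacities (in GB) are hypothetical.

```python
def properly_sized_spares(vdisk_drive_capacities, spare_capacities):
    """Return the spares whose capacity is equal to or greater than
    the smallest drive in the virtual disk."""
    if not vdisk_drive_capacities:
        raise ValueError("virtual disk has no drives")
    smallest = min(vdisk_drive_capacities)
    return [cap for cap in spare_capacities if cap >= smallest]


# Hypothetical capacities in GB.
vdisk = [300, 300, 146]
spares = [73, 146, 300]
usable = properly_sized_spares(vdisk, spares)
print("Properly sized spares (GB):", usable)
```

If the returned list is empty, no properly sized spare exists and, per the text above, reconstruction would not start automatically.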

Page 44: ... starts to run using the spare but its progress remains at 0 until a second properly sized spare is available The second available spare can be an existing global spare another existing spare for the virtual disk or a replacement drive that you designate as a spare or that is automatically taken when dynamic sparing is enabled During reconstruction though the critical virtual disk icon is displaye...

Page 45: ...eview event logs Replace the faulty component Isolating Internal Data Path Faults A Physical Layer Interface PHY is an interface in a device used to connect to other devices The term refers to the physical layer of the Open Systems Interconnect OSI basic reference model The physical layer defines all of the electrical and physical specifications for a device In a SAS architecture each physical poi...

Page 46: ...HY is usually not disabled Instead the errors are accumulated and reported On the other hand bad cables connecting enclosures damaged controller connectors and other physical damage can cause continual errors which the fault isolation firmware can often trace to a single problematic PHY The fault isolation firmware recognizes the large number and rapid rate of these errors and disables this PHY wi...

Page 47: ...r a controller enclosure and increments from 1 for attached drive enclosures World Wide Name Enclosure node World Wide Name Model Enclosure model number Rack Position Assigned rack number and position of the enclosure within the rack or 0 0 if not set Position 1 is the top and 16 is the bottom Firmware Version Version of the EC which performs SES functions The Phy Isolation Details panel shows the...

Page 48: ...ical IDs are 0 to 11 for disk PHYs and 0 to 3 for inter-expander, egress, and ingress PHYs. Details: Pause the cursor over or click the information icon to view a popup with more information. If you click the icon, the information remains shown until the cursor passes over a similar icon. Status: The same status value shown in the panel's Status field. Physical Phy ID: Identifies a PHY's physical location in the ...

Page 49: ...nt: Specifies the number of invalid doublewords that have been received by the PHY, not including those received during Link Reset sequences. Reset Error Count: Specifies the number of times the expander performed a reset. Phy Disabled: Specifies whether the PHY is disabled (True) or enabled (False). Fault Reason: A coded value that explains why the EC isolated the PHY. If the PHY is active, this value is 0x0. F...

Page 50: ...ly connected If they are not tighten the connectors 2 Reset the affected controller or power cycle the enclosure 3 If the problem persists replace the affected FRU or enclosure 4 Periodically examine the Expander Status page to see if the fault isolation firmware disables the same PHY again If it does a Replace the appropriate cable b Reset the affected controller or power cycle the enclosure Phy ...

Page 51: ...cable 3 To target the cause of the link failure view the host port details by clicking on a port in the graphical view and then reviewing the details listed below it The data displayed includes Host Port Status Details Selected controller and port number SFP Detect SFP is present or not present An SFP is used to connect the FC host port through an FC cable to another FC device Receive Signal Signa...

Page 52: ...or more of the following conditions A faulty HBA or NIC in the host A faulty Fibre Channel cable A faulty port in the host interface module A disconnected cable 3 To target the cause of the link failure view the host port details by clicking on a port in the graphical view and then reviewing the details listed below it The data displayed includes iSCSI Port Status Details Selected controller and p...

Page 53: ...the host A faulty SAS cable A faulty port in the host interface module A disconnected cable 3 To target the cause of the link failure view the host port details by clicking on a port in the graphical view and then reviewing the details listed below it The data displayed includes Topology Port connection type Speed Actual link speed in Gbit per second per PHY lane Number of Active Lanes The number ...

Page 54: ...ntroller options 3 Click Reset Host Channel Changing PHY Fault Isolation Settings PHY lanes are the physical signal paths used for communication between the SAS expander in each controller module and the drive modules in a system The Expander Controller in each controller module automatically monitors PHY error fault rates and isolates disables PHYs that experience too many errors The Expander Iso...

Page 55: ...Y its button changes to Enable and its Status value changes to DISABLED When you enable a PHY its button changes to Disable and its status value changes to OK or another status Disabling or Enabling PHY Isolation You can change an expander s PHY Isolation setting to enable or disable fault monitoring and isolation for all PHYs in that expander If Disable is shown the setting is enabled if Enable i...

Page 56: ...lugged. Sometimes not all drives in the virtual disk power up. Check that all enclosures have rebooted after a power failure. If these problems are found and then fixed, the virtual disk recovers and no data is lost. The quarantined virtual disk's drives are write-locked, and the virtual disk is not available to hosts until the virtual disk is removed from quarantine. The system waits indefinitely for th...

Page 57: ...nction can force an offline virtual disk to be critical or fault tolerant or a critical virtual disk to be fault tolerant You might need to do this when A drive was removed or was marked as failed in a virtual disk due to circumstances you have corrected such as accidentally removing the wrong disk In this case one or more drives in a virtual disk can start up more slowly or might have been powere...

Page 58: ...t the virtual disk and click Trust This Vdisk 6 Back up the data from all the volumes residing on this virtual disk and audit it to make sure that it is intact 7 Select Manage Virtual Disk Config Verify Virtual Disk While the verify utility is running any new data written to any of the volumes on the virtual disk is written in a parity consistent way Note If the virtual disk does not come back onl...

Page 59: ...he period of recurrence To debug schedule parameters 1 Will the task run if you only specify a start time Schedule your task with only the start time Remove all other constraints Review the schedule table Look at the Next Time to run column Does it show what you expect If the task does not run check how you created the task 2 Add one more specification For example if you want the task to run every...

Page 60: ...P server is available the system time and date is obtained from the NTP server To manually change the date or time see the reference guide Deleting Tasks Before you can delete a task you must delete any schedules that run the task Errors Associated with Scheduling Tasks The following table describes error messages associated with scheduling tasks Table 4 2 Errors Associated with Scheduling Tasks E...

Page 61: ...ling events than individual event selection. If the critical event category is selected, all critical events cause a notification regardless of the individual critical event selection. You can select individual events to fine-tune notification, either instead of or in addition to selecting event categories. For example, you can select the critical event category to be notified of all critical events and...
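The interaction between category selection and individual event selection described above amounts to a logical OR. The following is a minimal sketch of that rule, assuming a simplified model; the function name, category strings, and event codes are hypothetical and do not come from RAIDar.

```python
def should_notify(severity, event_code, selected_categories, selected_events):
    """An event causes a notification if its category is selected,
    or if the event itself is individually selected."""
    return severity in selected_categories or event_code in selected_events


# Hypothetical configuration: the critical category is selected, plus
# one individually selected warning event (code 1001 is made up).
categories = {"critical"}
individual = {1001}

print(should_notify("critical", 2002, categories, individual))  # category match
print(should_notify("warning", 1001, categories, individual))   # individual match
print(should_notify("warning", 3003, categories, individual))   # neither
```

Note that clearing an individual critical event has no effect in this model while the critical category remains selected, which matches the behavior the text describes.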

Page 62: ...ication method see the reference guide 5 Click the change events button Selecting or Clearing All Events for Notification You can select or clear all individual events for any or all of the notification types Selecting all individual events is useful if you want to select many events but not all set all the events on this page then go to pages for individual events and clear events you don t want ...

Page 63: ...es Disk Drive Utilities Rescan In the Rescan For Devices panel click Rescan Problems After Power On or Restart After powering on the storage system or restarting the MC or SC the processors take about 45 seconds to boot up and the system takes an additional minute or more to become fully functional and able to process commands from RAIDar or the CLI The time to become fully functional depends on m...


Page 65: ... Information to a File on page 70 Configuring the Debug Log on page 71 Event Severities The storage system generates events having three severity levels Informational A problem occurred that the system corrected or a system change has been made These events are purely informational no action required Warning Something related to the system or to a virtual disk has a problem Correct the problem as ...

Page 66: ...g Voltage failure this leads to a shutdown which is also logged. The event log stores the most recent events with a time stamp next to them, with one-second granularity. Note: If you are having a problem with the system or a virtual disk, check the event log before calling technical support. Event messages might enable you to resolve the problem. You can save the event log to a file; see Saving Log Inform...

Page 67: ...arning Events Shows only critical and warning events for both controllers Controller A Events Shows events logged by controller A Controller B Events Shows events logged by controller B Button Description All Controller Events Shows all events This is the default Controller Critical Warning Events Shows only critical and warning events Field Description Severity Level Critical Warning or Info info...

Page 68: ... most recent event is at the bottom of a section In the event log sections the following information appears Event SN Event Serial Number The prefix A or B indicates which controller logged the event This corresponds to the Event Serial Number column in RAIDar Date Time Year month day and time when the event occurred Code Event code that assists service personnel when diagnosing problems This corr...

Page 69: ...the other controller if necessary. 3. Review the events that occurred before and after the primary event. During this review, you are looking for any events that might indicate the cause of the critical/warning event. You are also looking for events that resulted from the critical/warning event, known as secondary events. 4. Review the events following the primary and secondary events. You are looking for ...
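When working from a saved event log, the review of events before and after the primary event can be narrowed with a small script. This is a minimal sketch under stated assumptions: the events have already been parsed into dictionaries ordered oldest first, the keys ("sn", "time", "severity", "message") and the five-minute window are hypothetical choices, and the sample entries are invented for illustration.

```python
from datetime import datetime, timedelta


def events_around(events, primary_sn, window=timedelta(minutes=5)):
    """Split the events near the primary event into (before, after) lists.

    `events` must be ordered oldest first. List position, not the time
    stamp alone, decides before/after, because time stamps have only
    one-second granularity and may be equal.
    """
    idx = next(i for i, e in enumerate(events) if e["sn"] == primary_sn)
    t0 = events[idx]["time"]
    before = [e for e in events[:idx] if t0 - e["time"] <= window]
    after = [e for e in events[idx + 1:] if e["time"] - t0 <= window]
    return before, after


# Invented sample log; serial-number prefix A marks controller A.
log = [
    {"sn": "A10", "time": datetime(2008, 5, 1, 9, 0, 0), "severity": "Info", "message": "sample"},
    {"sn": "A11", "time": datetime(2008, 5, 1, 9, 58, 30), "severity": "Warning", "message": "sample"},
    {"sn": "A12", "time": datetime(2008, 5, 1, 10, 0, 0), "severity": "Critical", "message": "sample"},
    {"sn": "A13", "time": datetime(2008, 5, 1, 10, 0, 0), "severity": "Info", "message": "sample"},
    {"sn": "A14", "time": datetime(2008, 5, 1, 10, 30, 0), "severity": "Info", "message": "sample"},
]
before, after = events_around(log, "A12")
print("Possible causes:", [e["sn"] for e in before])
print("Possible secondary events:", [e["sn"] for e in after])
```

Widening or narrowing `window` changes how far from the primary event the search extends.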

Page 70: ... save logs operation at a time or to perform a firmware update operation while performing a save logs operation Doing so will display a buffer busy error To save log information to a file 1 Select Manage Utilities Debug Utilities Save Logs To File 2 Type contact information and comments to include in the log information file Contact information provides the support representatives who are reviewin...

Page 71: ...ebug log with information that engineering can use to diagnose the system Note The debug log only collects data after you configure it It will not contain information about any problems that occurred before you configure it To configure the debug log 1 Select Manage Utilities Debug Utilities Debug Log Setup The Debug Log Setup page is displayed 2 Select the debug log setup you want Standard Used f...

Page 72: ...in the log. This is the default. If no events are selected, this option is not displayed. 3. Click Change Debug Logging Setup. 4. If instructed by service personnel, click Advanced Debug Logging Setup Options and select one or more additional types of events. Under normal conditions, none of these options should be selected because they have a slight impact on read/write performance. ...

Page 73: ...ter include Resolving Voltage and Temperature Warnings on page 73 Sensor Locations on page 74 Resolving Voltage and Temperature Warnings To resolve voltage and temperature warnings 1 Check that all of the fans are working by making sure each power and cooling module s DC Voltage Fan Fault Service Required LED is off or by using the RAIDar Status Summary page see Determining Storage System Status a...

Page 74: ... redundant power and cooling modules with load sharing capabilities The power supply sensors described in the following table monitor the voltage temperature and fans in each power and cooling module If the power supply sensors report a voltage that is under or over the threshold check the input voltage Cooling Fan Sensors Each power and cooling module includes two fans The normal range for fan sp...

Page 75: ...odule has one temperature sensor When a temperature fault is reported it must be remedied as quickly as possible to avoid system damage This can be done by warming or cooling the installation location Table 6 2 Cooling Fan Sensor Descriptions Description Location Event Fault ID LED Condition Fan 0 Power and cooling module 0 4000 RPM Fan 1 Power and cooling module 0 4000 RPM Fan 2 Power and cooling...

Page 76: ...ore information see RAIDar help or the reference guide Onboard Temperature 3 Capacitor Temperature 0 70 C None None None CM Temperature 5 50 C 5 C 50 C 0 C 55 C None Table 6 4 Power and Cooling Module Temperature Sensors Description Normal Operating Range Power Supply 1 Temperature power and cooling module 0 0 80 C Power Supply 2 Temperature power and cooling module 0 0 80 C Table 6 3 Controller M...

Page 77: ...re that an enclosure's power supply voltage is within normal ranges. There are three voltage sensors per power and cooling module. Table 6-5 Voltage Sensor Descriptions (Sensor: Event/Fault ID LED Condition). Power Supply 1 Voltage, 12V: 11.00V, 13.00V. Power Supply 1 Voltage, 5V: 4.00V, 6.00V. Power Supply 1 Voltage, 3.3V: 3.00V, 3.80V. ...
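The voltage values in Table 6-5 can be used to classify a reading as under, over, or within range. This is a minimal sketch, assuming that the two values listed for each sensor are the lower and upper fault limits and that readings exactly at a limit count as normal; neither assumption is confirmed by the table as excerpted here. The function name and sample readings are hypothetical.

```python
# Limits taken from Table 6-5; interpreting each pair as (low, high)
# is an assumption.
VOLTAGE_LIMITS = {
    "12V": (11.00, 13.00),
    "5V": (4.00, 6.00),
    "3.3V": (3.00, 3.80),
}


def check_voltage(rail, reading):
    """Classify a reading for the given rail as 'under', 'over', or 'ok'."""
    low, high = VOLTAGE_LIMITS[rail]
    if reading < low:
        return "under"
    if reading > high:
        return "over"
    return "ok"


# Hypothetical readings in volts.
for rail, reading in [("12V", 12.1), ("5V", 3.9), ("3.3V", 3.9)]:
    print(rail, reading, check_voltage(rail, reading))
```

A result of "under" or "over" corresponds to the case in which the guide advises checking the input voltage.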


Page 79: ...on Module Faults on page 80 Removing and Replacing a Controller or Expansion Module on page 82 Updating Firmware on page 90 Identifying SFP Module Faults on page 92 Removing and Replacing an SFP Module on page 93 Identifying Cable Faults on page 95 Identifying Drive Module Faults on page 96 Removing and Replacing a Drive Module on page 104 Identifying Virtual Disk Faults on page 110 Identifying Po...

Page 80: ...ng Handle a FRU only by its edges and avoid touching the circuitry Do not slide a FRU over any surface Limit body movement which builds up static electricity during FRU installation Identifying Controller or Expansion Module Faults The controller and expansion modules contain subcomponents that require the replacement of the entire FRU should they fail Each controller and expansion module contains...

Page 81: ...dules controller module B will not boot An SDRAM memory error is reported Replace the controller module where the error occurred Controller Failure Event codes 84 and 74 The controller might need to have its firmware upgraded or be replaced Check the specific error code to determine the corrective action to take Controller voltage fault Check the power and cooling module and the input voltage Cont...

Page 82: ... a problem with the module The internal clock battery fails Caution In a dual controller configuration both controllers must have the same cache size If the new controller has a different cache size controller A will boot and controller B will not boot To view the cache size select Monitor Advanced Settings Controller Versions Saving Configuration Settings Before replacing a controller module save...

Page 83: ...en or save the file click Save 5 If prompted to specify the file location and name do so using a config extension The default file name is saved_config config Note If you are using Firefox and have a download directory set the file is automatically saved there In a dual controller configuration the storage system s partner Firmware Upgrade option is enabled by default so when you upgrade a control...

Page 84: ...orage system and host applications do not have access to its volumes. If you want the system to remain available, before shutting down one controller verify that the other controller is active.
To shut down a controller module:
1. Select Manage > Restart System > Shut Down/Restart.
2. In the Shut Down panel, select a controller option.
3. Click Shut Down. A warning might appear that data access redundancy will b...

Page 85: ... 10 seconds remove the controller from the slot and repeat the process Note Although the illustrations provided in the following steps show a controller module the instructions also apply to an expansion module To remove a controller module or expansion module 1 Follow all static electricity precautions as described in Static Electricity Precautions on page 80 2 If removing the controller module u...

Page 86: ...it Locator LED front is blinking and within it the module whose Fault Service Required LED is yellow and Unit Locator LED back is white 6 Disconnect any cables connected to the controller If both SAS cables to an expansion module have to be disconnected shut down both controllers Note In a single controller configuration you must shut down the controller to prevent the virtual disks from going off...

Page 87: ...oller module or expansion module into an enclosure that is powered on Caution When replacing a controller module ensure that less than 10 seconds elapse between inserting the module into a slot and fully latching it in place Failing to do so might cause the controller to fail If it is not latched within 10 seconds remove the module from the slot and repeat the process ...

Page 88: ...4. Press the latches upward to engage the controller; turn the thumbscrews finger-tight.
5. Reconnect the cables.
Note: In a dual-controller configuration, if the firmware versions differ between the two controllers, Partner Firmware Upgrade brings the older firmware to the later firmware level.
The FRU OK LED illuminates green when the module completes initializing and is online. If the enclosure's Unit ...

Page 89: ...ule is reinserted into the enclosure the controller s date and time are automatically updated to match the date and time of the partner controller In a single controller configuration you must set the clock manually To set the date and time in RAIDar select Manage General Config Set Date Time Persistent IP Address The IP address for each controller is stored in a SEEPROM on the midplane The IP add...

Page 90: ...ntroller replacement or by using RAIDar Caution Do not power off the storage system during a firmware upgrade Doing so might cause irreparable damage to the controllers Updating Firmware During Controller Replacement When a replacement controller is sent from the factory it might have a more recent version of firmware installed than the surviving controller in your system By default when you inser...

Page 91: ...our firmware using RAIDar, perform the following steps:
1. Ensure that the software package file is saved to a location on your network that the storage system can access.
2. Select Manage > Update Software > Controller Software. The Load Software panel is displayed, which describes the update process and lists your current software versions.
3. Click Browse and select the software package file.
4. Click Load So...

Page 92: ...to RAIDar Identifying SFP Module Faults The FC Controller enclosure uses small form factor pluggable SFP transceivers to attach the enclosure to Fibre Channel data hosts Note Remove any SFP that is not connected to another device As the storage system monitors itself it will generate several events for each unconnected SFP as if there were an error Identifying SFP faults is difficult because they ...

Page 93: ...p on fiber optic cables Do not bend the fiber optic cables tighter than a 2 inch radius Caution To prevent possible loss of access to data be sure to remove the correct cable and SFP Removing an SFP Module To remove an SFP module perform the following steps Note If removing more than one cable make sure to label them before removing 1 Disconnect the fiber optic interface cable by pushing up on the...

Page 94: ...d gently pull on it to remove the SFP from the controller.
Installing an SFP Module
To install an SFP module, perform the following steps:
1. If the SFP has a plug, remove it and slide the SFP into the port until it locks into place.
2. Flip the actuator down and connect the fiber-optic interface cable into the duplex jack at the end of the SFP. ...

Page 95: ...s LED and perform the troubleshooting procedure described in Using Controller Module Host Port LEDs on page 25 Identifying Cable Faults on the Drive Enclosure Side To identify a cable fault on the drive enclosure side perform the troubleshooting procedure described in Using Expansion Module LEDs on page 33 Disconnecting and Reconnecting SAS Cables The storage system supports disconnecting and reco...

Page 96: ...error is due to a faulty disk drive or faulty disk drive channel Identify what action the controller has taken to protect the virtual disk after the drive fault occurred that is rebuilding to a hot spare Know how to identify disk drives in the enclosure Understand the proper procedure for replacing a faulty drive module Understanding Disk Related Errors The event log includes errors reported by th...

Page 97: ...s the descriptions for the standard SCSI sense codes (ASC) and sense code qualifiers (ASCQ), all in hexadecimal. See the SCSI Primary Commands - 2 (SPC-2) Specification for a complete list of ASC and ASCQ descriptions.
Table 7-2 Standard SCSI Sense Key Descriptions
Sense Key  Description
0h         No sense
1h         Recovered error
2h         Not ready
3h         Medium error
4h         Hardware error
5h         Illegal request
6h         Unit attention
7h         Data ...
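
When reading sense keys out of a saved event log, a small lookup can save repeated trips to Table 7-2. The sketch below is only an illustration: the names SENSE_KEYS and describe_sense_key are hypothetical, and it covers only the entries 0h through 6h that are fully legible in the table above.

```python
# Sense keys 0h through 6h as listed in Table 7-2.
# Keys that are not fully legible in the table are deliberately left out.
SENSE_KEYS = {
    0x0: "No sense",
    0x1: "Recovered error",
    0x2: "Not ready",
    0x3: "Medium error",
    0x4: "Hardware error",
    0x5: "Illegal request",
    0x6: "Unit attention",
}


def describe_sense_key(key: int) -> str:
    """Return the table description for a sense key, or a marker if it is not listed."""
    return SENSE_KEYS.get(key, f"Unlisted sense key {key:X}h")
```

For example, describe_sense_key(0x3) returns "Medium error", and a key that is absent from the dictionary is reported as unlisted rather than guessed.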

Page 98: ... faulty cables, problems with particular drive slots, or even problems with the drive's dongle (a small printed circuit board attached to the drive carrier of each drive). Each of these events may result in a warning or critical notification in RAIDar and the event log.
Table 7-3 Common ASC and ASCQ Descriptions
ASC  ASCQ  Descriptions
0C   02    Write error, auto reallocation failed
0C   03    Write error, recommend...

Page 99: ...ev Busy: Target reported busy status.
Dn Ov Run: Data overrun or underrun has been detected.
IOTimeout: Array aborted an I/O request to this target because it timed out.
Link Down: Link down while communication in progress.
LIP: I/O request was aborted because of a channel reset.
No Respon: No response from target.
Port Fail: Disk channel hardware failure. This may be the result of bad cabling.
PrtcolError: Array...

Page 100: ...ry. If no, continue to Step 5.
5. The fault may be caused by a bad disk drive slot on the midplane. Confirm your findings by powering off the storage system, moving an operating disk drive into the suspected slot, and re-applying power.
Note: Step 5 requires that you schedule down time for the system.
6. Does this drive fail when placed in the suspected slot?
Yes: replace the enclosure. You have located the fau...

Page 101: ... a status icon the name RAID level size number of disk drives and number of volumes and utility status if any For each virtual disk where a utility is running a Utility Running For Virtual Disk panel specifies its status Note To stop the Initialize or Verify utility go to the Abort A Vdisk Utility page To stop background scrub of virtual disks go to the General Config System Configuration page You...

Page 102: ...ss 3 Click Select Type And Continue Disk drives of the type you selected are listed and the following information is displayed for each disk drive Device WWN The disk drive s node WWN Address Port 0 The channel and SCSI ID of the drive as accessed through controller A Address Port 1 The channel and SCSI ID of the drive as accessed through controller B Size The size of the disk drive in Gbyte Manuf...

Page 103: ...rmware update completes successfully This operation can take many minutes or hours to complete During the update the following operations are blocked so that they do not interfere with the update Updating controller software buffer interference Saving logs to a file buffer interference Displaying disk drive read cache status SCSI interference When all selected drives have been updated a message in...

Page 104: ...odule offline before you remove it and then after you have replaced it to bring the new drive module online See the documentation that accompanies your disk management software or volume management software for more information Replacing a Drive Module When the Virtual Disk Is Rebuilding When a drive module fails or is removed the system rebuilds the virtual disk by restoring any data that was on ...

Page 105: ...cing a drive module perform the following steps to ensure that you have identified the correct drive module for removal Caution Failure to identify the correct drive module might result in data loss from removing the wrong drive 1 When a disk drive fault occurs the failed disk drive s lower LED is solid yellow indicating that it must be replaced locate the yellow LED at the front of the drive modu...

Page 106: ...e; that is, it is not harmful to the storage system to keep a faulty drive inserted until you have a replacement drive. If you do have an air management module, it is installed using the same procedure for removing a drive module as described below.
Caution: If you remove a drive module and do not replace it within two minutes, you alter the airflow inside the enclosure, which could cause overheating of t...

Page 107: ...locking mechanism 3 Orient the drive module with the LEDs to the left Slide the drive module into the drive slot as far as it will go 4 Rotate the drive ejector handle toward the left until the release clicks closed to firmly seat the drive module in the enclosure s internal connector If the controller enclosure is powered on the green Power Activity Fault LED illuminates indicating that the disk ...

Page 108: ... vdisk spare and start the rebuild Select Manage Virtual Disk Config Global Spare Menu Note Reconstructing a RAID 6 virtual disk to a fault tolerant state requires two properly sized spares to be available Critical The vdisk is online however some drives are down and the vdisk is not fault tolerant This is a degraded state and only applies to RAID 6 Use RAIDar to assign either a global spare or a ...

Page 109: ...ive 1 Power up the enclosures and associated data host in the following order a Drive enclosures first b Controller enclosure next Quarantined The vdisk is offline and has been quarantined because some drives are missing Wait for the missing drive to come online If it doesn t create another vdisk and perform a restore from the latest backed up copy Select Manage Virtual Disk Config Create A Vdisk ...

Page 110: ...can be ordered If you must remove a drive module and cannot immediately replace it you must leave the faulty drive module in place or insert an air management module to maintain the optimum airflow inside the chassis The blank is installed using the same procedure as Installing a Drive Module on page 107 Identifying Virtual Disk Faults Obvious virtual disk problems involve the failure of a member ...

Page 111: ...re all the same size within the virtual disk The virtual disk is limited by the smallest sized disk Volumes in the virtual disk are not visible to the host Verify that the volumes are mapped to the host using RAIDar Manage Volume Management Volume Mapping Map Hosts to Volume Virtual Disk Degraded Event codes 58 and 1 or event codes 8 and 1 Replace the failed disk drive and add the replaced drive a...
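
The statement that a virtual disk is limited by its smallest disk can be made concrete with a little arithmetic. The following sketch is independent of RAID level and uses hypothetical function names; it only shows how much of each member disk can be used and how much capacity is left idle when sizes are mixed.

```python
def per_disk_contribution_gb(disk_sizes_gb):
    """Each member disk contributes only as much as the smallest member."""
    return min(disk_sizes_gb)


def unused_capacity_gb(disk_sizes_gb):
    """Total capacity left idle on disks larger than the smallest member."""
    smallest = min(disk_sizes_gb)
    return sum(size - smallest for size in disk_sizes_gb)
```

With members of 500, 500, and 750 Gbyte, each disk contributes 500 Gbyte and 250 Gbyte on the largest disk goes unused, which is why same-size disks are recommended within a virtual disk.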

Page 112: ...primary components: fans and a power supply. When either of these components fails, RAIDar provides notification, the faults are recorded in the event log, and the power and cooling module's status LED changes from green to yellow. Alternatively, you can use the CLI to poll for events; see the CLI reference guide.
Note: When a power supply fails, the fans of the module continue to operate because they draw po...

Page 113: ...replace a module, leave the old module in place until you have the replacement, or use a blank cover to close the slot. Leaving a slot open negatively affects the airflow and might cause the unit to overheat. Make sure that the controller modules are properly seated in their slots and that their latches are locked. Power and cooling module status is listed as failed, or you receive a voltage event notif...

Page 114: ... the old module. The enclosure might overheat if you take more than two minutes to replace the power and cooling module.
Removing a Power and Cooling Module
To remove a power and cooling module from an enclosure, perform the following steps:
1. Follow all static electricity precautions as described in Static Electricity Precautions on page 80.
2. Turn the power switch off and disconnect the power cable.
3...

Page 115: ... Module
To install a power and cooling module, perform the following steps:
1. Slide the module into the slot as far as it will go.
2. Press the latch upward to engage the module; turn the thumbscrews finger-tight.
3. Reconnect the power cable and turn the power switch on. ...

Page 116: ...osure To make a fully functional enclosure you must insert the following parts from the replaced enclosure Drive modules and air management modules Two power and cooling modules One or two controller modules for a controller enclosure One or two expansion modules for a drive enclosure To install the individual modules use the replacement instructions provided in this guide To configure the enclosu...

Page 117: ...rescan on page 119 reset host channel link on page 119 restart on page 119 restore defaults on page 120 set debug log parameters on page 120 set expander fault isolation on page 121 set expander phy on page 121 set led on page 121 set protocols on page 121 show debug log on page 122 show debug log parameters on page 122 show enclosure status on page 122 show events on page 123 show expander status...

Page 118: ...orphaned data for volumes that no longer exist This command can be used with a dual controller configuration only For details about using clear cache see the CLI reference guide clear expander status Note This command should only be used by service technicians or with the advice of a service technician Clears the counters and status for SAS Expander Controller lanes Counters and status can be rese...

Page 119: ...controllers on specified channels This command is for use with an FC system using FC AL loop topology For details about using reset host channel link see the CLI reference guide restart Restarts the RAID controller or the Management Controller in either or both controller modules If you restart a RAID controller it attempts to shut down with a proper failover sequence which includes stopping all I...

Page 120: ...art the RAID controllers and Management Controllers for the changes to take effect After restarting the controllers hosts might not be able to access volumes until you re map them Caution This command changes how the system operates and might require some reconfiguration to restore host access to volumes For details about using restore defaults see the CLI reference guide set debug log parameters ...

Page 121: ...cate devices For a drive module the Power Activity Fault LED will blink yellow For an enclosure the Unit Locator LED on the chassis ear and on each controller module will blink white For details about using set led see the CLI reference guide set protocols Enables or disables one or more of the following service and security protocols http for standard access to RAIDar https for secure access to R...

Page 122: ...e show debug log parameters Note This command should only be used by service technicians or with the advice of a service technician Shows which debug message types are enabled on or disabled off for inclusion in the Storage Controller debug log For details about using show debug log parameters see the CLI reference guide show enclosure status Shows the status of system enclosures and their compone...

Page 123: ...hnicians or with the advice of a service technician Shows diagnostic information relating to SAS Expander Controller physical channels known as PHY lanes For each enclosure this command shows status information for PHYs in I O module A and then I O module B For details about using show expander status see the CLI reference guide show frus Shows information for all field replaceable units FRUs in t...

Page 124: ...up more slowly or were powered on after the rest of the disks in the virtual disk This causes the date and time stamps to differ which the system interprets as a problem with the late disks In this case the virtual disk functions normally after being trusted A virtual disk is offline because a drive is failing you have no data backup and you want to try to recover the data from the virtual disk In...

Page 125: ...d lose data There is no unmount command in the CLI The host system must perform this task Schedule the Task If your task does not run at the times you specified check the schedule specifications It is possible to create conflicting specifications Start time is the first time the task will run If you use the Between option the starting date time must be in the Between range The year must be four di...
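
One conflicting specification mentioned above is a start time that falls outside the Between range. The sketch below shows how such a conflict could be checked before a schedule is created. It assumes the Between range is given as two times of day; that assumption, and the function name, are made only for illustration, so consult the CLI reference guide for the actual schedule syntax.

```python
from datetime import datetime, time


def start_in_between_range(start: datetime, range_begin: time, range_end: time) -> bool:
    """Return True when the start time of day falls inside the Between range.

    Assumes the range does not wrap past midnight.
    """
    return range_begin <= start.time() <= range_end
```

A start of 1:30 on a given day passes against a range of 1:00 to 2:00, while a start of 3:00 on the same day does not and would be a likely reason for a task not running when expected.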

Page 126: ...a virtual disk named A or a without specifying the assigned to parameter To use a name that the CLI could interpret as an optional parameter you must specify that parameter before the name parameter Table 7 8 Errors Associated with Scheduling Tasks Error Message Solution Task Already Exists Select a different name for the task Unknown Task Type The task type is misspelled Valid task types are Take...

Page 127: ...ing 124 cooling element fan sensor descriptions 75 critical events 65 selecting to monitor 61 critical state virtual disk preventing 56 D data paths isolating faults 45 debug log 71 setting up 71 viewing 122 debug log parameters setting 120 viewing 122 debug utilities debug log setup 71 default configuration settings restoring 120 dequarantining virtual disks 57 diagnostic manage level only functi...

Page 128: ... individual events to monitor 61 events configuring notification 61 types 65 events showing 123 expander fault isolation enabling or disabling 121 expander PHYs enabling or disabling 121 expander status and error counters clearing 118 expander status showing 123 expansion module architecture 15 enclosure ID does not update 89 identifying faults 80 installing 87 moving 89 removing 85 replacing 82 F...

Page 129: ... detail panel 46 fault isolation 45 fencing 46 internal data path faults 46 rescan disks 46 physical layer interface See PHY 45 pinging a remote host 119 power and cooling module architecture 16 identifying faults 112 installing 115 removing 114 replacing 114 115 power and cooling modules voltage sensor descriptions 77 power on problems after 63 protocols service and security enabling or disabling...

Page 130: ...5 spin up retries displaying 41 static electricity precautions 80 status determining overall system health 37 disk 41 status summary 37 Storage Controller restarting 119 system architecture overview 11 T task scheduling 59 temperature warnings resolving 73 trust virtual disk caution 57 trusting an offline virtual disk 124 V virtual disk reconstructing 43 trusting an offline 124 virtual disks dequa...
