
CHALLENGE® RAID Owner’s Guide

Document Number 007-2532-004

Summary of Contents for CHALLENGE RAID

Page 1: ...CHALLENGE RAID Owner's Guide, Document Number 007-2532-004 ...

Page 2: ...in whole or in part without the prior written permission of Silicon Graphics, Inc. RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and/or in similar or successor clauses in the FAR or ...

Page 3: ...rmance Disk Striping 12 Enhanced Performance Storage System Caching 12 Data Reconstruction and Rebuilding After Disk Module Failure 13 RAID Levels 14 RAID 0 Group Nonredundant Array 14 RAID 1 Mirrored Pair 15 RAID 1_0 Group Mirrored RAID 0 Group 16 RAID 5 Individual Access Array 18 RAID Hot Spare 20 Using the CHALLENGE RAID Command Line Interface 21 2 Storage System Configurations 23 Basic Configu...

Page 4: ...9 Binding Disks Into RAID Units 49 Getting Disk Group LUN Information 53 Changing LUN Parameters 56 Dual Interfaces Load Balancing and Device Names 57 5 Maintaining Disk Modules 59 Identifying and Verifying a Failed Disk Module 59 Setting Up the Workplace for Replacing or Installing Disk Modules 62 Replacing a Disk Module 63 Ordering Replacement Disk Modules 63 Unbinding the Disk 64 Removing a Fai...

Page 5: ...7 Caching 85 Setting Cache Parameters 86 Viewing Cache Statistics 87 Upgrading CHALLENGE RAID to Support Caching 90 Changing Unit Caching Parameters 91 A Technical Specifications 93 B The raid5 Command Line Interface 97 bind 99 chglun 102 clearlog 104 clearstats 104 firmware 104 getagent 105 getcache 106 getcontrol 109 getcrus 109 getdisk 110 getlog 114 getlun 116 setcache 120 unbind 121 Index 123...


Page 7: ...0 RAID 1 Mirrored Pair Hardware Mirrored Pair 15 Figure 1 11 Distribution of User Data in a RAID 1_0 Group 17 Figure 1 12 Distribution of User and Parity Data in a RAID 5 Group 19 Figure 1 13 Hot Spare Example 21 Figure 2 1 Dual Interface Dual Processor Configuration Example 27 Figure 2 2 Split Bus Configuration Example 30 Figure 2 3 Dual Bus Dual Initiator Configuration Example 33 Figure 3 1 CHAL...

Page 8: ...e Disk Module Rail 69 Figure 5 8 Engaging the Disk Module Guide 70 Figure 5 9 Inserting the Replacement Disk Module 70 Figure 5 10 Marking the Label for Disk Module A0 74 Figure 5 11 Disk Drive Locations 75 Figure 5 12 Engaging the Disk Module Rail 75 Figure 5 13 Engaging the Disk Module Guide 76 Figure 5 14 Inserting a Disk Module 76 Figure 6 1 Unlocking the Fan Module 83 Figure 6 2 Opening the F...

Page 9: ...put of raid5 getcrus 43 Table 4 1 Output of raid5 getlun 54 Table 5 1 Ordering Replacement Disk Modules 63 Table 5 2 Ordering Add On Disk Module Sets 72 Table 6 1 Field Replaceable Units 79 Table 7 1 Output of raid5 getcache 88 Table A 1 CHALLENGE RAID Deskside Chassis Specifications 93 Table A 2 CHALLENGE RAID Rack Specifications 94 Table B 1 raid5 Parameters 97 Table B 2 Output of raid5 getagent...


Page 11: ... in a CHALLENGE RAID rack can be connected to one or more SCSI buses on CHALLENGE servers, separately or in combination. RAID levels 0, 1, 1_0 (0+1), and 5 are supported, as well as disks configured as hot spares. In addition, a basic CHALLENGE RAID storage system provides storage system caching. Structure of This Guide: This guide contains the following chapters: Chapter 1, "Features of the CHALLENGE RAID Stora...

Page 12: ...mmarizes technical information for the CHALLENGE RAID deskside storage system. Appendix B, "The raid5 Command Line Interface," lists and explains all parameters of the raid5 command. An index completes this guide. Conventions: In command syntax descriptions and examples, square brackets ([ ]) surrounding an argument indicate an optional argument. Variable parameters are in italics. Replace these variables with the...

Page 13: ...harmful interference in which case the user will be required to correct the interference at personal expense VDE 0871 6 78 This equipment has been tested to and is in compliance with the Level A limits per VDE 0871 European Union Statement This device complies with the European Directives listed on the Declaration of Conformity which is included with each product The CE mark insignia displayed on ...

Page 14: ...does not emit radio noise exceeding the limits applicable to Class A digital apparatus prescribed in the Radio Interference Regulations issued by the Canadian Department of Communications. Japanese Compliance Statement ...

Page 15: ...niversity of California, Berkeley, in their 1987 paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)," University of California, Berkeley, Report No. UCB/CSD 87/391. That paper defines various levels of RAID. This chapter introduces the CHALLENGE RAID disk array storage system. It explains CHALLENGE RAID storage system components, data availability and performance, RAID levels, the RAID hot spare, using...

Page 16: ...em Deskside Version Front View Note In Figure 1 1 the front cover is removed for clarity Figure 1 2 is an external view of the CHALLENGE RAID rack with the maximum of four chassis assemblies installed Each chassis assembly in a CHALLENGE RAID rack corresponds to one deskside CHALLENGE RAID chassis ...

Page 17: ...Figure 1-2 CHALLENGE RAID Rack ...

Page 18: ...all or replace the SCSI 2 interface You can replace the disk modules by following instructions in Chapter 5 in this guide A CHALLENGE server can support multiple CHALLENGE RAID storage systems The various storage system configurations along with their availability and performance features are explained in Chapter 2 in this guide The number of CHALLENGE RAID deskside storage systems or rack chassis...

Page 19: ... server: HIO add-on cards (mezzanine cards on the POWER Channel 2 I/O controller); one or two storage control processors (SPs); 5 to 20 disk modules, in groups of five; one fan module; two or three power supplies (VSCs, or voltage semi-regulated converters); one battery backup unit (BBU) for storage system caching (optional). The CHALLENGE server holds the SCSI-2 interface(s); the CHALLENGE RAID storage system chassis...

Page 20: ...e system with one SP. Figure 1-4 CHALLENGE RAID Server With One SP. For higher performance, a CHALLENGE RAID storage system can support an additional SP. The second SP provides a second path to the storage system, so both SPs can connect to the same host or two different hosts, as diagrammed in Figure 1-5 and Figure 1-6. With two SPs, the storage system can support storage system caching, whereby each SP t...

Page 21: ...s Figure 1 6 SPs Connected to Different CHALLENGE Chassis CHALLENGE server CHALLENGE RAID SCSI 2 bus SCSI 2 bus SCSI 2 interface SP A SCSI 2 interface SP B CHALLENGE RAID First CHALLENGE server Second CHALLENGE server SCSI 2 interface SP A SP B SCSI 2 interface SCSI 2 bus SCSI 2 bus ...

Page 22: ...ID based on its position in the storage system. The disk modules are inserted in the following order: modules A0, B0, C0, D0, and E0 (array 0); modules A1, B1, C1, D1, and E1 (array 1); modules A2, B2, C2, D2, and E2 (array 2); modules A3, B3, C3, D3, and E3 (array 3). Figure 1-7 diagrams this placement. Individual disk modules have disk position labels attached. Figure 1-7 Disk Module Locations (Chassis Front View) A0 B0 C0 D0 A2...
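The insertion order described above can be sketched in a few lines of Python. This is only an illustration: the function name disk_positions is hypothetical and not part of the raid5 interface. It assumes that the letter of each module ID names the internal bus (A through E) and the digit names the array (0 through 3).

```python
def disk_positions(arrays=4):
    """Return disk module IDs in insertion order: A0..E0, then A1..E1, and so on."""
    buses = "ABCDE"  # five internal buses, one letter per bus (assumed naming)
    return [f"{bus}{array}" for array in range(arrays) for bus in buses]


if __name__ == "__main__":
    positions = disk_positions()
    print(positions)       # full insertion order for a 20-module chassis
    print(len(positions))  # number of module slots
```

A chassis populated with only the first array would correspond to disk_positions(arrays=1).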

Page 23: ... A0 A1 A2 and A3 in that order Figure 1 8 diagrams this configuration Figure 1 8 SCSI 2 Bus and Internal Buses Front View Deskside Chassis assembly in rack A2 B2 C2 D2 E2 A0 B0 C0 D0 E0 A1 B1 C1 D1 E1 A3 B3 C3 D3 E3 Internal bus A Internal bus B Internal bus C Internal bus D Internal bus E A0 B0 C0 D0 E0 Internal bus A Internal bus B Internal bus C Internal bus D Internal bus E A1 B1 C1 D1 E1 A2 B...

Page 24: ...hts while the disk module is powered up and ready for use. Busy light (green): lights while the drive is in use, for example, during formatting or user I/O operations. Fault light (amber): lights when the module is shut down by the SP because the module failed; also lights after you replace the drive, while the replacement drive spins up to speed. A label attached to the carrier's side shows the disk module's ...

Page 25: ...HALLENGE RAID RAID 0 RAID 1 RAID 1_0 and RAID 5 Because the CHALLENGE RAID storage system has five internal SCSI 2 buses RAID 5 provides redundancy for up to five groups of disk modules A RAID 5 group maintains parity data that lets the disk group survive a disk module failure without losing data In addition the group can survive a single SCSI 2 internal bus failure if each disk module in the grou...

Page 26: ...s available for CHALLENGE RAID storage systems that have two SPs, each with at least 8 MB of memory, a battery backup unit, and disk modules in slots A0 through E0. With storage system caching enabled, each SP temporarily stores requested information in its memory. Caching can save time in two ways: For a read request, if the requested data is already in the read cache, the storage system avo...

Page 27: ...gured bound as a hot spare it is available as a replacement for a failed disk module See RAID Hot Spare later in this chapter When a disk module in any RAID level except RAID 0 fails the SP automatically writes to the hot spare and rebuilds the group using the information stored on the surviving disks Performance is degraded while the SP rebuilds the data and parity on the new module However the s...

Page 28: ...AID storage system is also not recommended, particularly disk modules in slots A0, B0, C0, and A3, which contain the licensed internal code, and those in slots D0 and E0, which serve with A0, B0, and C0 as the storage system cache vault. RAID 0 Group: Nonredundant Array. Three to sixteen disk modules can be bound as a RAID 0 group. A RAID 0 group uses striping; see "Enhanced Performance: Disk Striping," earlier in ...

Page 29: ...e fault tolerance automatic mirroring no commands are required to initiate it physical separation of images faster write operation than RAID 5 With a RAID 1 mirrored pair the storage system writes the same data to both disk modules in the mirror as shown in Figure 1 10 Figure 1 10 RAID 1 Mirrored Pair Hardware Mirrored Pair To achieve the maximum fault tolerance configure the mirror with each disk...

Page 30: ...s the distribution of user data with the default stripe element size of 128 sectors (65,536 bytes) in a six-module RAID 1_0 group. Notice that the disk block addresses in the stripe proceed sequentially from the first mirrored disk modules to the second mirrored modules, to the third mirrored image disk modules, then from the first mirrored disk modules, and so on. A RAID 1_0 group can survive the failur...

Page 31: ...[Figure residue: distribution of user data in a six-module RAID 1_0 group. First modules of the primary and secondary images: blocks 0-127, 384-511, 768-895, 1152-1279, 1536-1663. Second modules of the primary and secondary images: blocks 128-255, 512-639, 896-1023, 1280-1407, 1664-1791. Third modules of the primary and secondary images: blocks 256-383, 640-767, 1024-1151, 1408-1535, 1792-1919.] ...
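The block distribution in a six-module RAID 1_0 group can be reproduced with simple integer arithmetic. The sketch below is an illustration, not the storage system's firmware logic: locate_block is a hypothetical helper, and it assumes three mirrored pairs with the default stripe element size of 128 sectors (so one sector is 65,536 / 128 = 512 bytes).

```python
STRIPE_ELEMENT = 128                  # sectors per stripe element (default from the guide)
SECTOR_BYTES = 65536 // STRIPE_ELEMENT  # 512 bytes per sector, derived from the guide's figures


def locate_block(block, mirrored_pairs=3, element=STRIPE_ELEMENT):
    """Map a logical block to (mirrored pair index, stripe index, offset in element).

    Each block lives on both modules of the pair, one in each image.
    """
    element_index = block // element
    pair = element_index % mirrored_pairs     # which mirrored pair holds the element
    stripe = element_index // mirrored_pairs  # which stripe (row) on that pair
    offset = block % element                  # sector offset inside the element
    return pair, stripe, offset


if __name__ == "__main__":
    for block in (0, 128, 256, 384, 1919):
        print(block, locate_block(block))
```

Blocks 0, 128, and 256 land on the first, second, and third mirrored pairs; block 384 wraps back to the first pair in the next stripe, which matches the block ranges listed for the figure.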

Page 32: ...ernal buses A B C and so on With RAID 5 technology the hardware writes parity information to each module in the array If a module fails the SP can reconstruct all user data from the user data and parity information on the other disk modules After you replace a failed disk module the SP automatically rebuilds the disk array using the information stored on the remaining modules The rebuilt disk arra...

Page 33: ...ty data for those sectors. 2. Recalculate the parity data. 3. Write the new user and parity data. [Figure residue: distribution of user and parity data across the first through fifth modules of a RAID 5 group; labels: Blocks, Parity, Stripe, Stripe e...
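The guide does not spell out the parity arithmetic. The sketch below assumes the standard RAID 5 technique, in which parity is the bytewise XOR of the data elements in a stripe. It shows, on tiny made-up byte strings, how a lost element is reconstructed from the survivors plus parity, and how a write can recalculate parity from the old data, old parity, and new data. The helper name xor_parity is hypothetical.

```python
from functools import reduce


def xor_parity(chunks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Four data elements of one stripe (made-up values); a fifth module holds parity.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_parity(data)

# The module holding data[2] fails: rebuild it from the survivors and the parity.
survivors = data[:2] + data[3:]
rebuilt = xor_parity(survivors + [parity])

# Write path: read old data and parity, recalculate parity, write new data and parity.
new_element = b"\xff\x00"
new_parity = xor_parity([parity, data[1], new_element])

if __name__ == "__main__":
    print(rebuilt == data[2])  # reconstruction recovers the lost element
    print(new_parity == xor_parity([data[0], new_element, data[2], data[3]]))
```

Because XOR is its own inverse, the same function serves for computing parity, rebuilding a lost element, and updating parity after a write.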

Page 34: ...modules in the original slots and the SP automatically frees the hot spare to serve as a hot spare again Note The SP finishes rebuilding the disk module before it begins copying data even if you replace the failed disk during the rebuild process A hot spare is most useful when you need the highest data availability It eliminates the time and effort needed for someone to notice that a module has fa...

Page 35: ...iled Figure 1-13 Hot Spare Example. Using the CHALLENGE RAID Command Line Interface: Run the command line interface, /usr/raid5/raid5, in an IRIX window on your CHALLENGE server to: bind (group) or unbind physical disks into a RAID 0, RAID 1, RAID 1_0, or RAID 5 unit or hot spare; change parameters on a currently bound group (logical unit number, or LUN); get names of devices controlled by the SP; change or get in...

Page 36: ...information about a group of disks; perform housekeeping operations, such as clearing the error log or updating firmware. Note: Although the directory and command are raid5, the command is valid for all RAID levels. The relevant parameters of the command line interface are explained for each task in the rest of this guide; Appendix B is a complete guide to the command line interface ...

Page 37: ...number of storage control processors and SCSI 2 interfaces Disk configuration within the storage system Before you can plan your disk configuration you must understand storage system configuration Several storage system configurations are available for CHALLENGE RAID storage systems Table 2 1 lists the hardware components making up each configuration and summarizes the features of each This chapte...

Page 38: ...host and its applications can continue after any disk module fails The host using a failed SCSI 2 interface or SP cannot continue after failure but the other host can If one host SCSI 2 adapter or SP fails the other host can take over the failed host s disks with system operator intervention Dual bus dual initiator 2 4 2 per server 2 1 per server 2 Provides highest availability and best storage sy...

Page 39: ...ontinue running System operator replaces module Storage control processor No Storage system fails System operator replaces SP and restarts operating system Fan module Yes Applications continue running System operator replaces module Power supply Yes If redundant power supply module is present applications continue running otherwise storage system fails Service provider replaces power supply SCSI 2...

Page 40: ...ansfer disk ownership Table 2 3 lists these features Table 2 3 Error Recovery Dual Interface Dual Processor Configuration Failing Component Continue After Failure Recovery Disk module Yes Applications continue running System operator replaces module Storage control processor Yes I O operations fail to disk units owned by a failing SP System operator can transfer control of the failed SP s disk uni...

Page 41: ... convenient the Silicon Graphics SSE or other authorized service provider can replace the interface and the system operator can transfer control of disk units to the replacement SP SCSI 2 cable Yes I O operations fail to storage system disk units owned by the SP attached to the failed cable System operator can transfer control of these disk units to the other SP shut down the host power off and on...

Page 42: ... for sites requiring high availability because either host can continue after failure of any disk module within a disk array and a host can take over a failed host s disks A host cannot continue after a SCSI 2 interface or an SP fails unless you manually transfer disk ownership Table 2 4 lists the error recovery features for this configuration Table 2 4 Error Recovery Split Bus Configuration Faili...

Page 43: ...ned by the SP attached to the failed interface System operator can transfer control of the failed SP s disk units to the SP on the interface in the other host shut down the other host power off and on the storage system and reboot the other host Silicon Graphics SSE or other authorized service provider replaces the interface SCSI 2 cable Yes I O operations fail to storage system disk units owned b...

Page 44: ...l has full access to its own data The storage control processor that binds a disk module is the default owner of the disk module The route through the SP that owns a disk module is the primary route to the disk module The route through the other SP is the secondary route to the disk module In a dual interface system either CHALLENGE server can use any of the disk modules in the storage system but ...

Page 45: ...onent in the primary route fails Table 2 5 lists the error recovery features of the dual bus dual initiator configuration Caution Because both hosts can access the same disk modules simultaneously the danger exists that one host can overwrite data stored by the other This configuration requires specific hardware and software such as a database lock manager to protect the integrity of the stored da...

Page 46: ...em operator can transfer control of these disks to the SP connected to the working adapter shut down both hosts power off and on the storage system and reboot both hosts When convenient the Silicon Graphics SSE or other authorized service provider can replace the interface and the system operator can transfer control of disk units to replacement SP SCSI 2 cable Yes I O operations fail to storage s...

Page 47: ...appropriate filesystem configuration and failsafe software is installed Host 1 RAID 5 group for fast access CHALLENGE 1 CHALLENGE RAID CHALLENGE 2 SCSI 2 bus SCSI 2 bus database Host 2 Accounts on 6 disks bound as RAID 1_0 Host 1 Hot spare Host 2 Mirrored pair for user directories moderate access time Host 1 Mirrored pair for user directories moderate access time SCSI 2 interface 1 SCSI 2 interfac...


Page 49: ...id5, with its parameters, in an IRIX shell on CHALLENGE to: get names of devices controlled by the storage control processor (SP); display status information on disk modules, disk module groups (LUNs), SPs, and other system components; and display the storage processor log, in which error messages are stored. Note: Although the directory and command are raid5, the command is valid for all RAID levels. Other chapte...

Page 50: ...er light indicates a fault See Figure 3 1 Figure 3 1 CHALLENGE RAID Indicator Lights The amber service light comes on when an SP is reseated the CHALLENGE RAID is powered off and on the battery backup unit has not finished recharging if battery backup unit is present in the system If the service light is lit look for a disk module fault light that is lit Then you can either explore status further ...

Page 51: ...d the CHALLENGE RAID storage system must be running. The synopsis of the raid5 command is: raid5 [-vp] -d device parameter [optional_arguments]. In this syntax, variables mean: -v Enables verbose return. -p Parses the raid5 command without calling the API. If the string does not parse correctly, an error message is printed to stderr; otherwise there is no output. -d device Target RAID device. Use raid5 getagent for a...
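As a sketch of how the synopsis fits together, the hypothetical Python helper below assembles a raid5 argument list. It only builds the list and does not run anything. The helper name, the keyword arguments, and the leading hyphens on the v, p, and d options are assumptions based on conventional option syntax; the path /usr/raid5/raid5 and the device name sc4d2l0 come from the examples elsewhere in this guide.

```python
RAID5_PATH = "/usr/raid5/raid5"  # location of the command per the guide


def raid5_command(device, parameter, *args, verbose=False, parse_only=False):
    """Build an argv list of the form: raid5 [-vp] -d device parameter [args...]."""
    cmd = [RAID5_PATH]
    flags = ("v" if verbose else "") + ("p" if parse_only else "")
    if flags:
        cmd.append("-" + flags)  # combined option letters (assumed syntax)
    cmd += ["-d", device, parameter]
    cmd += [str(a) for a in args]
    return cmd


if __name__ == "__main__":
    print(" ".join(raid5_command("sc4d2l0", "getdisk", "a2")))
    print(" ".join(raid5_command("sc4d2l0", "getlun", 3, parse_only=True)))
```

Building the list separately from executing it makes it easy to log or review a command before it is passed to something like subprocess.run on the IRIX host.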

Page 52: ...3 1 Output of raid5 getagent Entry Meaning Name ASCII string found in the agent configuration file which assigns a name to the node being accessed see Node description below Desc ASCII string found in the agent configuration file which describes the node being accessed see Node description below Node The dev scsi entry which the agent uses as a path to the actual SCSI device This value must be ent...

Page 53: ...otal Writes 1304 Prct Busy 25 Prct Idle 75 System Date 5 5 1995 Day of the week Friday System Time 12 43 54. Getting Information About Disks: For information about all bound disks in the system, use this command in an IRIX shell: /usr/raid5/raid5 -d device getdisk. For information on a particular disk, use: /usr/raid5/raid5 -d device getdisk diskposition. [Table residue:] SP Memory: Amount of DRAM present on the SP. Serial No 1...

Page 54: ...n about disk A2: raid5 -d sc4d2l0 getdisk a2. A sample output of this command follows: A0 Vendor Id SEAGATE A0 Product Id ST15150N A0 Lun 0 A0 State Bound and Not Assigned A0 Hot Spare NO A0 Prct Rebuilt 100 A0 Prct Bound 100 A0 Serial Number 032306 A0 Capacity 0x000f42a8 A0 Private 0x00009000 A0 Bind Signature 0x1c4eb2bc A0 Hard Read Errors 0 A0 B0 C0 D0 A2 B2 C2 D2 E2 E0 A1 B1 C1 D1 A3 B3 C3 D3 E3...

Page 55: ...en powered off Off disk is physically present in the chassis but is not spinning Powering Up disk is spinning and diagnostics are being run on it Unbound disk is healthy but is not part of a LUN Bound and Not Assigned disk is healthy part of a LUN but not being used by this SP Rebuilding disk is being rebuilt Enabling disk is healthy bound and being used by this SP Binding disk is in the process o...

Page 56: ...nd battery backup unit information is shown under BBU Bind Signature Unique value assigned to each disk in a logical unit at bind time Hard Read Errors Number of hard errors encountered on reads for this disk Hard Write Errors Number of hard errors encountered on writes for this disk Soft Read Errors Number of soft errors encountered on reads for this disk Soft Write Errors Number of soft errors e...

Page 57: ...n chronological order, with the most recent messages at the end. To display the entire log, use: raid5 -d device getlog. To display the newest n entries in the log, starting with the oldest entry, use: raid5 -d device getlog -n. To display the oldest n entries in the log, starting with the oldest entry, use: raid5 -d device getlog +n. Table 3-3 Output of raid5 getcrus (Output, Meaning): FANA, FANB: Fan banks A and B. VSCA...
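The difference between the "newest n" and "oldest n" selections is easy to confuse, since both are printed in chronological order. The Python sketch below models the log as a list ordered oldest first and shows which entries each selection returns. The function names and the sample entries are hypothetical; this is not the getlog implementation.

```python
def newest(entries, n):
    """Last n entries of a chronological log, still listed oldest first."""
    return entries[-n:] if n > 0 else []


def oldest(entries, n):
    """First n entries of a chronological log, listed oldest first."""
    return entries[:n]


if __name__ == "__main__":
    log = ["entry 1", "entry 2", "entry 3", "entry 4", "entry 5"]  # oldest first
    print(newest(log, 2))
    print(oldest(log, 2))
```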

Page 58: ...error code in brackets (for example, [0x47]) that gives diagnostic information when it is available. See getlog in Appendix B for explanations of these codes. To clear the event log, use: raid5 -d device clearlog. Note: You must be root to use the clearlog parameter. Shutting Down the CHALLENGE RAID Storage System: Follow these steps to shut down the CHALLENGE RAID storage system: 1. If you are using storage syst...

Page 59: ...rt the CHALLENGE RAID storage system follow these steps 1 Turn on the storage system s power see Figure 3 3 The green power light on the front of the storage system turns on see Figure 3 4 and the fans rotate Figure 3 4 CHALLENGE RAID Indicator Lights Deskside Rack Back of storage system Deskside Rack Front of storage system Power light Service light green amber ...

Page 60: ...ke sure that the power for each SP is enabled Move the fan module s latch to the UNLOCK position as indicated in Figure 3 5 Figure 3 5 Unlocking the Fan Module 3 Swing open the fan module Caution To prevent thermal shutdown of the system never leave the fan module open more than two minutes Deskside Rack ...

Page 61: ... s power switch to the enable position, as shown in Figure 3-6. Figure 3-6 Enabling an SP's Power. 5. Close the fan module and move the module's latch to the LOCK position. 6. Power on the CHALLENGE server(s). [Figure labels: SP B, SP A, Deskside, Rack] ...


Page 63: ...y names. The LUN is a hexadecimal number between 0 and F (15 decimal). Unlike standard disks, physical disk unit numbers (LUNs) lack a standard geometry. Disk capacity is not a fixed quantity between disk array LUNs. The effective geometry of a disk array LUN depends on the type of physical disks in the array and the number of physical disks in the LUN. To group physical disks into RAID 0, RAID 1, RAID 1_0, or...

Page 64: ...inimum of three, maximum 16 disks. A RAID 1 bind requires two disks on separate buses. A RAID 1_0 bind requires a minimum of four disks per bus. The maximum is 16 disks, grouped in sets of four on different buses. A RAID 1_0 bind requires separate buses for each member of the image pair. Select the disks in this order: p1 s1 p2 s2 p3 s3, and so on. A RAID 5 bind also requires separate buses, each with five d...

Page 65: ...e information stored on the physical disk unit. -s stripe-size: Number of blocks per physical disk in a RAID stripe. Default is 128; legal values are any number greater than 0. The smaller the stripe element size, the more efficient the distribution of data read or written. However, if the stripe size is too small for a single host I/O operation, the operation requires accessing two stripes, thus causing the...
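To see why a small stripe size can make a single host I/O touch more than one stripe element, the following sketch counts how many elements a request spans. It is a simplified model rather than the storage system's actual behavior: elements_touched is a hypothetical helper, blocks are assumed to be numbered from 0, and each element holds the configured number of blocks (128 by default).

```python
def elements_touched(start_block, block_count, element=128):
    """Count the stripe elements covered by an I/O of block_count blocks."""
    if block_count <= 0:
        raise ValueError("block_count must be positive")
    first = start_block // element
    last = (start_block + block_count - 1) // element
    return last - first + 1


if __name__ == "__main__":
    print(elements_touched(0, 128))               # aligned request, exactly one element
    print(elements_touched(64, 128))              # same size, unaligned: two elements
    print(elements_touched(64, 128, element=256)) # larger element absorbs the request
```

An aligned 128-block request fits in one default-size element, while the same request starting at block 64 straddles two; choosing a larger element size brings it back to one.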

Page 66: ...gical unit with a LUN number of 2 and a four-hour maximum rebuild time, with read cache enabled: raid5 -d sc4d2l0 bind r1 2 a2 b2 -r 4 -c read. The following example binds disks A1, B1, C1, and D1 into a RAID 1_0 logical unit with a LUN number of 1, a four-hour maximum rebuild time, and a 128-block stripe size per physical disk, with read cache enabled: raid5 -d sc4d2l0 bind r1_0 1 a1 b1 c1 d1 -r 4 -s 128 -c read ...
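Some of the bind rules stated in this chapter can be checked before a command is issued. The sketch below is a hypothetical pre-check in Python and is not part of raid5. It covers only the rules that are readable in the text: three to sixteen disks for RAID 0, two disks on separate buses for RAID 1, and, for RAID 1_0, four to sixteen disks listed as primary/secondary pairs (p1 s1 p2 s2 ...) with each pair on separate buses. The type token r0 is assumed by analogy with the r1 and r1_0 tokens in the examples, and the first letter of a disk position is assumed to name its internal bus.

```python
def check_bind(raid_type, disks):
    """Return True if the disk list satisfies the bind rules modeled here."""
    count = len(disks)
    buses = [d[0].lower() for d in disks]  # bus letter of each disk position
    if raid_type == "r0":                  # assumed token for RAID 0
        return 3 <= count <= 16
    if raid_type == "r1":
        return count == 2 and buses[0] != buses[1]
    if raid_type == "r1_0":
        if not (4 <= count <= 16) or count % 2:
            return False
        # Disks are listed p1 s1 p2 s2 ...; each image pair must span two buses.
        return all(buses[i] != buses[i + 1] for i in range(0, count, 2))
    raise ValueError(f"RAID type not modeled in this sketch: {raid_type}")


if __name__ == "__main__":
    print(check_bind("r1", ["a2", "b2"]))
    print(check_bind("r1_0", ["a1", "b1", "c1", "d1"]))
    print(check_bind("r1", ["a1", "a2"]))
```

RAID 5 is deliberately left out because its disk-count rule is cut off in this excerpt.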

Page 67: ... is enabled with raid5 getcontrol See Getting Information About Other Components in Chapter 3 of this guide Type RAID5 Stripe size 128 Capacity 0x10000 Current owner YES Auto trespass Disabled Auto assign Enabled Write cache Disabled Read cache Disabled Idle Threshold 0 Idle Delay Time 20 Write Aside Size 2048 Default Owner YES Rebuild Time 0 Read Hit Ratio 0 Write Hit Ratio 0 Prct Reads Forced Fl...

Page 68: ...city Number of sectors total for use by user Current owner YES if this SP owns the unit NO if it does not Set with chglun see Dual Interfaces Load Balancing and Device Names later in this chapter Auto trespass Always Disabled Auto assign Always Enabled Write Cache Enabled means this LUN is write caching otherwise Disabled Read Cache Enabled means this LUN is read caching otherwise Disabled Idle Th...
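The Capacity field in the getlun output is a hexadecimal count of sectors. As a small worked example, the Python sketch below converts such a value to bytes. The helper name is hypothetical, and the 512-byte sector size is an inference from the guide's statement that 128 sectors equal 65,536 bytes.

```python
SECTOR_BYTES = 65536 // 128  # 512 bytes per sector, inferred from the stripe element figures


def capacity_bytes(hex_sectors):
    """Convert a hexadecimal sector count such as '0x10000' to bytes."""
    return int(hex_sectors, 16) * SECTOR_BYTES


if __name__ == "__main__":
    size = capacity_bytes("0x10000")  # value from the sample getlun output
    print(size, "bytes =", size // (1024 * 1024), "MB")
```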

Page 69: ...hed the cache Prct Writes Forced Flushed Percentage of write requests that flushed the cache Prct Rebuilt Percentage complete during a rebuild Prct Bound Percentage complete during a bind Diskname State Enabled Binding etc same as for getdisk Diskname Reads Total number of reads this disk has done Diskname Writes Total number of writes this disk has done Diskname Blocks Read Total number of blocks...

Page 70: ...aching; rw: read and write caching. The default is none. -d default-owner: Values are 1 (change storage control processor ownership of LUN) or 0 (don't change ownership). If your storage system has dual SPs, see "Dual Interfaces, Load Balancing, and Device Names," later in this chapter. -r rebuild-time: Maximum time in hours to rebuild a replacement disk. Default is 4 hours; legal values are any number greater than or equ...

Page 71: ...in four hours; it does not change the default owner: raid5 -d sc4d2l0 chglun -l 3 -c write -d 0 -r 4. There is no output for the chglun parameter. Errors are printed to stderr. Dual Interfaces, Load Balancing, and Device Names: If your storage system has two SPs (split-bus, dual-bus/dual-initiator, or dual-interface/dual-processor), you can choose which disks to bind on each SP. This flexibility lets you balance the...


Page 73: ... examining the cabinet fault light or by using the raid5 getdisk or raid5 getcrus command as explained in Chapter 3 in this guide you can replace the defective module and rebuild your data without powering off the CHALLENGE RAID storage system or interrupting user applications Caution Removing the wrong drive module can introduce an additional fault that shuts down the physical disk containing the...

Page 74: ... the failed module s ID use Figure 5 2 Caution Use only CHALLENGE RAID disk modules to replace failed disk modules Order them from the Silicon Graphics hotline 1 800 800 4SGI 1 800 800 4744 CHALLENGE RAID disk modules contain proprietary firmware that the storage system requires for correct functioning Using any other disks including those from other Silicon Graphics systems can cause failure of t...

Page 75: ...hassis that might mean the disk module itself has not failed Note If you are using storage system caching the system uses modules A0 B0 C0 D0 and E0 for its cache vault If one of these modules fails the storage system dumps its cache image to the remaining modules in the vault then it writes all dirty modified pages to disk and disables caching The cache status changes as indicated in the output o...

Page 76: ...r materials that naturally build up electrostatic charge such as foam packaging foam cups cellophane wrappers and similar materials The disk module is extremely sensitive to shock and vibration Even a slight jar can severely damage it Do not remove disk modules from their antistatic packaging until the exact moment that you are ready to install them Before removing a disk module from its antistati...

Page 77: ...phics Inc hotline 1 800 800 4SGI 1 800 800 4744 Use Table 5 1 as a guide to ordering replacement disk modules Caution Use only CHALLENGE RAID disk modules as replacements only they contain the correct device firmware Other disk modules even those from other Silicon Graphics equipment will not work Do not mix disk modules of different capacities within one array Table 5 1 Ordering Replacement Disk ...

Page 78: ...ter with the raid5 command. Follow these steps: 1. In an IRIX window, use raid5 getagent to get the device name (node number): raid5 getagent. 2. If necessary, use raid5 getdisk to verify the position of the failed disk. 3. Use raid5 unbind to unbind the failed disk: raid5 -d device unbind lun-number -o. In this syntax, device is the device name as returned by getagent, and lun-number is the number of the log...

Page 79: ...e a disk module follow these steps 1 Verify that the suspected module has actually failed Caution If you remove the wrong disk module you introduce an additional fault that shuts down the physical disk containing the failed module In this situation the operating system software cannot access the physical disk until you initialize it again 2 Read Setting Up the Workplace for Replacing or Installing...

Page 80: ...o attach the clip on a rack storage system Figure 5 4 Attaching the ESD Clip to the ESD Bracket on a Rack Storage System 6 Put the wrist band around your wrist with the metal button against your skin 7 Make sure the disk has stopped spinning and the heads have unloaded ESD bracket Clip and wire of ESD band ESD bracket Clip and wire of ESD band ...

Page 81: ...ution Never remove more than one disk module at a time Warning When removing a disk module from an upper chassis assembly in a CHALLENGE RAID rack system make sure that you adequately balance the weight of the disk module 9 Supporting the disk module with your free hand pull it all the way out of the cabinet as shown in Figure 5 6 ESD wrist band ESD wrist band Deskside Rack ...

Page 82: ...rite it on the label for example A1 For the compartment ID numbers refer to Figure 5 2 or the slot matrix attached to the storage system when it was installed 11 Put the failed disk module in an antistatic bag and store it in a place where it will not be damaged Caution Before installing a replacement module wait at least 15 seconds after removing the failed module to allow the SP time to recogniz...

Page 83: ...tremely sensitive to shock and vibration Even a slight jar can severely damage it 2 On the label on the side of the disk module write the ID number for the compartment into which the drive is going for example A3 3 Engage the disk module s rail in the chassis rail slot as shown in Figure 5 7 Figure 5 7 Engaging the Disk Module Rail 4 Engage the disk module s guide in the chassis guide slot as show...

Page 84: ...isk Module Guide 5 Insert the disk module as shown in Figure 5 9 Make sure it is completely seated in the slot Figure 5 9 Inserting the Replacement Disk Module Disk module s guide Guide slot Disk module s guide Guide slot Deskside Rack ESD wrist band Deskside Rack ...

Page 85: ...ing the default rebuild period; see "Binding Disks Into RAID Units" in Chapter 4. Updating the Disk Module Firmware: After replacing a failed unbound disk module A0, B0, C0, or A3, update the firmware on the CHALLENGE RAID SP. Follow these steps: 1. Quiesce the bus, disabling all applications. Make sure that only the RAID agent is running. 2. Type as root: raid5 -d device firmware /usr/raid5/flarecode.bin. Caution: Yo...

Page 86: ...al processing can continue while you install disk modules in arrays of five This section explains ordering add on disk module arrays inserting the new disk module array creating device nodes and binding the disks Ordering Add On Disk Module Arrays Call the Silicon Graphics Inc hotline to order add on disk module arrays 1 800 800 4SGI 1 800 800 4744 Use Table 5 2 as a guide to ordering add on disk ...

Page 87: ...he new disk modules in their antistatic packaging within reach of the storage system. 3. If you are using a wrist band, attach its clip to the ESD bracket on the bottom of the storage system, as shown in Figure 5-3. Put the wrist band around your wrist with the metal button against your skin. 4. Locate the slots where you will install the add-on disk modules; see Figure 5-2. Warning: Although you need not c...

Page 88: ...e drive is going. You can either write the slot position on the label in the corresponding place on the matrix, or make a check mark in the position to indicate the slot that the disk module occupies. Figure 5-10 shows these two ways of labeling disk module A0. Figure 5-10: Marking the Label for Disk Module A0. For reference, Figure 5-11 diagrams all disk module locations. (Figure labels: RACK; TOWER; A0) For ...

Page 89: ...remely sensitive to shock and vibration. Even a slight jar can severely damage them. Figure 5-12: Engaging the Disk Module Rail. (Figure labels: disk module locations A0 through E3, shown for the deskside system and for a chassis assembly in a rack; 5 to 20 disk modules in groups of 5; Disk module's rail; Rail slot; Deskside; Rack) ...

Page 90: ...slot, as shown in Figure 5-13. Figure 5-13: Engaging the Disk Module Guide. 10. Insert the disk module as shown in Figure 5-14. Make sure it is completely seated in the slot. Figure 5-14: Inserting a Disk Module. (Figure labels: Disk module's guide; Guide slot; Deskside; Rack) ...

Page 91: ...de aware of the new disks. This section explains how to accomplish this without rebooting. In a system with two storage control SPs, which are used for primary and secondary paths, both SPs must be made aware of the new disks. The new disks must also be bound into LUNs. Follow these steps: 1. Change to the /dev directory: cd /dev 2. Type: MAKE_VLUNS controller-number target-number This command creates the...

Page 93: ...ents can be replaced only by qualified Silicon Graphics System Service Engineers or other qualified service providers. Only disk modules are owner-replaceable or end-user-replaceable; Chapter 5 provides instructions. Call the Silicon Graphics hotline to order a replacement module: 1-800-800-4SGI (1-800-800-4744). Table 6-1 lists Silicon Graphics marketing codes for replacement units for the CHALLENGE RAI...

Page 94: ...ly Note: When the storage system shuts down, the operating system loses contact with the physical disk units. When the storage system starts up automatically, you may need to reboot it to let the operating system access the physical disk units. Fan Module: Each CHALLENGE RAID storage system has one fan module containing six fans. If any fan fails, the fan fault light on the back of the fan module turns on...

Page 95: ...ging and Present fully charged or charging. If the battery backup unit takes longer than an hour to charge, it shuts itself off and transitions to the Faulted state. If the fault light comes on, or if the battery backup unit state is shown as Faulted in the raid5 getcrus command output, have the battery backup unit replaced as soon as possible by a Silicon Graphics System Service Engineer. Storage Contr...

Page 96: ...in the case of problems caused by a defective CHALLENGE-to-CHALLENGE RAID connection, such as a damaged cable. To enable auto-reassign, power off the failed SP. You do not need to power down the CHALLENGE RAID storage system. Caution: Do not power off the SP under any circumstances other than to enable auto-reassign. Powering off an SP requires opening the fan module at the back of the storage system. Bec...

Page 97: ...CHALLENGE RAID storage system, move the fan module's latch to the UNLOCK position, as shown in Figure 6-1. Figure 6-1: Unlocking the Fan Module. 2. Swing open the fan module, as shown in Figure 6-2. Figure 6-2: Opening the Fan Module. (Figure labels: Deskside; Rack) ...

Page 98: ...ure 6-3. Figure 6-3: Disabling an SP's Power. Caution: Do not power off the SP under any circumstances other than to enable auto-reassign. 4. Immediately close and lock the fan module to let the SP cool. 5. Have the SP replaced as soon as possible by qualified service personnel. Leave the SP powered off in the meantime. (Figure labels: SP A; SP B; Deskside; Rack) ...

Page 99: ...8 MB of memory for each SP; cache enabling, using the raid5 setcache command as explained in this chapter; disk modules in slots A0, B0, C0, D0, and E0 as a fast repository for cached data. Caching cannot occur unless all these conditions are met. This chapter explains: setting cache parameters; viewing cache statistics; upgrading CHALLENGE RAID to support caching; changing cache unit parameters ...

Page 100: ...es user has selected to enable the cache. The command line interface does not let you specify more memory than you have. If you specify less than you have, the remaining memory is unused. Note: For caching, both SPs must have the same amount of cache memory in order for caching to be preserved in the event of shutdown or other power loss. -p page: Size in KB of pages into which to partition the cache. Valid...

Page 101: ...atistics: If you use storage system caching, you can use the raid5 getcache command to get information on cache activity. The information in this command's output, particularly the percentage of cache hits, may help you decide on the most efficient cache page size and whether a physical disk unit really benefits from caching. Note: If a disk module in location A0, B0, C0, D0, or E0 fails, caching is disabled ...

Page 102: ...emory than you have. If you specify less than you have, the remaining memory is unused. Page Size: User-specified page size in KB for caching; for example, 2 means a 2 KB page size. Cache State: Enabled: SP is fully functional. Disabled: SP not capable of or configured for caching. Synching: SP is synchronizing its cache with the peer SP. Enabling: cache is in the process of becoming enabled. Quiescing: cache was ...

Page 103: ...ntly owned by SP B. Unassigned Cache Pages: Number of dirty cache pages not owned by A or B. This can happen when a unit is broken and there are no disks to which to flush the dirty pages. Read Hit Ratio: Percent of read requests to the SP that can be satisfied from the cache without requiring disk access. Write Hit Ratio: Percent of write requests to the controller that can be satisfied with the cache w...
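
The hit ratios described above are plain percentages of requests served from the cache. The following is a minimal sketch of that arithmetic only; the function name and the sample counts are hypothetical and are not part of the raid5 command or its output.

```python
def hit_ratio_percent(hits: int, requests: int) -> float:
    """Percent of requests satisfied from the cache without disk access."""
    if hits < 0 or requests < 0 or hits > requests:
        raise ValueError("need 0 <= hits <= requests")
    if requests == 0:
        # Assumption: report 0 when no requests have been seen yet.
        return 0.0
    return 100.0 * hits / requests


# Hypothetical counts: 82 of 100 read requests were served from the cache.
print(hit_ratio_percent(82, 100))
```

A low ratio for a given physical disk unit is the kind of evidence the guide suggests weighing when deciding whether that unit benefits from caching.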

Page 104: ...t move a disk module to another slot unless it is absolutely necessary to do so. Never move disk modules from slots A0, A3, B0, C0, D0, and E0. Set up caching on one SP at a time. Once the necessary hardware components have been installed, follow these steps to set up caching: 1. Enable the read and write caches for the physical disk units that will use caching: /usr/raid5/raid5 -d device chglun -l lun-number re...

Page 105: ...relatively low. To change the caching parameter for a physical disk unit, follow these steps: 1. Run raid5 getcache to determine if caching is enabled. If it is enabled, make sure both SPs are powered on (run raid5 getcrus to find out). 2. If caching is enabled, disable it with: raid5 -d device setcache 0 3. Wait for the cache memory to be written to disk, which may take several minutes. Use raid5 getcache to che...

Page 107: ... 47 Hz to 63 Hz; 9.0 A max at 100 VAC input; Apparent power: 900 VA max; True power: 880 W max; Connector type: L6-15R, L6-15P. Operating limits: Ambient temperature: 10 degrees C to 38 degrees C (50 degrees F to 100 degrees F). Relative humidity: 20% to 80% noncondensing. Elevation: 2439 m (8000 ft). Heat dissipation: 3168 x 10^3 J/hr (3000 BTU/hr) max. Shock: 3 g, 11 ms. Vibration: 0.25 g peak, 5 Hz to 500 Hz. Nonoperating lim...
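
The power figures in this table are related by simple arithmetic: apparent power is input voltage times current. The sketch below reproduces that calculation from the quoted values; the derived power factor and the temperature helper are illustrative additions, not figures stated in the guide.

```python
# Values quoted from the AC power rows of the specification table.
INPUT_VOLTAGE_V = 100.0   # "at 100 VAC input"
MAX_CURRENT_A = 9.0       # "9.0 A max"
TRUE_POWER_W = 880.0      # "True power 880 W max"

# Apparent power (volt-amperes) is voltage multiplied by current.
apparent_power_va = INPUT_VOLTAGE_V * MAX_CURRENT_A

# Derived here for illustration only: ratio of true to apparent power.
power_factor = TRUE_POWER_W / apparent_power_va


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees C to degrees F."""
    return celsius * 9.0 / 5.0 + 32.0


print(apparent_power_va, round(power_factor, 3))
print(celsius_to_fahrenheit(10.0), round(celsius_to_fahrenheit(38.0), 1))
```

The upper ambient limit converts to slightly more than the 100 degrees F printed in the table, which suggests the table value is rounded.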

Page 108: ...b; 1.2 kg (2.6 lb); 5.4 kg (12 lb). Service clearance: Front: 81.3 cm (32.0 in). Rear: 81.3 cm (32.0 in). Buses: External host bus: Differential fast and wide SCSI-2, synchronous. Internal storage system buses: Five single-ended SCSI buses. Table A-2: CHALLENGE RAID Rack Specifications. Columns: Classification, Specification, Value. AC power requirements (Cabinet voltage; Current draw per chassis assembly; Power consumption): 200 VAC to ...

Page 109: ...F; 24 degrees C/hr (43.2 degrees F/hr); 10% to 90% noncondensing; 7625 m (25,010 ft). Physical: Cabinet dimensions: Height 180.34 cm (71.0 in); Width 58.42 cm (23.0 in); Depth 78.74 cm (31.0 in). Maximum cabinet weight (see Table A-1 for weights of FRUs): 407.96 kg (899.4 lb), for 4 chassis assemblies, each with 20 disk modules, 2 SPs, 3 power supplies, and battery backup unit, without packaging. Service clearance: Front access; Rear and side acce...

Page 111: ...t statistics logging on the RAID storage control processor (SP), setting all log counters to 0. firmware: Update the firmware on the CHALLENGE RAID SP. getagent: Get names and descriptions of devices controlled by the SP. getcache: Get information about the storage system caching environment. getcontrol: Get general system information. getcrus: Display status information on all system components, such as the fa...

Page 112: ... and variables mean: -v: Enables verbose return. -p: Parses the raid5 command without calling the API. If the string does not parse correctly, an error message is printed to stderr; otherwise, there is no output. -d device: Target RAID device. Use raid5 getagent for a list of RAID devices. This switch must be present for all raid5 management and configuration commands unless the environment variable indicates ot...

Page 113: ...t numbers. LUNs lack a standard geometry. Disk capacity is not a fixed quantity between disk array LUNs. The effective geometry of a disk array LUN depends on the type of physical disks in the array and the number of physical disks in the LUN. Note: Although bind returns immediate status for a RAID device, the bind does not complete for 45 to 60 minutes, depending on system traffic. Use getlun to monitor ...

Page 114: ...mum of four disks per bus. The maximum is 16 disks, grouped in sets of four on different buses. A RAID 1_0 bind requires separate buses for each member of the image pair. Select the disks in this order: p1, s1, p2, s2, p3, s3, and so on. A RAID 5 bind also requires separate buses, each with five disk modules. Legal RAID 5 bind configurations are: a0 b0 c0 d0 e0; a1 b1 c1 d1 e1; a2 b2 c2 d2 e2; a3 b3 c3 d3 e3. A hot ...
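
The four legal RAID 5 configurations listed above share a pattern: one disk from each of buses a through e, all with the same device number. The following sketch checks a proposed disk list against that pattern. The function names are hypothetical, and treating the check as case-insensitive and independent of argument order is an assumption; the excerpt does not say whether the bind command itself accepts disks in any order.

```python
BUSES = "abcde"          # buses a through e
DEVICE_NUMBERS = range(4)  # device numbers 0 through 3


def legal_raid5_groups():
    """Return the four listed groups: a0..e0, a1..e1, a2..e2, a3..e3."""
    return [[f"{bus}{dev}" for bus in BUSES] for dev in DEVICE_NUMBERS]


def is_legal_raid5_bind(disks):
    """True if the disks match one of the listed RAID 5 groups."""
    wanted = sorted(d.lower() for d in disks)
    return any(wanted == sorted(group) for group in legal_raid5_groups())


print(legal_raid5_groups())
print(is_legal_raid5_bind(["a3", "b3", "c3", "d3", "e3"]))
print(is_legal_raid5_bind(["a0", "b0", "c0", "d0", "e1"]))
```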

Page 115: ...bution of data read or written. However, if the stripe size is too small for a single host I/O operation, the operation requires accessing two stripes, thus causing the hardware to read and/or write from two disk modules instead of one. Generally, it is best to use the smallest stripe element size that will rarely force access to another stripe. The default stripe element size is 128 sectors. The size sho...
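
To see why a stripe element that is too small forces a second disk access, it can help to count how many stripe elements a single I/O touches. The sketch below is a simplified model under stated assumptions: it treats the LUN as a flat run of sectors divided into fixed-size elements and ignores parity placement and the storage system's actual layout. The function name and sample I/O sizes are hypothetical.

```python
DEFAULT_STRIPE_ELEMENT_SECTORS = 128  # default size given in the guide


def stripe_elements_touched(start_sector: int, sector_count: int,
                            element_sectors: int = DEFAULT_STRIPE_ELEMENT_SECTORS) -> int:
    """Count the stripe elements spanned by one contiguous I/O."""
    if start_sector < 0 or sector_count <= 0 or element_sectors <= 0:
        raise ValueError("start must be >= 0; count and element size must be > 0")
    first = start_sector // element_sectors
    last = (start_sector + sector_count - 1) // element_sectors
    return last - first + 1


# An aligned 128-sector I/O stays within one element.
print(stripe_elements_touched(0, 128))
# The same I/O starting mid-element spills into the next element.
print(stripe_elements_touched(64, 128))
```

In this model, any result greater than 1 corresponds to the case the guide warns about, where the hardware reads or writes from more than one disk module.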

Page 116: ...28 -c read The following example binds A3, B3, C3, D3, and E3 into a RAID 0 logical unit with a LUN number of 3 and a 128-block stripe size per physical disk, with read cache enabled: raid5 -d sc4d2l0 bind r0 3 a3 b3 c3 d3 e3 -s 128 -c read The following example binds disk E3 as a hot spare with a LUN number of 7: raid5 -d sc4d2l0 bind hs 7 e3 There is no output for raid5 with the bind parameter. Errors are pr...

Page 117: ...r of I/Os that can be outstanding to a LUN and have the LUN still be considered idle. Used to determine cache flush start time. Legal values are any number greater than or equal to 0. -t idle-delay-time: Amount of time, in 100 ms intervals, that a unit must be below idle-thresh to be considered idle. Once a unit is considered idle, any dirty pages in the cache can begin idle-time flushing. Legal values are ...
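
Because the idle delay time is expressed in 100 ms intervals rather than seconds, a small conversion avoids off-by-a-factor-of-ten mistakes. The helper names below are hypothetical. The legal range for this parameter is cut off in this excerpt, so the sketch rejects only negative durations and does not enforce any upper bound.

```python
INTERVAL_MS = 100  # the idle delay time is counted in 100 ms intervals


def seconds_to_idle_delay(seconds: float) -> int:
    """Convert a delay in seconds to a count of 100 ms intervals."""
    if seconds < 0:
        raise ValueError("delay cannot be negative")
    return round(seconds * 1000 / INTERVAL_MS)


def idle_delay_to_seconds(intervals: int) -> float:
    """Convert a count of 100 ms intervals back to seconds."""
    return intervals * INTERVAL_MS / 1000


print(seconds_to_idle_delay(2))
print(idle_delay_to_seconds(20))
```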

Page 118: ...all log counters to 0. To reset statistics logging, use: raid5 -d device clearstats Note: You must be root to use this parameter. This command has no output. firmware: To update the firmware on the CHALLENGE RAID SP, type as root: raid5 -d device firmware /usr/raid5/flarecode.bin Note: The bus must be quiesced, with all applications disabled and only the RAID agent running. You must use this command every time y...

Page 119: ...e API: raid5 getagent Following is a sample output for one device; normally, the output would give information on all devices: Name Disk Array Desc RAID5 Disk Array Node sc4d2l0 Signature 0xf3b51700 Peer Signature 0x657e0a00 Revision 7 12 4 SCSI ID 1 Prom Rev 0x0076100 SP Memory 64 Serial No 94 7240 808 Table B-2 summarizes entries in the raid5 getagent output. Table B-2: Output of raid5 getagent. Entry ...

Page 120: ... 90 SP A Cache Pages 2048 SP B Cache Pages 2047 Unassigned Cache Pages 0 Read Hit Ratio 82 Node: The /dev/scsi entry that the agent uses as a path to the actual SCSI device; this value must be entered by the user for every CLI command except getagent. Signature: Unique 32-bit identifier for the SP being accessed through Node. Peer Signature: Unique 32-bit identifier for the other SP in the chassis; 0 if n...

Page 121: ...re memory than you have. If you specify less than you have, the remaining memory is unused. Page Size: User-specified page size in KB for caching; for example, 2 means a 2 KB page size. Cache State: Enabled: SP is fully functional. Disabled: SP not capable of or configured for caching. Synching: SP is synchronizing its cache with the peer SP. Enabling: cache is in the process of becoming enabled. Quiescing: cache ...

Page 122: ...ages currently owned by SP B. Unassigned Cache Pages: Number of dirty cache pages not owned by A or B. This can happen when a unit is broken and there are no disks to which to flush the dirty pages. Read Hit Ratio: Percent of read requests to the SP that can be satisfied from the cache without requiring disk access. Write Hit Ratio: Percent of write requests to the controller that can be satisfied with t...

Page 123: ...ay of the week Friday System Time 12:43:54 In the getcontrol output: statistics logging is always turned on by the agent; hard errors are those returned to the host; total reads and writes are the totals as seen by the SP; system time is in 24-hour format. getcrus: For state information on every field-replaceable unit in the CHALLENGE RAID storage system except disks in the disk modules, use: raid5 -d devi...

Page 124: ...ion has the format bd, where b is the bus on which the disk is located (a through e) and d is the device number (0 through 3). Figure B-1 diagrams disk module locations. Table B-4: Output of raid5 getcrus. Output: Meaning. FANA, FANB: Fan banks A and B. VSCA: Power supply (voltage semi-regulated converter). VSCB: Optional second power supply. SPA: Storage control processor. SPB: Optional second storage control processor...
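
The bd position format described above (bus letter a through e, then device number 0 through 3) is easy to validate before passing a position to a command. The following sketch shows one way to do so. The function name is hypothetical, and accepting uppercase letters is an assumption based on the guide writing positions both as A2 and as a2.

```python
import re

# One bus letter (a-e) followed by one device number (0-3).
_POSITION = re.compile(r"^([a-e])([0-3])$", re.IGNORECASE)


def parse_disk_position(position: str):
    """Split a position such as 'a2' into (bus, device_number)."""
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"not a valid disk position: {position!r}")
    return match.group(1).lower(), int(match.group(2))


print(parse_disk_position("a2"))
print(parse_disk_position("E3"))
```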

Page 125: ...e following command gets information on disk A2: raid5 -d sc4d2l0 getdisk a2 (Figure labels: disk module locations A0 through E3, shown for the deskside system and for a chassis assembly in a rack; 5 to 20 disk modules in groups of 5) ...

Page 126: ...ivate 0x00009000 A0 Bind Signature 0x1c4eb2bc A0 Hard Read Errors 0 A0 Hard Write Errors 0 A0 Soft Read Errors 0 A0 Soft Write Errors 0 A0 Read Retries 0 A0 Write Retries 0 A0 Remapped Sectors 0 A0 Number of Reads 1007602 A0 Number of Writes 1152057 Table B-5 interprets items in this output. Table B-5: Output of raid5 getdisk. Output: Meaning. Vendor Id: Manufacturer of disk drive. Product Id: 2.1 GB disk...

Page 127: ...Serial Number: Serial number from disk inquiry command. Capacity: Actual disk capacity in blocks. Private: Amount of physical disk reserved for private space. Bind Signature: Unique value assigned to each disk in a logical unit at bind time. Hard Read Errors: Number of hard errors encountered on reads for this disk. Hard Write Errors: Number of hard errors encountered on writes for this disk. Soft Read Errors...

Page 128: ... with the oldest entry, use: raid5 -d device getlog n To display the oldest n entries in the log, starting with the oldest entry, use: raid5 -d device getlog n Following is a possible output of the command raid5 getlog 5: 12 17 94 09 59 51 A3 A07 Cru Powered Down 0x47 12 17 94 09 59 51 A3 608 Cru Ready 0x0 12 17 94 09 59 51 A3 603 Cru Rebuild Started 0x0 12 17 94 09 59 51 A3 604 Cru Rebuild Complete 0x0 1...

Page 129: ... 14: Incorrect number of chglun parameters. 15: Unable to determine name of target host machine. 16: Enable/disable flag invalid. 17: Invalid usable cache size. 18: Invalid page size. 19: Invalid watermark value. 20: High watermark less than low watermark. 21: No device name listed. 22: Invalid idle threshold. 23: Invalid idle delay. 24: Invalid write aside size. 25: Disks must be on separate buses for bind. 26: The agent...
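
When scripting around the command line interface, a lookup table makes the numeric error codes above readable. The sketch below includes only the codes whose text is complete on this page (14 through 25); code 26 is cut off in this excerpt and is left out rather than guessed. The dictionary and function names are hypothetical, and how the CLI actually reports a code (exit status or printed text) is not shown here.

```python
# Error codes and messages as listed on this page (14-25 only).
RAID5_ERROR_CODES = {
    14: "Incorrect number of chglun parameters",
    15: "Unable to determine name of target host machine",
    16: "Enable/disable flag invalid",
    17: "Invalid usable cache size",
    18: "Invalid page size",
    19: "Invalid watermark value",
    20: "High watermark less than low watermark",
    21: "No device name listed",
    22: "Invalid idle threshold",
    23: "Invalid idle delay",
    24: "Invalid write aside size",
    25: "Disks must be on separate buses for bind",
}


def describe_error(code: int) -> str:
    """Return the listed message for a code, or a placeholder if unlisted."""
    return RAID5_ERROR_CODES.get(code, f"Unknown or unlisted error code {code}")


print(describe_error(20))
print(describe_error(26))
```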

Page 130: ...tistics. The following is a truncated output for a RAID 5 group of five disks. Note: Information on individual disks is not displayed unless statistics logging is enabled with raid5 getcontrol. See getcontrol earlier in this appendix. 27: LUN does not exist. 28: LUN already exists. 29: Cannot get current working directory for firmware command. 50: Agent encountered an error during SCSI execution. 51: Agent enco...

Page 131: ...Rebuild Time 0 Read Hit Ratio 0 Write Hit Ratio 0 Prct Reads Forced Flushed 0 Prct Writes Forced Flushed 0 Prct Rebuilt 100 Prct Bound 100 A0 Enabled A0 Reads 62667 A0 Writes 29248 A0 Blocks Read 3212517 A0 Blocks Written 471642 A0 Queue Max 26 A0 Queue Avg 1 A0 Avg Service Time 14 A0 Prct Idle 100 A0 Prct Busy 0 A0 Remapped Sectors 0 A0 Read Retries 50 A0 Write Retries 0 B0 Enabled B0 Reads 66946...

Page 132: ...n: Always Enabled. Write Cache: Enabled means this LUN is write caching; otherwise Disabled. Read Cache: Enabled means this LUN is read caching; otherwise Disabled. Idle Threshold: Maximum number of I/Os outstanding, used to determine cache flush start time; set with chglun. Idle Delay Time: Amount of time, in 100 ms intervals, that unit is below idle threshold; set with chglun. Write Aside Size: Smallest write req...

Page 133: ...sk has done. Diskname Writes: Total number of writes this disk has done. Diskname Blocks Read: Total number of blocks this disk has read. Diskname Blocks Written: Total number of blocks this disk has written. Diskname Queue Max: Maximum number of I/Os queued up to this drive. Diskname Queue Avg: Average number of I/Os queued up to this drive. Diskname Avg Service Time: Average service time in milliseconds. Dis...

Page 134: ...user has selected to enable the cache. The command line interface does not let you specify more memory than you have. If you specify less than you have, the remaining memory is unused. Note: For caching, both SPs must have the same amount of cache memory in order for caching to be preserved in the event of shutdown or other power loss. -p page: Size in KB of pages into which to partition the cache. Valid si...

Page 135: ...eter you must always disable the cache. The following example enables the system cache with an 8 MB cache partitioned into 8 KB pages, with a 50% low watermark value and a 75% high watermark value: setcache -d sc4d2l0 1 -u 8 -p 8 -l 50 -h 75 unbind: The unbind parameter deconfigures physical disks from their current logical configuration into LUNs. Caution: This parameter destroys all data on the LUN disk grou...
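
The setcache example above pairs a cache size in MB with a page size in KB and two watermark percentages. The sketch below works through that arithmetic and applies the watermark ordering rule implied by the error list in this appendix ("High watermark less than low watermark"). The function names are hypothetical. It assumes 1 MB = 1024 KB, assumes watermarks are percentages from 0 to 100, and does not account for any memory the storage system may reserve for itself.

```python
def cache_page_count(cache_mb: int, page_kb: int) -> int:
    """Number of whole pages that fit in a cache of the given size."""
    if cache_mb <= 0 or page_kb <= 0:
        raise ValueError("cache size and page size must be positive")
    return (cache_mb * 1024) // page_kb


def check_watermarks(low_percent: int, high_percent: int) -> None:
    """Raise ValueError if the watermark pair is not usable."""
    for value in (low_percent, high_percent):
        if not 0 <= value <= 100:  # assumed percentage range
            raise ValueError("Invalid watermark value")
    if high_percent < low_percent:
        raise ValueError("High watermark less than low watermark")


# Values from the example: 8 MB cache, 8 KB pages, 50% low, 75% high.
print(cache_page_count(8, 8))
check_watermarks(50, 75)
```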

Page 136: ...cal unit (LUN) to deconfigure. -o: When raid5 unbind is entered, a prompt appears asking the user for verification before the unbind is issued. This flag disables this prompt. This command has no output. The following example destroys LUN 3 and frees its disks to be reconfigured, with no prompting to the user: unbind -d sc4d2l0 3 -o ...

Page 137: ...s 86 87 size 86 120 statistics 87 89 upgrading for 90 CHALLENGE server system xi 4 chassis 8 assemblies on one SCSI bus 4 front view 2 3 chglun 56 57 102 104 clearlog 44 104 clearstats 104 CLI see command line interface command line interface 21 22 97 122 component 5 10 getting information 42 43 identifying failed 79 82 replacement part numbers 79 configuration disk 49 57 storage system 23 33 basi...

Page 138: ...s lights 10 60 swapping 14 unbinding 64 unbound replacing and updating firmware 71 disk striping see striping dual bus dual initiator configuration 31 33 57 dual interface dual processor configuration 26 27 57 E electrostatic discharge damage ESD avoiding 62 environment variable 99 error codes 114 116 event log clearing 44 displaying 43 44 F fan module closing 47 opening 46 82 83 replacing 80 stat...

Page 139: ...rage systems 4 O operation 35 47 P parity data 13 19 power supply 80 replacing 80 power supply status 43 R RAID defined 1 level 0 14 level 1 15 level 1_0 16 17 level 5 18 19 levels binding 50 99 paper 1 raid5 37 97 122 parameters summarized 97 98 syntax 98 RaidAgentDevice environment variable 99 rebuild time 71 100 bind 51 chglun 56 103 restarting system 45 47 S SCSI 2 bus 18 and disk modules 9 le...

Page 140: ...configuration 28 30 57 status checking 36 light 36 storage control processor 6 7 failed and caching 87 powering off 84 replacing 81 84 status 43 striping 12 default size 12 RAID 1_0 16 17 RAID 5 18 19 size 51 101 system information 39 U unbind 64 121 V VSC see power supply 43 ...

Page 142: ...s to Reach Us The postcard opposite this page has space for your comments Write your comments on the postage paid card for your country then detach and mail it If your country is not listed either use the international card and apply the necessary postage or use electronic mail or FAX for your reply If electronic mail is available to you write your comments in an e mail message and mail it to eith...
