Table 4. Events for the Enclosure component (continued)

| Event | Event Type | Severity | Message | Description | Cause | User Action |
|---|---|---|---|---|---|---|
| enclosure_firmware_wrong | STATE_CHANGE | WARNING | The firmware level of enclosure {0} is wrong. | The firmware level of the enclosure is wrong. | N/A | Check the installed firmware level using the mmlsfirmware command. |
| enclosure_found | INFO_ADD_ENTITY | INFO | Enclosure {0} was found. | A GNR enclosure listed in the IBM Spectrum Scale configuration was detected. | N/A | N/A |
| enclosure_needsservice | STATE_CHANGE | WARNING | Enclosure {0} needs service. | The enclosure needs service. | N/A | N/A |
| enclosure_ok | STATE_CHANGE | INFO | Enclosure {0} is ok. | The enclosure state is ok. | N/A | N/A |
| enclosure_unknown | STATE_CHANGE | WARNING | Enclosure state {0} is unknown. | The enclosure state is unknown. | N/A | N/A |
| enclosure_vanished | INFO_DELETE_ENTITY | INFO | Enclosure {0} has vanished. | A GNR enclosure listed in the IBM Spectrum Scale configuration was not detected. | A GNR enclosure, listed in the IBM Spectrum Scale configuration as mounted before, is not found. This could be a valid situation. | Run the mmlsenclosure command to verify that all expected enclosures exist. |
| esm_absent | STATE_CHANGE | WARNING | ESM {0} is absent. | The ESM state is not installed. | N/A | N/A |
| esm_failed | STATE_CHANGE | WARNING | ESM {0} is failed. | The ESM state is failed. | N/A | N/A |
| esm_ok | STATE_CHANGE | INFO | ESM {0} is ok. | The ESM state is ok. | N/A | N/A |
| expander_absent | STATE_CHANGE | WARNING | expander {0} is absent. | The expander is absent. | N/A | N/A |
| expander_failed | STATE_CHANGE | ERROR | expander {0} is failed. | The expander state is failed. | N/A | N/A |
| expander_ok | STATE_CHANGE | INFO | expander {0} is ok. | The expander state is ok. | N/A | N/A |
| fan_failed | STATE_CHANGE | WARNING | Fan {0} is failed. | The fan state is failed. | N/A | N/A |
| fan_ok | STATE_CHANGE | INFO | Fan {0} is ok. | The fan state is ok. | N/A | N/A |
| fan_speed_high | STATE_CHANGE | WARNING | Fan {0} speed is too high | The fan speed is out of the tolerance range | N/A | Check the enclosure cooling module LEDs for fan faults. |
| fan_speed_low | STATE_CHANGE | WARNING | Fan {0} speed is too low | The fan speed is out of the tolerance range | N/A | Check the enclosure cooling module LEDs for fan faults. |
| no_enclosure_data | STATE_CHANGE | WARNING | Enclosure data and state information cannot be queried. | Cannot query the enclosure details. State reporting for all enclosures and canisters will be incorrect. | The mmlsenclosure all -L -Y command fails to report any enclosure data. | Run the mmlsenclosure command to check for errors. Use the lsmod command to verify that the pemsmod is loaded. |
| power_high_current | STATE_CHANGE | WARNING | Power supply {0} reports high current. | The DC power supply current is greater than the threshold. | N/A | N/A |
| power_high_voltage | STATE_CHANGE | WARNING | Power supply {0} reports high voltage. | The DC power supply voltage is greater than the threshold. | N/A | N/A |
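The Message column in Table 4 uses {0} as a placeholder for the identifier of the affected component. As a minimal sketch, assuming the placeholder is filled positionally (Python's str.format happens to use the same syntax), a monitoring script could render a few of these rows as shown here. The EVENTS dictionary, the Event tuple, and the render_event function are hypothetical names chosen for illustration and are not part of IBM Spectrum Scale; the row contents are copied from the table.

```python
from typing import NamedTuple


class Event(NamedTuple):
    event_type: str
    severity: str
    message: str  # template; {0} stands for the component identifier


# A few rows copied from Table 4 (hypothetical in-memory representation).
EVENTS = {
    "enclosure_needsservice": Event(
        "STATE_CHANGE", "WARNING", "Enclosure {0} needs service."),
    "expander_failed": Event(
        "STATE_CHANGE", "ERROR", "expander {0} is failed."),
    "fan_ok": Event(
        "STATE_CHANGE", "INFO", "Fan {0} is ok."),
    "no_enclosure_data": Event(
        "STATE_CHANGE", "WARNING",
        "Enclosure data and state information cannot be queried."),
}


def render_event(name: str, identifier: str) -> str:
    """Return 'SEVERITY name: message' with {0} replaced by identifier."""
    event = EVENTS[name]
    # str.format ignores unused positional arguments, so templates
    # without a {0} placeholder are returned unchanged.
    return f"{event.severity} {name}: {event.message.format(identifier)}"


if __name__ == "__main__":
    # "fan1" is a made-up identifier, not a real ESS 3000 component name.
    print(render_event("fan_ok", "fan1"))
```

Keeping the severity next to the template makes it straightforward to filter for WARNING and ERROR rows before deciding whether any user action is needed.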

6  IBM Elastic Storage System 3000: Service Guide
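The user action for the no_enclosure_data event in Table 4 asks you to verify with the lsmod command that pemsmod is loaded. The following is a minimal sketch of that check, assuming the usual lsmod output layout: one header line, then one line per module with the module name in the first column. The module_loaded and pemsmod_is_loaded helpers are hypothetical names used only for illustration.

```python
import subprocess


def module_loaded(lsmod_output: str, module: str) -> bool:
    """Return True if module appears in the first column of lsmod output."""
    for line in lsmod_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        # Compare the whole first field so that partial names do not match.
        if fields and fields[0] == module:
            return True
    return False


def pemsmod_is_loaded() -> bool:
    """Run lsmod on the local node and look for the pemsmod module."""
    result = subprocess.run(
        ["lsmod"], capture_output=True, text=True, check=True)
    return module_loaded(result.stdout, "pemsmod")
```

Separating the parsing from the subprocess call lets the parsing be checked against saved lsmod output without running anything on a canister.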

IBM Elastic Storage System 3000 Version 6.0.1 Service Guide (IBM SC28-3158-00)
