M
management network

A network that is primarily responsible for booting and installing the designated server and compute nodes from the management server.

management server (MS)

An ESS node that hosts the ESS GUI and xCAT and is not connected to storage. It must be part of a GPFS cluster. From a system management perspective, it is the central coordinator of the cluster. It also serves as a client node in an ESS building block.

master encryption key (MEK)

A key that is used to encrypt other keys. See also encryption key.

maximum transmission unit (MTU)

The largest packet or frame, specified in octets (eight-bit bytes), that can be sent in a packet- or frame-based network, such as the Internet. TCP uses the MTU to determine the maximum size of each packet in any transmission.
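
For example, TCP typically derives its maximum segment size (MSS), the largest payload it places in one packet, by subtracting the IP and TCP header overhead from the MTU. The following is a minimal sketch, assuming IPv4 and TCP headers without options (20 octets each); the helper name `tcp_mss` is illustrative and is not part of any ESS or IBM Spectrum Scale tool.

```python
# Illustrative only: estimate the TCP maximum segment size (MSS) for a link MTU,
# assuming IPv4 and TCP headers that carry no options.
IPV4_HEADER_BYTES = 20  # minimum IPv4 header size, no options
TCP_HEADER_BYTES = 20   # minimum TCP header size, no options


def tcp_mss(mtu: int) -> int:
    """Return the largest TCP payload, in octets, that fits in one packet."""
    overhead = IPV4_HEADER_BYTES + TCP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} is too small to carry any TCP payload")
    return mtu - overhead


for mtu in (1500, 9000):  # standard Ethernet and a common jumbo-frame size
    print(f"MTU {mtu} -> MSS {tcp_mss(mtu)}")
```

With the standard Ethernet MTU of 1500 octets this gives an MSS of 1460; with a 9000-octet jumbo frame it gives 8960.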

MEK

See master encryption key (MEK).

metadata

A data structure that contains access information about file data. Such structures include inodes, indirect blocks, and directories. These data structures are not accessible to user applications.

MS

See management server (MS).

MTU

See maximum transmission unit (MTU).

N
Network File System (NFS)

A protocol (developed by Sun Microsystems, Incorporated) that allows any host in a network to gain access to another host or netgroup and their file directories.

Network Shared Disk (NSD)

A component for cluster-wide disk naming and access.

NSD volume ID

A unique 16-digit hexadecimal number that is used to identify and access each NSD.
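
Because an NSD volume ID is a fixed-width hexadecimal string, its format can be checked mechanically. The following is a minimal sketch; the function name `is_nsd_volume_id` and the sample ID are hypothetical, and a format check alone cannot confirm that an ID is unique or that it refers to an existing NSD.

```python
import re

# A 16-digit hexadecimal string, matching the glossary definition of an NSD volume ID.
NSD_VOLUME_ID = re.compile(r"[0-9A-Fa-f]{16}")


def is_nsd_volume_id(value: str) -> bool:
    """Return True if value has the format of an NSD volume ID (format check only)."""
    return NSD_VOLUME_ID.fullmatch(value) is not None


print(is_nsd_volume_id("0A1B2C3D4E5F6071"))  # hypothetical ID with 16 hex digits
print(is_nsd_volume_id("0A1B2C3D"))          # only 8 digits, so rejected
```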

node

An individual operating-system image within a cluster. Depending on the way in which the computer system is partitioned, it can contain one or more nodes. In a Power Systems environment, a node is synonymous with a logical partition.

node descriptor

A definition that indicates how ESS uses a node. Possible functions include manager node, client node, quorum node, and non-quorum node.
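
In IBM Spectrum Scale commands such as mmcrcluster and mmaddnode, a node descriptor is written as NodeName:NodeDesignations:AdminNodeName, where NodeDesignations is an optional hyphen-separated list of roles (manager or client, quorum or nonquorum), defaulting to client and nonquorum. The parser below is a hypothetical sketch of that syntax under those assumptions; the class and function names are illustrative, and the command reference for your release remains authoritative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeDescriptor:
    node_name: str
    manager: bool                   # False means "client" (assumed default)
    quorum: bool                    # False means "nonquorum" (assumed default)
    admin_node_name: Optional[str]  # None when the third field is omitted


def parse_node_descriptor(text: str) -> NodeDescriptor:
    """Parse 'NodeName[:NodeDesignations[:AdminNodeName]]' (illustrative only)."""
    parts = text.split(":")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"malformed node descriptor: {text!r}")
    manager, quorum = False, False  # defaults: client, nonquorum
    designations = parts[1] if len(parts) > 1 else ""
    for role in filter(None, designations.split("-")):
        if role in ("manager", "client"):
            manager = role == "manager"
        elif role in ("quorum", "nonquorum"):
            quorum = role == "quorum"
        else:
            raise ValueError(f"unknown node designation: {role!r}")
    admin = parts[2] if len(parts) > 2 and parts[2] else None
    return NodeDescriptor(parts[0], manager, quorum, admin)


# Hypothetical host name; a quorum node that can also act as a manager node.
print(parse_node_descriptor("node1.example.net:quorum-manager"))
```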

node number

A number that is generated and maintained by ESS as the cluster is created, and as nodes are added to or deleted from the cluster.

node quorum

The minimum number of nodes that must be running in order for the daemon to start.
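
In IBM Spectrum Scale, on which ESS is built, the default node quorum is one plus half of the defined quorum nodes; non-quorum nodes are not counted. The following is a minimal sketch of that arithmetic, with hypothetical function names.

```python
def quorum_threshold(defined_quorum_nodes: int) -> int:
    """Node quorum: one plus half of the defined quorum nodes (integer division)."""
    if defined_quorum_nodes < 1:
        raise ValueError("a cluster needs at least one quorum node")
    return defined_quorum_nodes // 2 + 1


def has_node_quorum(active_quorum_nodes: int, defined_quorum_nodes: int) -> bool:
    """Return True if enough quorum nodes are active; non-quorum nodes are ignored."""
    return active_quorum_nodes >= quorum_threshold(defined_quorum_nodes)


print(quorum_threshold(5))    # 3 of 5 quorum nodes must be active
print(has_node_quorum(2, 5))  # False
print(has_node_quorum(3, 5))  # True
```

For example, a cluster with five quorum nodes keeps quorum with three active quorum nodes but loses it with two.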

node quorum with tiebreaker disks

A form of quorum that allows ESS to run with as few as one quorum node available, as long as there is access to a majority of the quorum disks.
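
The condition stated in this definition can be sketched as follows, treating "quorum disks" as the tiebreaker disks. This is a simplified, hypothetical model: the real algorithm also determines which group of nodes keeps running after a network partition, which this sketch does not attempt.

```python
def has_tiebreaker_quorum(active_quorum_nodes: int,
                          accessible_tiebreaker_disks: int,
                          defined_tiebreaker_disks: int) -> bool:
    """Simplified rule: at least one quorum node plus access to a disk majority."""
    if defined_tiebreaker_disks < 1:
        raise ValueError("this quorum type needs at least one tiebreaker disk")
    if not 0 <= accessible_tiebreaker_disks <= defined_tiebreaker_disks:
        raise ValueError("accessible disks must be between 0 and the defined count")
    disk_majority = defined_tiebreaker_disks // 2 + 1
    return active_quorum_nodes >= 1 and accessible_tiebreaker_disks >= disk_majority


print(has_tiebreaker_quorum(1, 2, 3))  # True: one quorum node, 2 of 3 disks
print(has_tiebreaker_quorum(1, 1, 3))  # False: no majority of the disks
print(has_tiebreaker_quorum(0, 3, 3))  # False: no quorum node is available
```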

non-quorum node

A node in a cluster that is not counted for the purposes of quorum determination.

IBM Elastic Storage System 3000 Version 6.0.1 Service Guide, SC28-3158-00. Product numbers: 5765-DME, 5765-DAE.
