O
OFED

See OpenFabrics Enterprise Distribution (OFED).

OpenFabrics Enterprise Distribution (OFED)

An open-source software stack that includes software drivers, core kernel code, middleware, and user-level interfaces.

P
pdisk

A physical disk.

PortFast

A Cisco network function that can be configured to resolve any problems that could be caused by the

amount of time STP takes to transition ports to the Forwarding state.

R
RAID

See redundant array of independent disks (RAID).

RDMA

See remote direct memory access (RDMA).

redundant array of independent disks (RAID)

A collection of two or more physical disk drives that present to the host an image of one or more

logical disk drives. In the event of a single physical device failure, the data can be read or regenerated

from the other disk drives in the array due to data redundancy.
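The regeneration described in this entry can be sketched with single-parity XOR. This is an illustration only: it assumes a simple parity layout, which is one of several redundancy schemes and is not stated by this guide to be what ESS uses. The block contents and the `xor_blocks` helper are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data drives, each holding one 4-byte block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity block is stored on a fourth drive.
parity = xor_blocks(data)

# Simulate losing drive 1, then regenerate its block from the
# surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the missing block.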

recovery

The process of restoring access to file system data when a failure has occurred. Recovery can involve

reconstructing data or providing alternative routing through a different server.

recovery group (RG)

A collection of disks that is set up by ESS, in which each disk is connected physically to two servers: a

primary server and a backup server.

remote direct memory access (RDMA)

A direct memory access from the memory of one computer into that of another without involving

either one's operating system. This permits high-throughput, low-latency networking, which is

especially useful in massively parallel computer clusters.

RGD

See recovery group data (RGD).

remote key management server (RKM server)

A server that is used to store master encryption keys.

RG

See recovery group (RG).

recovery group data (RGD)

Data that is associated with a recovery group.

RKM server

See remote key management server (RKM server).

S
SAS

See Serial Attached SCSI (SAS).

secure shell (SSH)

A cryptographic (encrypted) network protocol for initiating text-based shell sessions securely on

remote computers.

58  IBM Elastic Storage System 5000: Hardware Guide

Summary of Contents for Ambra Achiever 5000

Page 1: ...IBM Elastic Storage System 5000 Version 6 0 1 Hardware Guide IBM SC28 3155 00 ...

Page 2: ... product number 5765 DME IBM Spectrum Scale Data Access Edition for IBM ESS product number 5765 DAE IBM welcomes your comments see the topic How to submit your comments on page xi When you send information to IBM you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you Copyright International Business Machine...

Page 3: ...stries Association JEITA Notice 12 Japan Voluntary Control Council for Interference VCCI Notice 12 Korea Notice 13 People s Republic of China Notice 13 Russia Notice 13 Taiwan Notice 13 United States Federal Communications Commission FCC Notice 13 Chapter 2 System overview 15 High level architecture 15 Servers 16 ESS 5000 product line up 18 Enclosures 22 Network topology 23 Operating system and so...

Page 4: ...ment IP address 43 Appendix A Planning worksheets customer task 45 Installation worksheet 45 Appendix B 5105 22E Reference information 49 Accessibility features for the system 51 Accessibility features 51 Keyboard navigation 51 IBM and accessibility 51 Glossary 53 Index 61 Index 61 iv ...

Page 5: ...es 21 8 Raw capacity of ESS 5000 SL variants 21 9 Model 106 expansion enclosure rear view 22 10 Model 106 expansion enclosure disk locations 22 11 Model 092 expansion enclosure rear view 23 12 Model 092 expansion enclosure disk locations 23 13 ESS 5000 network topology 24 14 I O node P2P cabling 24 15 Protocol node P2P cabling 25 16 EMS server P2P cabling 25 17 SAS HBA port schema 25 18 ESS 5000 n...

Page 7: ...Tables 1 Conventions x vii ...

Page 9: ...ncluding initial hardware installation and setup and removal and installation of field replaceable units FRUs customer replaceable units CRUs for ESS 5000 Expansion Model 092 5147 092 System administrators and IBM support team Model 106 storage enclosures This unit provides information including hardware installation and maintenance for ESS 5000 Expansion Model 106 System administrators and IBM su...

Page 10: ...appear in constant width typeface Depending on the context constant width typeface sometimes represents path names directories or file names italic Italic words or characters represent variable values that you must supply Italics are also used for information unit titles for the first use of a glossary term and for general emphasis in text key Angle brackets less than and greater than enclose the ...

Page 11: ...n other words a vertical line means Or In the left margin of the document vertical lines indicate technical changes to the information How to submit your comments To contact the IBM Spectrum Scale development organization send your comments to the following email address scale us ibm com About this information xi ...

Page 13: ...w IBM Japan Ltd 19 21 Nihonbashi Hakozakicho Chuo ku Tokyo 103 8510 Japan INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION AS IS WITHOUT WARRANTY OF ANY KIND EITHER EXPRESS OR IMPLIED INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF NON INFRINGEMENT MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE Some jurisdictions do not allow disclaimer of express or implied warran...

Page 14: ...programs in source language which illustrate programming techniques on various operating platforms You may copy modify and distribute these sample programs in any form without payment to IBM for the purposes of developing using marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written Thes...

Page 15: ...njury D002 2 Locate the IBM Systems Safety Notices with the user publications that were provided with your system hardware 3 Find the matching identification number in the IBM Systems Safety Notices Then review the topics about the safety notices to ensure that you are in compliance 4 Optional Read the multilingual safety instructions on the system website a Go to www ibm com support b Search for ...

Page 16: ...cs about the safety notices to ensure that you are in compliance Danger notices for the system Ensure that you are familiar with the danger notices for your system Use the reference numbers in parentheses at the end of each notice for example D005 to find the matching translated notice in IBM Systems Safety Notices DANGER When working on or around the system observe the following precautions Elect...

Page 17: ...ss it is properly packaged secured on top of the supplied pallet R004 DANGER Main Protective Earth Ground This symbol is marked on the frame of the rack The PROTECTIVE EARTHING CONDUCTORS should be terminated at that point A recognized or certified closed loop connector ring terminal should be used and secured to the frame with a lock washer using a bolt or stud The connector should be properly si...

Page 18: ...upport Local paper manual must remain with machine in provided storage sleeve area Latest revision manual available on vendor s website Test verify stabilizer brake function before each use Do not over force moving or rolling the LIFT TOOL with stabilizer brake engaged Do not raise lower or slide platform load shelf unless stabilizer brake pedal jack is fully engaged Keep stabilizer brake engaged ...

Page 19: ... is being raised Be sure winch is locked in position before releasing handle Read instruction page before operating this winch Never allow winch to unwind freely Freewheeling will cause uneven cable wrapping around winch drum damage cable and may cause serious injury C048 part 2 of 2 CAUTION High levels of acoustical noise are or could be under certain circumstances present Use approved hearing pr...

Page 20: ...0 follow general safety guidelines Use the following general rules to ensure safety to yourself and others Observe good housekeeping in the area where the devices are kept during and after maintenance Follow the guidelines when lifting any heavy object 1 Ensure that you can stand safely without slipping 2 Distribute the weight of the object equally between your feet 3 Use a slow lifting force Neve...

Page 21: ...us injury To inspect each node for unsafe conditions use the following steps If necessary see any suitable safety publications 1 Turn off the system and disconnect the power cord 2 Check the frame for damage loose broken or sharp edges 3 Check the power cables by using the following steps a Ensure that the third wire ground connector is in good condition Use a meter to check that the third wire gr...

Page 22: ...tions Limit your movement Movement can cause static electricity to build up around you Handle the device carefully holding it by its edges or frame Do not touch solder joints pins or exposed printed circuitry Do not leave the device where others can handle and possibly damage the device While the device is still in its antistatic bag touch it to an unpainted metal part of the system unit for at le...

Page 23: ...ie 2014 30 EU zur Angleichung der Rechtsvorschriften über die elektromagnetische Verträglichkeit in den EU Mitgliedsstaatenund hält die Grenzwerte der EN 55032 Klasse A ein Um dieses sicherzustellen sind die Geräte wie in den Handbüchern beschrieben zu installieren und zu betreiben Des Weiteren dürfen auch nur von der IBM empfohlene Kabel angeschlossen werden IBM übernimmt keine Verantwortung für ...

Page 24: ...chutzanforderungen nach EN 55024 und EN 55032 Klasse A Japan Electronics and Information Technology Industries Association JEITA Notice This statement applies to products less than or equal to 20 A per phase This statement applies to products greater than 20 A single phase This statement applies to products greater than 20 A per phase three phase Japan Voluntary Control Council for Interference VC...

Page 25: ...ommission FCC Notice This equipment has been tested and found to comply with the limits for a Class A digital device pursuant to Part 15 of the FCC Rules These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment This equipment generates Chapter 1 Notices 13 ...

Page 26: ...erference caused by using other than recommended cables and connectors or by unauthorized changes or modifications to this equipment Unauthorized changes or modifications could void the user s authority to operate the equipment This device complies with Part 15 of the FCC Rules Operation is subject to the following two conditions 1 this device might not cause harmful interference and 2 this device...

Page 27: ...s sophisticated data placement and error correction algorithms to deliver high levels of storage reliability availability and performance Standard GPFS file systems are created from the NSDs that are defined through IBM Spectrum Scale RAID The following figure illustrates the high level software architecture of the ESS 5000 system Figure 1 ESS 5000 high level software architecture For more informa...

Page 28: ...are having the same MTM 5105 22E ESS management server The ESS management server EMS manages and deploys the I O servers and hosts the graphical user interface GUI The specifications of the EMS server are as follows 1x DD2 3 20 Small Cores 190W 225W 2 5GHz 2 9GHz 128 GB default memory no NVDIMMs No HBAs The same NIC fabric options as the NSD server C9 C6 C12 C7 2x SFF HDD capacity 1 8 TB The ESS 5...

Page 29: ...used NVLink not used PCIe slots 5x gen3 X8 HBA Broadcom 9305 16e fc ESA5 C2 C6 C7 C8 C12 3x gen4 x16 CX5 C3 C9 C4 1x gen3 x4 Austin Ethernet management C11 See I O matrix for full list Protocol node The ESS access methods are like the ones for accessing an IBM Spectrum Scale cluster Depending on the required configuration the ESS can be accessed as an IBM Spectrum Scale client with a specific conn...

Page 30: ...es ESS 5000 SC series The main features of the ESS 5000 SC series are as follows Uses the 5147 106 enclosures Uses up to six network adapters per ESS system 25G Ethernet 100G Ethernet 100G InfiniBand HDD options available are 10 TB 14 TB and 16 TB The following figure shows the basic building block of the ESS 5000 system with Model 106 expansion enclosures 18 IBM Elastic Storage System 5000 Hardwa...

Page 31: ... ESS 5000 system with Model 106 expansion enclosures The following figure shows different configurations that are available under the raw capacity of ESS 5000 SC variants SC series Chapter 2 System overview 19 ...

Page 32: ...enclosures. Provides high capacity and performance. Uses up to six network adapters per ESS system (25G Ethernet, 100G Ethernet, 100G InfiniBand). Four HDD options: 6 TB, 10 TB, 14 TB, and 16 TB. The following figure shows the basic building block of the ESS 5000 system with Model 092 expansion enclosures. ...

Page 33: ... 092 expansion enclosures The following figure shows the different configurations that are available under the ESS 5000 SL series and their raw capacities Figure 8 Raw capacity of ESS 5000 SL variants Chapter 2 System overview 21 ...

Page 34: ...ies IBM Elastic Storage System 5000 Expansion Model 106 5147 106 The Model 106 has a 4U chassis It holds up to 106 low profile 1 inch high 3 5 inch form factor disk drive modules in a vertical orientation Alternatively disk slots can hold a low profile 5 8 inch high 2 5 inch form factor disk with an adapter within the large form factor carrier The following figure shows the rear portion of the 514...

Page 35: ...h SAS disk drives in a 5U 19 inch rack mount enclosure The following figure shows the rear portion of the 5147 092 enclosure Figure 11 Model 092 expansion enclosure rear view The following figure shows the disk locations in the Model 092 expansion enclosure Figure 12 Model 092 expansion enclosure disk locations Network topology An Elastic Storage Server network includes the following components Ch...

Page 36: ...MC Service or FSP network The EMS requires this network to power control the POWER9 servers and to collect Call Home data Figure 13 on page 24 shows the ESS 5000 network topology of a mixed ESS 5000 cluster which contains ESS 3000 and IBM POWER9 servers Figure 13 ESS 5000 network topology Note Although connecting EMS to the clustering network is optional if there is only one building block the EMS...

Page 37: ...ESS management server Figure 16 on page 25 shows the EMS server P2P cabling Figure 16 EMS server P2P cabling SAS HBA port schema Figure 17 on page 25 shows the SAS HBA port schema Figure 17 SAS HBA port schema Chapter 2 System overview 25 ...

Page 38: ...s The major reliability availability and serviceability RAS features are as follows Disk Hospital Call home Concurrent maintenance No server adapters uses failover HDD SSD Redundant cooling Redundant power supply 1 1 1400W PS 200 240 VAC JBOD drives and select components Enclosure only with enclosure protection Warranty information See the Warranty Information PDF for details of the warranty 26 IB...

Page 39: ... racked system configuration is as follows ESS 5000 management server EMS 5105 22E Two or more ESS 5000 5105 22E protocol nodes optional One or more ESS 5000 building blocks 5105 22E Building block consists of two ESS 5000 I O server nodes and one or more storage enclosures Solution offerings are SC Model 106 and SL Model 092 series 1 Gb management switch required preconfigured with the proper VLA...

Page 40: ...Installation steps SSR task This section is a high level overview of the tasks that an IBM service support representative SSR performs to complete code 20 SSR objectives 1 Log in by using the SSR ID esserv1 and run the specified essutils command options on each server in the following order a One or more ESS 5000 building blocks b Two or more ESS 5000 protocol nodes c An ESS 5000 management server...

Page 41: ... to provide a node already on the management network that can be used to test ping If you are adding additional protocol nodes to an existing configuration run only the flow Checking ESS management server and protocol nodes on page 39 Request the customer to provide a node already on the management network that can be used to test ping Mixing ESS 3000 ESS 5000 and or legacy ESS IBM service support...

Page 42: ...vers and building blocks non EMS or protocol 1 Connect the Ethernet cable to C11 T4 bottom port of the 1 10 Gb network card When the Ethernet cable is connected your laptop should automatically obtain a DHCP IP address from the server which is 10 111 222 102 Note This card has four ports that are positioned vertically The top port is connected to the management switch VLAN 2 Ping the port to ensur...

Page 43: ...iguration example 4 When you log in press Enter to start the essutils tool A sample output is as follows Note Enlarge the PuTTY window to minimum 80 x 24 otherwise essutils might not work correctly Chapter 3 Installing 31 ...

Page 44: ... Enter d In the prompt message that appears change the password to ibmesscluster and then press Enter When you are asked to enter the new password ibmesscluster again enter it and then press Enter to accept the change e Type the command exit and press Enter The essutils Advanced SSR tasks screen appears again f Highlight Back and then press Enter g Highlight SSR Tools and then press Enter The foll...

Page 45: ...on worksheet on page 45, if any. You also need to write down the installed version for the customer so that they know whether they need to download a new update. Note: A hard stop fails a code 20. If you encounter any issues, write them down on the Installation worksheet on page 45 for further reference. 8. Press Enter to continue. 9. Highlight the Quick Storage Configuration check option and th...

Page 46: ...he Check enclosure cabling and paths to disks option, and then press Enter. If the output summary that appears at the end of the screen displays ERROR, see the IBM Elastic Storage System 5000 Problem Determination Guide, the WCII, or both to debug the issue. Note: Incorrect enclosure cabling, bad disks, or enclosures that are in a bad state and should be power cycled can cause an error. 12. Press Enter ...

Page 47: ...s ERROR, or if any disks display I/O errors above the given threshold value, see the IBM Elastic Storage System 5000 Problem Determination Guide, the WCII, or both. Note: One or more disks that require replacement might be a possible cause of an error. This operation might take time depending on the size and the number of disks. The number of enclosures is also a major factor. 14. Press Enter to continue. 15. Highl...

Page 48: ... Press Enter to check the information Note When saved the changes are displayed at the bottom of the screen as shown in the following figure A sample output is as follows d Press Enter to continue 18 If applicable highlight the Erase serviceable event option if you need to clear any resolved serviceable events and then press c to edit the field Important Do not press Enter Always press c to custom...

Page 49: ...omer provided for this server on the worksheet. A default address is recommended on the 10.0.0.0/24 (netmask 255.255.255.0) subnet. 20. Highlight the Set FSP ipmi static state and netmask option to set the netmask and the HMC1 port to static. Press Enter if you want a netmask of 255.255.255.0 (/24). a. If you want to change the netmask, press c to edit the field. b. Type the new netmask that you want to set. c. P...

Page 50: ...55.255.255.0 subnet. The netmask must be in the mask format, for example, 24 for 255.255.255.0. b. Press Ctrl+G to save. c. Press Enter to set the IP. 23. Highlight and click the Check the management interface option. Verify that the IP is set correctly, and then press Enter. The management interface must be enP1p8s0f0. a. Highlight Back and then press Enter to exit. Note: You must type the command exit, which will...
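The mask format mentioned above (24 for 255.255.255.0) is the prefix length of the netmask. The following is a minimal sketch of the conversion using Python's standard `ipaddress` module. The helper names `netmask_to_prefix` and `prefix_to_netmask` are hypothetical and are not part of essutils.

```python
import ipaddress


def netmask_to_prefix(netmask: str) -> int:
    """Convert a dotted netmask such as 255.255.255.0 to its prefix length."""
    # IPv4Network accepts a dotted netmask after the slash and
    # raises ValueError for a malformed or non-contiguous mask.
    return ipaddress.IPv4Network(f"0.0.0.0/{netmask}").prefixlen


def prefix_to_netmask(prefix: int) -> str:
    """Convert a prefix length such as 24 to its dotted netmask."""
    return str(ipaddress.IPv4Network(f"0.0.0.0/{prefix}").netmask)


print(netmask_to_prefix("255.255.255.0"))    # 24
print(netmask_to_prefix("255.255.255.252"))  # 30
print(prefix_to_netmask(24))                 # 255.255.255.0
```

A helper like this can be used to double-check worksheet values before they are typed into a field that expects the prefix form.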

Page 51: ...to point to C11-T4 (bottom port). You can use the same commands that you use to check an I/O server, except for two command options: Check enclosure cabling and paths to disks, and Check disks for IO operations. You will not run these options because neither the EMS nor the protocol nodes contain any SAS adapter or external storage like the I/O servers. However, when you run essutils, perform the following steps: Note: For m...

Page 52: ... click b Press c to change the target IP address c Change the target IP in the N field The IP is the management IP that was set on the data server 1 d Press Ctrl G keys to save e Press Enter to run the command f Change the target to the HMC1 IP address of the first data server g Press Enter to run the command h Repeat these steps for the data server 2 If the ping is not successful check the cablin...

Page 53: ...102 and netmask 255.255.255.252. Verify that you are plugged into the correct port (C11-T4) on the EMS. If the issues still persist, shut down the node (hold the white button until blinking again) and perform the following steps: 1. Connect the laptop point-to-point over serial connection, connecting the USB-to-RJ45 console cable between the auxiliary laptop and the serial port of the server with active ASM...
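A netmask of 255.255.255.252 (/30) leaves only two usable host addresses, which suits a point-to-point laptop-to-server link. The sketch below lists them. It assumes the full address 10.111.222.102 that appears in the page 42 snippet, because the address on this page is truncated; the variable names are illustrative.

```python
import ipaddress

# Address taken from the page 42 snippet (an assumption for this page),
# combined with the netmask stated on this page.
link = ipaddress.ip_interface("10.111.222.102/255.255.255.252")

network = link.network                 # the /30 network that contains the address
usable_hosts = list(network.hosts())   # excludes network and broadcast addresses

print(network)
for host in usable_hosts:
    print(host)
```

If the laptop and the server port are not both within this small range, the ping test cannot succeed regardless of cabling.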

Page 54: ... select the Rescue option If several redhat boot options besides the Rescue option are shown select the newest available Press Enter and the server will start to boot the OS You should see the output going across the screen and come to a login 6 Log in using the SSR credentials username essserv1 password serial number of server Note You must set the terminal variable correctly before attempting to...

Page 55: ...be shipped with an old default SSR IP address (IP address: 10.0.0.100, netmask: 255.255.255.0). Try to set that IP address and attempt ping. If the ping test still does not work: Ensure that the system is also booted up (apply power and press the front white button). Wait a minimum of 5 minutes, though it could take up to 20 minutes to boot a data server. Call IBM service. Assigning the management IP address: This sect...

Page 56: ...lready entered, do not remove it. 3. To save the file, press ESC, run the :wq command, and then press Enter. 4. To reload the connections, run the nmcli c reload command, and then press Enter. 5. To see the IP address set to the specified interface, run the ip addr command, and then press Enter. You can now do a ping test on the command line or run the essutils tool to perform the task. You can turn off the Wi-Fi that...

Page 57: ...rows. Recommendations: Keep all management interfaces on 192.168.x.x/24 (netmask 255.255.255.0). Keep all FSP (HMC1) interfaces on 10.0.0.x/24 (netmask 255.255.255.0). Note: The EMS has an additional FSP connection at C11-T2, which is visible to the operating system. Important: All IP addresses must be on the same subnet. For example: All management interfaces on 192.168.x.x/24. All FSP interfaces on 10.0.0.x/24. ESS ...

Page 58: ...ment interfaces (IP address, netmask). Protocol node 1 management interface (bottom-most): 192.168.45.40, 255.255.255.0. Protocol node 2 management interface (top): 192.168.45.41, 255.255.255.0. FSP interfaces (IP address, netmask). POWER9 protocol node 1 FSP (HMC1 port) interface (bottom-most): 10.0.0.105, 255.255.255.0. POWER9 protocol node 2 FSP (HMC1 port) interface (top): 10.0.0.106, 255.255.255.0. Note: For additional POWER...
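The requirement that all interfaces of one kind share a subnet can be checked before the worksheet is handed over. The following is a minimal sketch that uses the example worksheet values above. The `subnets` helper is hypothetical and is not part of any ESS tool.

```python
import ipaddress


def subnets(interfaces):
    """Return the set of networks implied by (ip, netmask) pairs."""
    return {
        ipaddress.ip_interface(f"{ip}/{mask}").network
        for ip, mask in interfaces
    }


# Example values from the planning worksheet.
management = [
    ("192.168.45.40", "255.255.255.0"),
    ("192.168.45.41", "255.255.255.0"),
]
fsp = [
    ("10.0.0.105", "255.255.255.0"),
    ("10.0.0.106", "255.255.255.0"),
]

# Exactly one network per group means every address shares a subnet.
for name, group in (("management", management), ("FSP", fsp)):
    found = subnets(group)
    status = "OK" if len(found) == 1 else "MISMATCH"
    print(name, status, sorted(str(n) for n in found))
```

A group that reports more than one network contains at least one address or netmask that needs to be corrected on the worksheet.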

Page 59: ...0.0.100, 255.255.255.0. EMS FSP (C11-T2) interface: 10.0.0.1, 255.255.255.0. SSR task: Version detected. Write down the installed version that was detected when the Check and validate various install parameters option is run. This version must be the same on each server. If different versions were detected, write them down in order. Note: Write down any vital information that you encountered during the proc...

Page 61: ...Appendix B 5105 22E Reference information For detailed information about 5105 22E see PDF files for 5105 22E Copyright IBM Corp 2020 49 ...

Page 63: ...e by touch but do not activate just by touching them Industry standard devices for ports and connectors The attachment of alternative input and output devices IBM Knowledge Center and its related publications are accessibility enabled The accessibility features are described in IBM Knowledge Center www ibm com support knowledgecenter Keyboard navigation This product uses standard Microsoft Windows...

Page 65: ...ee central processor complex CPC central processor complex CPC A physical collection of hardware that consists of channels timers main storage and one or more central processors cluster A loosely coupled collection of independent systems or nodes organized into a network for the purpose of sharing resources and communicating with each other See also GPFS cluster cluster manager The node that monit...

Page 66: ...system composed of one or more building blocks encryption key A mathematical value that allows components to verify that they are in communication with the expected server Encryption keys are based on a public or private key pair that is created during the installation process See also file encryption key FEK master encryption key MEK ESS See Elastic Storage System ESS environmental service module...

Page 67: ... as a unit for balancing workload across a cluster See also dependent fileset independent fileset fileset snapshot A snapshot of an independent fileset plus all dependent filesets flexible service processor FSP Firmware that provides diagnosis initialization configuration runtime error detection and correction Connects to the HMC FQDN See fully qualified domain name FQDN FSP See flexible service p...

Page 68: ...uster IP See Internet Protocol IP IP over InfiniBand IPoIB Provides an IP network emulation layer on top of InfiniBand RDMA networks which allows existing applications to run over InfiniBand networks unmodified IPoIB See IP over InfiniBand IPoIB ISKLM See IBM Security Key Lifecycle Manager ISKLM J JBOD array The total collection of disks and enclosures over which a recovery group pair is defined K...

Page 69: ...n unit MTU N Network File System NFS A protocol developed by Sun Microsystems Incorporated that allows any host in a network to gain access to another host or netgroup and their file directories Network Shared Disk NSD A component for cluster wide disk naming and access NSD volume ID A unique 16 digit hexadecimal number that is used to identify and access all NSDs node An individual operating syst...

Page 70: ...tem data when a failure has occurred Recovery can involve reconstructing data or providing alternative routing through a different server recovery group RG A collection of disks that is set up by ESS in which each disk is connected physically to two servers a primary server and a backup server remote direct memory access RDMA A direct memory access from the memory of one computer into that of anot...

Page 71: ...on that results from them SSH See secure shell SSH STP See Spanning Tree Protocol STP symmetric multiprocessing SMP A computer architecture that provides fast performance by making multiple processors available to complete individual processes simultaneously T TCP See Transmission Control Protocol TCP Transmission Control Protocol TCP A core protocol of the Internet Protocol Suite that provides re...

Page 73: ...information overview ix inspections safety external device check 9 installation worksheet 45 N Network switch 43 notices environmental 10 O overview of information ix P preface ix R resources on web x S safety notices sound pressure 10 sound pressure safety notices 10 SSR port 43 static sensitive devices 10 submitting xi T trademarks 2 W web documentation x resources x Index 61 ...

Page 78: ...IBM Product Number 5765 DME 5765 DAE SC28 3155 00 ...
