4. Disconnect the power cord from the Myrinet switch. This powers down the switch.

5. Remove the rack-mount screws from the chassis; then remove the chassis from the rack.

6. Install the new chassis and fasten the rack-mount screws.

7. Connect the optical cables to the connectors on the switch.
   Note: Save the dust caps for future use.

8. Connect the power cord to the Myrinet switch. This powers up the switch.

Configure and setup after device replacement

The Myrinet switch automatically remaps all the PCI boards, so no manual configuration is needed. IBM Customer Support personnel will update the firmware if necessary.
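The guide does not prescribe a verification step at this point. One possible check, in the spirit of the preliminary ping diagnostic that the troubleshooting chapter mentions, is to confirm that each node answers over the network once the switch is powered up again. The following is a minimal sketch under stated assumptions, not an IBM or Myricom tool: the host names are hypothetical placeholders, and the `ping -c 1 -W` options assume the Linux iputils version of `ping`.

```python
import subprocess
from typing import Callable, Iterable, List


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request and return True if the host answered.

    Assumes the Linux iputils ping (-c = count, -W = reply timeout in seconds).
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 3,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing or hung; treat the host as unreachable.
        return False
    return result.returncode == 0


def unreachable_nodes(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
) -> List[str]:
    """Return the hosts for which the probe failed, in input order."""
    return [host for host in hosts if not probe(host)]


if __name__ == "__main__":
    # Hypothetical names for the Myrinet interfaces of two cluster nodes.
    nodes = ["node001-myri0", "node002-myri0"]
    failed = unreachable_nodes(nodes)
    if failed:
        print("No reply from: " + ", ".join(failed))
    else:
        print("All nodes replied.")
```

The probe is passed in as a parameter so that the list logic can be exercised without network access; replace the placeholder names with the addresses actually assigned in your cluster.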

 

Additional information

Additional installation and troubleshooting information is available online from Myricom at the following URL:
http://www.myri.com/scs/#documentation

68 IBM Eserver Cluster 1350 Installation and Service Guide

Summary of Contents for eserver Cluster 1350

Page 1: ...IBM Eserver Cluster 1350 Installation and Service Guide ERserver...

Page 4: ...on in Safety on page vii and Appendix G Notices on page 93 Fourth edition February 2004 Copyright International Business Machines Corporation 2004 All rights reserved US Government Users Restricted Ri...

Page 5: ...he cabinets 11 Customer responsibilities 11 Installer responsibilities 11 Chapter 4 Cabling the Cluster 1350 13 VLAN options 14 Connecting the cables 20 1 Gb Ethernet cabling 20 High speed Myrinet swi...

Page 6: ...rk 44 Checking storage 45 Checking the terminal server 46 Troubleshooting the KVM network 47 File system failure 47 PFA alert indicates internal disk 47 I O errors in syslog 47 Isolating software prob...

Page 7: ...CI board 67 Myrinet switch chassis 67 Configure and setup after device replacement 68 Additional information 68 Chapter 16 Configuring and replacing the Power Management Module 69 Replacing the Power...

Page 8: ...endix F International License Agreement for Non Warranted Programs 89 Part 1 General Terms 89 Part 2 Country unique Terms 91 License Information 92 Appendix G Notices 93 Edition notice 93 Trademarks 9...

Page 9: ...not perform any procedures until you receive a translated copy IBM does not accept responsibility or liability for failure to follow these procedures correctly Safety Information Before installing th...

Page 10: ...k For example if a caution statement begins with a number 1 translations for that caution statement appear in the IBM NetBAY Rack Safety Information book under statement 1 Be sure to read all caution...

Page 11: ...device at a time v The maximum allowable weight for devices on slide rails is 80 kg 176 lb Do not install sliding devices that exceed this weight Class 1 Laser Product Laser Klasse 1 Laser Klass 1 Lu...

Page 12: ...vidence of fire water or structural damage v Disconnect the attached power cords telecommunications systems networks and modems before you open the device covers unless instructed otherwise in the ins...

Page 13: ...l power cords are disconnected from the power source 1 2 Statement 6 CAUTION If you install a strain relief bracket option over the end of the power cord that is connected to the device you must conne...

Page 14: ...ing any device in the rack cabinet v Install an emergency power off switch if more than one power device power distribution unit or uninterruptible power supply is installed in the same rack cabinet v...

Page 15: ...at least 760 x 2030 MM 30 x 80 in v Ensure that all devices shelves drawers doors and cables are secure v Ensure that the four leveling pads are raised to their highest position v Ensure that there i...

Page 16: ...Caution These statements indicate situations that can be potentially hazardous to you A caution statement is placed just before the description of a potentially hazardous procedure step or situation v...

Page 17: ...CAT SUSE LINUX version 8 2 XCAT 32 bit Enterprise SLES version 8 XCAT for Opteron Red Hat Enterprise Linux RHEL version 3 0 and Workstation for Opteron version 3 0 XCAT 64 bit SLES version 8 for Opter...

Page 18: ...6 5 4 3 2 1 16 15 14 13 Cluster Nodes x335 2nd 10 100 Mb Ethernet Switch Option Storage Node x345 Storage Expansion Unit EXP500 Storage Server FAStT200 20 19 18 17 Cluster Nodes x335 1U Blank 1U Blank...

Page 19: ...97 96 91 90 89 Cabinet 3 240 239 238 237 236 235 234 233 232 231 230 229 255 254 253 252 251 250 249 248 244 243 242 241 Cabinet 7 92 Power Management Module 33 34 36 37 38 35 93 94 245 246 76 75 74 6...

Page 20: ...9 8 7 6 5 4 3 2 1 16 15 14 13 Storage Nodes x345 36 30 29 28 27 Port Servers 32 33 34 31 Storage Servers FAStT700 37 38 39 40 41 42 Storage Servers FAStT700 Storage Expansion Units EXP700 Power Manag...

Page 21: ...er contains one management node which provides system management for all modules in the cluster The Cluster 1350 management node is typically an xSeries 345 server running Linux You can also use an Es...

Page 22: ...upports the following SCSI RAID storage controller adapters v A ServeRAID 6I Ultra320 SCSI controller supports up to 16 arrays with support for a maximum of 160 hard disk drives v A ServeRAID 6M Ultra...

Page 23: ...5 slot M3 E64 9 slot M3 E128 17 slot M3F PC164C 2 PCI adapter and M3F PCIXD 2 PCI card The high speed switch can replace the optional secondary Ethernet switch It requires a Myrinet PCI adapter in ea...

Page 24: ...accessible To turn off power to the cabinet you must disconnect all the PDU power cords from the electrical outlets or from the individual PDU inlets Related publications Your cluster might have featu...

Page 25: ...placed for the cluster identify the primary cabinet and verify its contents If equipment is removed prior to shipping check the bill of materials to make sure that all the equipment that is required...

Page 27: ...cations Each cabinet has installation labels to help you in this process The IBM support team determines the final cabinet placement and completes the cabling and installation steps Installer responsi...

Page 28: ...cabinet placement complete the following steps 1 Inspect the cabinets components and cable connections for shipping damage 2 Install the frame stabilizer foot on each cabinet The following illustratio...

Page 29: ...N to manage the components in the cluster This VLAN includes the following connections v RS 485 connections to all cluster nodes and storage nodes through the Remote Supervisor Adapters These enable d...

Page 30: ...Cluster 1350 supports a variety of VLAN options There are six basic configurations Point to point wiring information is printed on each cable Check the information on the cables in the primary rack a...

Page 31: ...0x Ethernet 1 connects to Cisco 400x FAStT600 Connects to Cisco 400x Uses both jacks FAStT700 Connects to Cisco 400x Uses both jacks FAStT900 Connects to Cisco 400x Uses both jacks Table 4 Type 3 VLAN...

Page 32: ...Gbit public high speed VLAN Device Management VLAN 10 100 primary cluster VLAN Myrinet customer public high speed VLAN Management node Ethernet 2 connects to Cisco 3550 Ethernet 1 connects to Cisco 3...

Page 33: ...Supervisor III uplink 1 connects to Fibre Channel PCI adapter Supervisor III uplink 2 connects to public network Cluster nodes Ethernet 0 connects to Cisco 400x Ethernet 2 connects to Cisco 400x Stora...

Page 34: ...ts to Cisco 3550 In Reach LX 4000 32 port 48 port terminal server Connects to Cisco 3550 3508 Gbit switch Connects to 3550 copper GBIC APC switch Connects to Cisco 3550 Cisco 3550 10 100 switch Cisco...

Page 35: ...Cisco 400x Ethernet 1 Alias connects to Cisco 400x Ethernet 2 connects to Cisco 400x FAStT700 both jacks Connects to Cisco 3550 Table 10 Type 6 VLAN with multiple Cisco 400x switches Device Managemen...

Page 36: ...t and expansion cabinets connect the cables that run between the cabinets This is called the intercabinet cabling and the following types of cables are involved v 1 Gb or 2 Gb Fibre Channel optical v...

Page 37: ...t the management node and all the storage nodes each require a separate KVM switch port Certain systems might require a second KVM switch Install the second switch in the expansion cabinet that contai...

Page 38: ...ters to cut off the connectors at both ends of the defective cable This prevents someone from mistakenly reconnecting the cable thinking that it has inadvertently been left unconnected 3 Install a sin...

Page 39: ...ng at the base of the cabinet c Connect the power cable to the electrical outlet d Turn on the power breaker switch for the source power e Make sure that the power distribution unit circuit breakers a...

Page 40: ...edure for every expansion cabinet unit in the cluster before powering on the primary cabinet Turning on the power to the primary cabinet Complete the following steps to turn on the primary cabinet 1 S...

Page 41: ...dware configuration To make sure that you install the cluster components correctly run LCIT to generate a new set of tab files Compare the new tab files to the tab files that come with your cluster to...

Page 42: ...on to the last known state On Off If the last known state is On then the nodes start and display a login prompt 4 Log files show system restart events on nodes and on Remote Supervisor Adapter If a li...

Page 43: ...ter 1350 requires certain levels of a supported Linux version and Cluster System Management CSM software Before you begin the software installation process make sure that you have collected all the ap...

Page 44: ...te Supervisor Adapter RSA firmware v1 06 v RSA2 firmware v1 03 v RSA2 video BIOS YI002519 00 v RSA2 video v3 0 v ISMP v1 06 xSeries 360 v Flash BIOS update v1 11 v Diagnostics v3 01 v Remote Supervisor...

Page 45: ...d versions of Linux Use the detailed installation instructions that come with your software kit to install the Linux software If you do not have your documentation for installing Linux go to http www...

Page 46: ...e etc modules conf file to put the host adapters in the correct order and to add the parameter scsi_mod max_scsi_luns to the file Important v Because the system is running a modular kernel the Adaptec...

Page 47: ...start and run the setup diskette or CD to configure the network Assign the same configuration information for the Remote Supervisor Adapter name IP address host name as used before Go to the following...

Page 48: ...l Parallel File System for Linux R Concepts Planning and Installation Guide and search for file system manager The host name or IP address must refer to the communications adapter Alias interfaces are...

Page 49: ...e multinode quorum algorithm Distributing the system image to all nodes in the cluster Because of the way the Red Hat version 9 0 loads SCSI drivers and assigns them to dev sda dev sdb partitions prob...

Page 50: ...Log on to the storage nodes and verify the disk configuration fdisk l 3 If a modem is present configure the modem according to the instructions 34 IBM Eserver Cluster 1350 Installation and Service Gui...

Page 51: ...tion Each rack in the configuration includes one or two terminal servers to connect each node in the rack through a DB9 to RH45 serial cable The terminal servers are LAN connected to the Management VL...

Page 52: ...system is up and running typical applications v A lights out or brown out event occurs The system shuts down then restarts through an external source v All nodes turn on to the last known state On Off...

Page 53: ...secure traffic for hardware control The management Ethernet VLAN is used for management traffic only It is logically isolated for security using the VLAN capability of the Cisco Ethernet switches and...

Page 54: ...ter network A Cluster 1350 can also have a second network either an additional Ethernet network or a Myrinet 2000 network As a preliminary diagnostic step ping all the nodes over all available network...

Page 55: ...ork adapter on the management node v DHCP configuration v Network configuration v Cisco blade failure Table 13 Network troubleshooting for a cluster with one network Symptom Action Cannot ping a node...

Page 56: ...hat fails to function in the network 3 To determine the IP Address scheme of each node at the console prompt type ifconfig and compare this output to the factory defaults shown in Table 14 Table 14 Fa...

Page 57: ...are problems on page 42 for problem resolution Table 15 Network troubleshooting for a cluster with two networks Symptom Action Cannot ping a node or nodes on the cluster network from the management no...

Page 58: ...settings of all suspect ports against ports that are working 4 Make sure that the terminal server is turned on and connected to the network by pinging the unit at the IP address of 172 30 20 1 5 To m...

Page 59: ...The service processor log might be full The log is cleared by connecting to the service processor through the Remote Supervisor Adapter card Otherwise go to Node checks on page 42 for node problem res...

Page 60: ...link between the RSA card and the Cisco 3550 or 400x switch 8 Flash the ASM service processors to the latest firmware level 9 Flash the RSA to the latest firmware level 10 Check RSA configurations usi...

Page 61: ...he file system problem resolution v If fdisk l reports missing disks check that the adapter device driver is configured If the adapter device driver is configured go to Checking storage and continue w...

Page 62: ...ter device replacement on page 59 then at the IN Reach_Priv prompt type show port portnumber to compare the settings of all suspect ports against ports that work 4 Make sure that the In Reach terminal...

Page 63: ...ough cable to direct connect or bypass possible bad cables 6 Reboot the failing node to reset connection to the KVM switch File system failure Use the following information to resolve file system fail...

Page 64: ...tion process Differences in node lists Output from the command CT_CONTACT ManagedNodeName lsrsrc IBM Host FileSystem when run on the management node is not the same as when run on the managed node Thi...

Page 65: ...man readable form and added to the syslog Use the lsnode Al command to determine the host name for the Remote Supervisor Adapter card and the service processor name associated with the failing node Us...

Page 66: ...and issue the power off and power on commands to the RSA port Checking service processor logs At the console prompt type lsnode Al to determine the host name for the Remote Supervisor Adapter RSA car...

Page 67: ...ctions Disk drive failure on a cluster node Use the following troubleshooting information about disk drive failures on the cluster node The xSeries 335 supports hot swapping of hard disks but BladeCen...

Page 68: ...settings v Devices and I O Ports PORT 3F8 IRQ4 v Remote Console Redirection Enabled COM1 9600 8 None 1 VT100 Enabled v Boot sequence Diskette Drive CD ROM Network Hard Drive 0 Boot Fail Count DISABLE...

Page 69: ...he customer breaker panel and make sure they are on 3 Measure the voltage on the power out side of the Frame Power Block If no voltage is present have the customer s electrician check for power issues...

Page 70: ...formation about new features or technical updates might be available to provide additional information that is not included with your cluster These updates are available from the IBM Web site Complete...

Page 71: ...selection you will be returned to the Network Configuration menu 6 Select option 2 and specify if you are using a static or BootP IP address Use a static IP address for ease of configuration If you ar...

Page 72: ...you must extract the IP address using Windows operating system tools 6 Right click on Network Neighborhood and select Properties 7 Click the Protocols tab and select TCP IP protocol 8 Select Propertie...

Page 73: ...press Enter The device settings are now saved to NVRAM Connecting components with the KVM switch power turned on You can connect additional servers to the KVM switch while the system is running When...

Page 74: ...ing to take effect and set low power mode for monitors so configured Resetting the mouse and keyboard If the mouse and keyboard are not working properly for example no cursor response you may need to...

Page 75: ...In Reach command prompt type set priv and press Enter 8 At the Password command prompt type system 9 At the In Reach command prompt type show ip to see the current network settings 10 To set the IP a...

Page 76: ...ast command will cause the terminal server to save any configuration changes and restart The terminal server should now be fully operational For more information about the In Reach LX 4000 terminal se...

Page 77: ...ow Control v VT100 Emulation 4 At the command prompt in the terminal emulation window type enable This puts you in administrative mode 5 At the command prompt type ibm and press Enter The prompt will...

Page 78: ...and the switch If a ping to the switch fails make sure that the IP address and gateway address to make sure the subnet and gateway addresses match v On the PC at the command prompt type ipconfig v On...

Page 79: ...ve mode 5 At the prompt type ibm and press B The prompt will change from a to a to indicate you are in administrative mode 6 Type show run to show the current configuration information Make note of th...

Page 80: ...This ping should succeed v Connect node1 to VLAN1 and node2 to VLAN2 and ping node2 from node1 This ping should fail Additional information Catalyst 5000 Family Ethernet and Fast Ethernet Switching M...

Page 81: ...al duct and how it fits within the cabinet and attaches to the switch Item Part No Qty Description 1 24P7877 2 Bracket Cisco 2 24P7878 1 Rail right 3 24P7879 1 Cover 4 24P7885 1 Rail duct 5 1410 42L 1...

Page 83: ...e Myrinet traffic polling the ports and building tables to control the addressing of messages Blower module Cools the Myrinet Switch Chassis All of these components can be hot swapped The Myrinet docu...

Page 84: ...re use 8 Connect the power cord to the Myrinet switch This powers up the switch Configure and setup after device replacement The Myrinet switch automatically remaps all the PCI boards so no manual con...

Page 85: ...load at http apcc com tools download 5 To reinstall power bricks for the Remote Supervisor Adapter RSA cards see the applicable documentation that came with your power bricks and RSA card Related topi...

Page 87: ...up to twelve rack PDUs To remove the Power Distribution Units perform the following steps 1 Shut down all devices 2 Remove the side cover on the side of the rack that the failing PDU is located on 3 T...

Page 89: ...ee below Be sure to have the following information available when you call Machine type 1410 Model 42L Serial number v The label containing the serial number can be found on the purchase order or in t...

Page 91: ...Q Why doesn t the xSeries 345 boot PXE correctly A You cannot have a PCI ethernet card that uses the e1000 driver in the xSeries 345 when installing Take the card out and retry the installation Q Why...

Page 92: ...Issue the installnode command and then on the management node immediately edit the tftpboot pxelinux cfg AC files Take out console portion from the APPEND line Now all messages will go to the KVM cons...

Page 93: ...a POST code and description of the error For example 301 Keyboard Input Error 164 Memory size has changed Cluster System Management log Cluster System Management CSM log files can be viewed in the var...

Page 95: ...jumper from port A to port B on cluster nodes CSM Stale NFS mounts Existing NFS mounted file systems are inaccessible after a CSM installation on a cluster node 1 Remount the NFS file systems 2 If th...

Page 96: ...s eth0 e1000 alias scsi_hostadapter aic7xxx alias scsi_hostadapter1 ips alias eth1 e1000 alias eth1 e1000 alias parport_lowlevel parport_pc alias scsi_hostadapter3 aic7xxx options scsi_mod max_scsi_lu...

Page 97: ...Service Processor If a name is not recognized make sure that there are no trailing blanks after the name Light path points to PCI LED If Light Path diagnostics points to PCI LED reseat the PCI boards...

Page 99: ...LAN The sc0 port must be assigned to the Management VLAN Again one port assigned to the Management VLAN needs to be reserved to make the connection to the switch itself Load balancing across EtherChan...

Page 100: ...port basis type show spanning tree brief Switch commands for the Cisco Gigabit 4006 switch running IOS The following commands also work with the Cisco 3550 Ethernet switch running IOS To set up VLANs...

Page 101: ...range mode port port switchport host end To see the VLAN setup type show vlan To set the switch as the spanning tree protocol root Run the command once for each VLAN conf t spanning tree id root prim...

Page 102: ...nd storage nodes type set port host To see the VLAN setup type show vlan To set the switch as the spanning tree protocol primary type this command once for each VLAN set spantree root vlanid To set th...

Page 103: ...locked by the spanning tree protocol type show spantree Miscellaneous Cisco switch commands for IOS To view the ports that are blocked by the spanning tree protocol type show sp br Appendix E Configur...

Page 105: ...have acquired and 2 make and install copies to support the level of use authorized providing you reproduce the copyright notice and any other legends of ownership on each copy or partial copy of the P...

Page 106: ...ITATION MAY NOT APPLY TO YOU 6 General Nothing in this Agreement affects any statutory rights of consumers that cannot be waived or limited by contract IBM may terminate your license if you fail to co...

Page 107: ...ction 6 The following replaces the fourth paragraph of this Section If no suit or other legal action is brought within two years after the cause of action arose in respect of any claim that either par...

Page 108: ...e Cisco Software must obtain from Cisco or a Cisco reseller including IBM a new license to use the Cisco Software 2 In addition to the warranty disclaimers provided in Point 4 of the ILA Cisco disclai...

Page 109: ...RANTY OF ANY KIND EITHER EXPRESS OR IMPLIED INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF NON INFRINGEMENT MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE Some states do not allow disclai...

Page 110: ...ountries or both Microsoft Windows and Windows NT are trademarks of Microsoft Corporation in the United States other countries or both UNIX is a registered trademark of The Open Group in the United St...

Page 111: ...bility gaskets and connectors which may contain lead and copper beryllium alloys that require special handling and disposal at end of life Before this unit is disposed of these materials must be remov...

Page 112: ...terference and 2 this device must accept any interference received including interference that may cause undesired operation Industry Canada Class A emission compliance statement This Class A digital...

Page 113: ...ensed communication equipment Attention This is a Class A product In a domestic environment this product may cause radio interference in which case the user may be required to take adequate measures T...

Page 115: ...Cisco 4000 Series switch installation 65 removal 65 replacement 65 troubleshooting 65 Cisco Catalyst 4003 high speed switch 7 Cisco Catalyst 4006 high speed switch 7 Cisco Catalyst high speed switch c...

Page 116: ...xpansion unit description 6 expansion cabinets turning on 24 F FAQ 75 FAStT600 storage controller 6 FAStT700 storage controller 6 FAStT900 storage controller 6 FCC Class A notice 96 Fibre Channel cabl...

Page 117: ...orage node configuration 30 logs error 77 logs continued event 77 M M3 E128 model high speed Myrinet switch 7 M3 E32 model high speed Myrinet switch 7 M3 E64 model high speed Myrinet switch 7 M3F PCIX...

Page 118: ...ive 48 resetting RSA cards 50 setting up SNMP alerts 49 SNMP monitoring 49 software 37 problems power 53 procedure lights out or brownout 26 pushing the image nodes 33 R RCM cabling 21 Red Hat Linux s...

Page 119: ...troubleshooting 52 system image copying 33 system overview 1 T terminal server 59 cluster components 7 description 7 testing configuration 33 trademarks 94 troubleshooting BladeCenter problems 52 Cis...

Page 122: ...Part Number 25K8407 Printed in USA 1P P N 25K8407...
