IBM Eserver Cluster 1350 Installation and Service Guide

Contents of Eserver Cluster 1350

Page 1: ...IBM Eserver Cluster 1350 Installation and Service Guide ERserver...

Page 3: ...IBM Eserver Cluster 1350 Installation and Service Guide ERserver...

Page 4: ...on in Safety on page vii and Appendix G Notices on page 93 Fourth edition February 2004 Copyright International Business Machines Corporation 2004 All rights reserved US Government Users Restricted Ri...

Page 5: ...he cabinets 11 Customer responsibilities 11 Installer responsibilities 11 Chapter 4 Cabling the Cluster 1350 13 VLAN options 14 Connecting the cables 20 1 Gb Ethernet cabling 20 High speed Myrinet swi...

Page 6: ...rk 44 Checking storage 45 Checking the terminal server 46 Troubleshooting the KVM network 47 File system failure 47 PFA alert indicates internal disk 47 I O errors in syslog 47 Isolating software prob...

Page 7: ...CI board 67 Myrinet switch chassis 67 Configure and setup after device replacement 68 Additional information 68 Chapter 16 Configuring and replacing the Power Management Module 69 Replacing the Power...

Page 8: ...endix F International License Agreement for Non Warranted Programs 89 Part 1 General Terms 89 Part 2 Country unique Terms 91 License Information 92 Appendix G Notices 93 Edition notice 93 Trademarks 9...

Page 9: ...not perform any procedures until you receive a translated copy IBM does not accept responsibility or liability for failure to follow these procedures correctly Safety Information Before installing th...

Page 10: ...k For example if a caution statement begins with a number 1 translations for that caution statement appear in the IBM NetBAY Rack Safety Information book under statement 1 Be sure to read all caution...

Page 11: ...device at a time v The maximum allowable weight for devices on slide rails is 80 kg 176 lb Do not install sliding devices that exceed this weight Class 1 Laser Product Laser Klasse 1 Laser Klass 1 Lu...

Page 12: ...vidence of fire water or structural damage v Disconnect the attached power cords telecommunications systems networks and modems before you open the device covers unless instructed otherwise in the ins...

Page 13: ...l power cords are disconnected from the power source 1 2 Statement 6 CAUTION If you install a strain relief bracket option over the end of the power cord that is connected to the device you must conne...

Page 14: ...ing any device in the rack cabinet v Install an emergency power off switch if more than one power device power distribution unit or uninterruptible power supply is installed in the same rack cabinet v...

Page 15: ...at least 760 x 2030 mm (30 x 80 in.) v Ensure that all devices shelves drawers doors and cables are secure v Ensure that the four leveling pads are raised to their highest position v Ensure that there i...

Page 16: ...Caution These statements indicate situations that can be potentially hazardous to you A caution statement is placed just before the description of a potentially hazardous procedure step or situation v...

Page 17: ...CAT SUSE LINUX version 8 2 XCAT 32 bit Enterprise SLES version 8 XCAT for Opteron Red Hat Enterprise Linux RHEL version 3 0 and Workstation for Opteron version 3 0 XCAT 64 bit SLES version 8 for Opter...

Page 18: ...[Figure: cabinet layout] Cluster Nodes x335; 2nd 10/100 Mb Ethernet Switch (Option); Storage Node x345; Storage Expansion Unit EXP500; Storage Server FAStT200; Cluster Nodes x335; 1U Blank...

Page 19: ...[Figure: cabinet layout with numbered components] Cabinet 3; Cabinet 7; Power Management Module...

Page 20: ...[Figure: cabinet layout] Storage Nodes x345; Port Servers; Storage Servers FAStT700; Storage Expansion Units EXP700; Power Manag...

Page 21: ...er contains one management node which provides system management for all modules in the cluster The Cluster 1350 management node is typically an xSeries 345 server running Linux You can also use an Es...

Page 22: ...upports the following SCSI RAID storage controller adapters v A ServeRAID 6I Ultra320 SCSI controller supports up to 16 arrays with support for a maximum of 160 hard disk drives v A ServeRAID 6M Ultra...

Page 23: ...5 slot M3 E64 9 slot M3 E128 17 slot M3F PC164C 2 PCI adapter and M3F PCIXD 2 PCI card The high speed switch can replace the optional secondary Ethernet switch It requires a Myrinet PCI adapter in ea...

Page 24: ...accessible To turn off power to the cabinet you must disconnect all the PDU power cords from the electrical outlets or from the individual PDU inlets Related publications Your cluster might have featu...

Page 25: ...placed for the cluster identify the primary cabinet and verify its contents If equipment is removed prior to shipping check the bill of materials to make sure that all the equipment that is required...

Page 27: ...cations Each cabinet has installation labels to help you in this process The IBM support team determines the final cabinet placement and completes the cabling and installation steps Installer responsi...

Page 28: ...cabinet placement complete the following steps 1 Inspect the cabinets components and cable connections for shipping damage 2 Install the frame stabilizer foot on each cabinet The following illustratio...

Page 29: ...N to manage the components in the cluster This VLAN includes the following connections v RS 485 connections to all cluster nodes and storage nodes through the Remote Supervisor Adapters These enable d...

Page 30: ...Cluster 1350 supports a variety of VLAN options There are six basic configurations Point to point wiring information is printed on each cable Check the information on the cables in the primary rack a...

Page 31: ...0x Ethernet 1 connects to Cisco 400x FAStT600 Connects to Cisco 400x Uses both jacks FAStT700 Connects to Cisco 400x Uses both jacks FAStT900 Connects to Cisco 400x Uses both jacks Table 4 Type 3 VLAN...

Page 32: ...Gbit public high speed VLAN Device Management VLAN 10 100 primary cluster VLAN Myrinet customer public high speed VLAN Management node Ethernet 2 connects to Cisco 3550 Ethernet 1 connects to Cisco 3...

Page 33: ...Supervisor III uplink 1 connects to Fibre Channel PCI adapter Supervisor III uplink 2 connects to public network Cluster nodes Ethernet 0 connects to Cisco 400x Ethernet 2 connects to Cisco 400x Stora...

Page 34: ...ts to Cisco 3550 In Reach LX 4000 32 port 48 port terminal server Connects to Cisco 3550 3508 Gbit switch Connects to 3550 copper GBIC APC switch Connects to Cisco 3550 Cisco 3550 10 100 switch Cisco...

Page 35: ...Cisco 400x Ethernet 1 Alias connects to Cisco 400x Ethernet 2 connects to Cisco 400x FAStT700 both jacks Connects to Cisco 3550 Table 10 Type 6 VLAN with multiple Cisco 400x switches Device Managemen...

Page 36: ...t and expansion cabinets connect the cables that run between the cabinets This is called the intercabinet cabling and the following types of cables are involved v 1 Gb or 2 Gb Fibre Channel optical v...

Page 37: ...t the management node and all the storage nodes each require a separate KVM switch port Certain systems might require a second KVM switch Install the second switch in the expansion cabinet that contai...

Page 38: ...ters to cut off the connectors at both ends of the defective cable This prevents someone from mistakenly reconnecting the cable thinking that it has inadvertently been left unconnected 3 Install a sin...

Page 39: ...ng at the base of the cabinet c Connect the power cable to the electrical outlet d Turn on the power breaker switch for the source power e Make sure that the power distribution unit circuit breakers a...

Page 40: ...edure for every expansion cabinet unit in the cluster before powering on the primary cabinet Turning on the power to the primary cabinet Complete the following steps to turn on the primary cabinet 1 S...

Page 41: ...dware configuration To make sure that you install the cluster components correctly run LCIT to generate a new set of tab files Compare the new tab files to the tab files that come with your cluster to...

Page 42: ...on to the last known state On Off If the last known state is On then the nodes start and display a login prompt 4 Log files show system restart events on nodes and on Remote Supervisor Adapter If a li...

Page 43: ...ter 1350 requires certain levels of a supported Linux version and Cluster System Management CSM software Before you begin the software installation process make sure that you have collected all the ap...

Page 44: ...te Supervisor Adapter RSA firmware v1 06 v RSA2 firmware v1 03 v RSA2 video BIOS YI002519 00 v RSA2 video v3 0 v ISMP v1 06 xSeries 360 v Flash BIOS update v1 11 v Diagnostics v3 01 v Remote Supervisor...

Page 45: ...d versions of Linux Use the detailed installation instructions that come with your software kit to install the Linux software If you do not have your documentation for installing Linux go to http www...

Page 46: ...e /etc/modules.conf file to put the host adapters in the correct order and to add the parameter scsi_mod max_scsi_luns to the file Important v Because the system is running a modular kernel the Adaptec...

Page 47: ...start and run the setup diskette or CD to configure the network Assign the same configuration information for the Remote Supervisor Adapter name IP address host name as used before Go to the following...

Page 48: ...l Parallel File System for Linux R Concepts Planning and Installation Guide and search for file system manager The host name or IP address must refer to the communications adapter Alias interfaces are...

Page 49: ...e multinode quorum algorithm Distributing the system image to all nodes in the cluster Because of the way the Red Hat version 9 0 loads SCSI drivers and assigns them to /dev/sda /dev/sdb partitions prob...

Page 50: ...Log on to the storage nodes and verify the disk configuration fdisk -l 3 If a modem is present configure the modem according to the instructions 34 IBM Eserver Cluster 1350 Installation and Service Gui...

Page 51: ...tion Each rack in the configuration includes one or two terminal servers to connect each node in the rack through a DB9 to RJ45 serial cable The terminal servers are LAN connected to the Management VL...

Page 52: ...system is up and running typical applications v A lights out or brown out event occurs The system shuts down then restarts through an external source v All nodes turn on to the last known state On Off...

Page 53: ...secure traffic for hardware control The management Ethernet VLAN is used for management traffic only It is logically isolated for security using the VLAN capability of the Cisco Ethernet switches and...

Page 54: ...ter network A Cluster 1350 can also have a second network either an additional Ethernet network or a Myrinet 2000 network As a preliminary diagnostic step ping all the nodes over all available network...

Page 55: ...ork adapter on the management node v DHCP configuration v Network configuration v Cisco blade failure Table 13 Network troubleshooting for a cluster with one network Symptom Action Cannot ping a node...

Page 56: ...hat fails to function in the network 3 To determine the IP Address scheme of each node at the console prompt type ifconfig and compare this output to the factory defaults shown in Table 14 Table 14 Fa...

Page 57: ...are problems on page 42 for problem resolution Table 15 Network troubleshooting for a cluster with two networks Symptom Action Cannot ping a node or nodes on the cluster network from the management no...

Page 58: ...settings of all suspect ports against ports that are working 4 Make sure that the terminal server is turned on and connected to the network by pinging the unit at the IP address of 172.30.20.1 5 To m...

Page 59: ...The service processor log might be full The log is cleared by connecting to the service processor through the Remote Supervisor Adapter card Otherwise go to Node checks on page 42 for node problem res...

Page 60: ...link between the RSA card and the Cisco 3550 or 400x switch 8 Flash the ASM service processors to the latest firmware level 9 Flash the RSA to the latest firmware level 10 Check RSA configurations usi...

Page 61: ...he file system problem resolution v If fdisk -l reports missing disks check that the adapter device driver is configured If the adapter device driver is configured go to Checking storage and continue w...

Page 62: ...ter device replacement on page 59 then at the IN Reach_Priv prompt type show port portnumber to compare the settings of all suspect ports against ports that work 4 Make sure that the In Reach terminal...

Page 63: ...ough cable to direct connect or bypass possible bad cables 6 Reboot the failing node to reset connection to the KVM switch File system failure Use the following information to resolve file system fail...

Page 64: ...tion process Differences in node lists Output from the command CT_CONTACT ManagedNodeName lsrsrc IBM Host FileSystem when run on the management node is not the same as when run on the managed node Thi...

Page 65: ...man readable form and added to the syslog Use the lsnode Al command to determine the host name for the Remote Supervisor Adapter card and the service processor name associated with the failing node Us...

Page 66: ...and issue the power off and power on commands to the RSA port Checking service processor logs At the console prompt type lsnode Al to determine the host name for the Remote Supervisor Adapter RSA car...

Page 67: ...ctions Disk drive failure on a cluster node Use the following troubleshooting information about disk drive failures on the cluster node The xSeries 335 supports hot swapping of hard disks but BladeCen...

Page 68: ...settings v Devices and I O Ports PORT 3F8 IRQ4 v Remote Console Redirection Enabled COM1 9600 8 None 1 VT100 Enabled v Boot sequence Diskette Drive CD ROM Network Hard Drive 0 Boot Fail Count DISABLE...

Page 69: ...he customer breaker panel and make sure they are on 3 Measure the voltage on the power out side of the Frame Power Block If no voltage is present have the customer s electrician check for power issues...

Page 70: ...formation about new features or technical updates might be available to provide additional information that is not included with your cluster These updates are available from the IBM Web site Complete...

Page 71: ...selection you will be returned to the Network Configuration menu 6 Select option 2 and specify if you are using a static or BootP IP address Use a static IP address for ease of configuration If you ar...

Page 72: ...you must extract the IP address using Windows operating system tools 6 Right click on Network Neighborhood and select Properties 7 Click the Protocols tab and select TCP IP protocol 8 Select Propertie...

Page 73: ...press Enter The device settings are now saved to NVRAM Connecting components with the KVM switch power turned on You can connect additional servers to the KVM switch while the system is running When...

Page 74: ...ing to take effect and set low power mode for monitors so configured Resetting the mouse and keyboard If the mouse and keyboard are not working properly for example no cursor response you may need to...

Page 75: ...In Reach command prompt type set priv and press Enter 8 At the Password command prompt type system 9 At the In Reach command prompt type show ip to see the current network settings 10 To set the IP a...

Page 76: ...ast command will cause the terminal server to save any configuration changes and restart The terminal server should now be fully operational For more information about the In Reach LX 4000 terminal se...

Page 77: ...ow Control v VT100 Emulation 4 At the command prompt in the terminal emulation window type enable This puts you in administrative mode 5 At the command prompt type ibm and press Enter The prompt will...

Page 78: ...and the switch If a ping to the switch fails check the IP address and gateway address to make sure the subnet and gateway addresses match v On the PC at the command prompt type ipconfig v On...

Page 79: ...ve mode 5 At the prompt type ibm and press Enter The prompt will change from a > to a # to indicate you are in administrative mode 6 Type show run to show the current configuration information Make note of th...

Page 80: ...This ping should succeed v Connect node1 to VLAN1 and node2 to VLAN2 and ping node2 from node1 This ping should fail Additional information Catalyst 5000 Family Ethernet and Fast Ethernet Switching M...

Page 81: ...al duct and how it fits within the cabinet and attaches to the switch Item Part No Qty Description 1 24P7877 2 Bracket Cisco 2 24P7878 1 Rail right 3 24P7879 1 Cover 4 24P7885 1 Rail duct 5 1410 42L 1...

Page 83: ...e Myrinet traffic polling the ports and building tables to control the addressing of messages Blower module Cools the Myrinet Switch Chassis All of these components can be hot swapped The Myrinet docu...

Page 84: ...re use 8 Connect the power cord to the Myrinet switch This powers up the switch Configure and setup after device replacement The Myrinet switch automatically remaps all the PCI boards so no manual con...

Page 85: ...load at http://apcc.com/tools/download 5 To reinstall power bricks for the Remote Supervisor Adapter RSA cards see the applicable documentation that came with your power bricks and RSA card Related topi...

Page 87: ...up to twelve rack PDUs To remove the Power Distribution Units perform the following steps 1 Shut down all devices 2 Remove the side cover on the side of the rack that the failing PDU is located on 3 T...

Page 89: ...ee below Be sure to have the following information available when you call Machine type 1410 Model 42L Serial number v The label containing the serial number can be found on the purchase order or in t...

Page 91: ...Q Why doesn t the xSeries 345 boot PXE correctly A You cannot have a PCI ethernet card that uses the e1000 driver in the xSeries 345 when installing Take the card out and retry the installation Q Why...

Page 92: ...Issue the installnode command and then on the management node immediately edit the tftpboot pxelinux cfg AC files Take out console portion from the APPEND line Now all messages will go to the KVM cons...

Page 93: ...a POST code and description of the error For example 301 Keyboard Input Error 164 Memory size has changed Cluster System Management log Cluster System Management CSM log files can be viewed in the var...

Page 95: ...jumper from port A to port B on cluster nodes CSM Stale NFS mounts Existing NFS mounted file systems are inaccessible after a CSM installation on a cluster node 1 Remount the NFS file systems 2 If th...

Page 96: ...s eth0 e1000 alias scsi_hostadapter aic7xxx alias scsi_hostadapter1 ips alias eth1 e1000 alias eth1 e1000 alias parport_lowlevel parport_pc alias scsi_hostadapter3 aic7xxx options scsi_mod max_scsi_lu...

Page 97: ...Service Processor If a name is not recognized make sure that there are no trailing blanks after the name Light path points to PCI LED If Light Path diagnostics points to PCI LED reseat the PCI boards...

Page 99: ...LAN The sc0 port must be assigned to the Management VLAN Again one port assigned to the Management VLAN needs to be reserved to make the connection to the switch itself Load balancing across EtherChan...

Page 100: ...port basis type show spanning tree brief Switch commands for the Cisco Gigabit 4006 switch running IOS The following commands also work with the Cisco 3550 Ethernet switch running IOS To set up VLANs...

Page 101: ...range mode port port switchport host end To see the VLAN setup type show vlan To set the switch as the spanning tree protocol root Run the command once for each VLAN conf t spanning tree id root prim...

Page 102: ...nd storage nodes type set port host To see the VLAN setup type show vlan To set the switch as the spanning tree protocol primary type this command once for each VLAN set spantree root vlanid To set th...

Page 103: ...locked by the spanning tree protocol type show spantree Miscellaneous Cisco switch commands for IOS To view the ports that are blocked by the spanning tree protocol type show sp br Appendix E Configur...

Page 105: ...have acquired and 2 make and install copies to support the level of use authorized providing you reproduce the copyright notice and any other legends of ownership on each copy or partial copy of the P...

Page 106: ...ITATION MAY NOT APPLY TO YOU 6 General Nothing in this Agreement affects any statutory rights of consumers that cannot be waived or limited by contract IBM may terminate your license if you fail to co...

Page 107: ...ction 6 The following replaces the fourth paragraph of this Section If no suit or other legal action is brought within two years after the cause of action arose in respect of any claim that either par...

Page 108: ...e Cisco Software must obtain from Cisco or a Cisco reseller including IBM a new license to use the Cisco Software 2 In addition to the warranty disclaimers provided in Point 4 of the ILA Cisco disclai...

Page 109: ...RANTY OF ANY KIND EITHER EXPRESS OR IMPLIED INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF NON INFRINGEMENT MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE Some states do not allow disclai...

Page 110: ...ountries or both Microsoft Windows and Windows NT are trademarks of Microsoft Corporation in the United States other countries or both UNIX is a registered trademark of The Open Group in the United St...

Page 111: ...bility gaskets and connectors which may contain lead and copper beryllium alloys that require special handling and disposal at end of life Before this unit is disposed of these materials must be remov...

Page 112: ...terference and 2 this device must accept any interference received including interference that may cause undesired operation Industry Canada Class A emission compliance statement This Class A digital...

Page 113: ...ensed communication equipment Attention This is a Class A product In a domestic environment this product may cause radio interference in which case the user may be required to take adequate measures T...

Page 115: ...Cisco 4000 Series switch installation 65 removal 65 replacement 65 troubleshooting 65 Cisco Catalyst 4003 high speed switch 7 Cisco Catalyst 4006 high speed switch 7 Cisco Catalyst high speed switch c...

Page 116: ...xpansion unit description 6 expansion cabinets turning on 24 F FAQ 75 FAStT600 storage controller 6 FAStT700 storage controller 6 FAStT900 storage controller 6 FCC Class A notice 96 Fibre Channel cabl...

Page 117: ...orage node configuration 30 logs error 77 logs continued event 77 M M3 E128 model high speed Myrinet switch 7 M3 E32 model high speed Myrinet switch 7 M3 E64 model high speed Myrinet switch 7 M3F PCIX...

Page 118: ...ive 48 resetting RSA cards 50 setting up SNMP alerts 49 SNMP monitoring 49 software 37 problems power 53 procedure lights out or brownout 26 pushing the image nodes 33 R RCM cabling 21 Red Hat Linux s...

Page 119: ...troubleshooting 52 system image copying 33 system overview 1 T terminal server 59 cluster components 7 description 7 testing configuration 33 trademarks 94 troubleshooting BladeCenter problems 52 Cis...

Page 122: ...Part Number 25K8407 Printed in USA 1P P N 25K8407...
