SGI® Altix® XE1300 Cluster Quick Reference Guide

007-4979-004

Contents: Altix XE1300

Page 1: ...SGI Altix XE1300 Cluster Quick Reference Guide 007-4979-004 ...

Page 2: ...d for Department of Defense units, (b) 48 CFR 227-7202 of the DoD FAR Supplement; or sections succeeding thereto. Contractor/manufacturer is Silicon Graphics, Inc., 1140 East Arques Avenue, Sunnyvale, CA 94085-4602. TRADEMARKS AND ATTRIBUTIONS: Silicon Graphics, SGI, Altix, and the SGI logo are registered trademarks of SGI in the United States and/or other countries worldwide. Voltaire is a registered trademark o...

Page 3: ...ril 2007: Updated Scali Manage information to version 5.4. 003, December 2007: Updates of Scali Manage information to version 5.5, and NIC1 IP address change process rewritten. 004, March 2008: Updates for Altix XE250 head nodes and XE250 or XE320 compute nodes, plus Scali Manage version 5.6 information ...

Page 5: ...NIC Guidelines 14; Cluster Management Head Node IP Addresses 14; Changing the NIC1 Customer Domain IP Address 15; Cluster Compute Node IP Addresses 17; Switch Connect and IP Address 17; Web or Telnet Access to Maintenance Port on the Gigabit Ethernet Switch 18; Web or Telnet Access to the Compute Traffic Gigabit Ethernet Switch 18; Serial Access to the SMC Switch 19; InfiniBand Switch Connect and IP Addre...

Page 6: ...38; Network BMC Configuration 39; Select Preferred Operating System 40; Node Network Configuration Screen 41; DNS and NTP Configuration Screen 43; NIS Configuration Screen 44; Scali Manage Options Screen 45; Configuration Setup Complete Screen 46; Checking the Log File Entries (Optional) 47; Setting a Node Failure Alarm on Scali Manage 48; 3 IPMI Commands Overview 53; User Administration 54; Typical ipmitool Co...

Page 7: ... commands 56; Displaying all Objects in SDR 56; Displaying all Sensors in the System 56; Displaying an Individual Sensor 56; Chassis Commands 57; Chassis Identify 57; Controlling System Power 57; Changing System Boot Order 57; SEL Commands 57 ...

Page 9: ... XE320 servers as compute nodes and SGI Altix XE250 servers as administrative head nodes. Consult with your SGI support representative before swapping nodes between pre-existing and newer clusters. The XE1300 cluster is a distributed-memory system, as opposed to a shared-memory system like that used in the SGI Altix 450 or SGI Altix 4700 high-performance compute servers. Instead of passing pointers in...

Page 10: ... wait for the sub-processes to finish. For large clusters, or clusters that run many MPI jobs, multiple head nodes may be used to distribute the load. The compute nodes are identical computing systems that run the primary processes of MPI applications. These compute nodes are connected to each other through the interconnect network. The network interconnect components are typically Gigabit Ethernet or I...

Page 11: ... document (P/N 007-4944-00x) shipped with your cluster system describes the inter-rack cable connections. If you have arranged for SGI field personnel to install the system rack(s), contact your service representative. After your cluster rack(s) are installed, refer back to this guide to continue working with your SGI cluster system. Booting the XE1300 Cluster: Power on any mass storage units attached to yo...

Page 12: ...or the ambient room temperature being too warm. Power Fail: Indicates power is being supplied to the system's power supply unit. This LED should normally be illuminated when the system is operating. SGI Altix XE240 Head Node Front Controls and Indicators. Figure 1-2 SGI Altix XE240 Head Node Controls and Indicators. Table 1-1 SGI Altix XE240 Head Node Controls and Indica...

Page 13: ...dentification LED: Solid blue indicates system identification is active. No light indicates system identification is not active. H, System Identification Button LED: Press this button once to activate the System Identification LED. Press the button again to deactivate the System Identification LED. Solid blue indicates system identification is active. No light indicates system identification is not activ...

Page 14: ...led by that control panel. Pressing this button removes the main power but keeps standby power supplied to the node board. Overheat/Fan Fail: When the Overheat/Fan Fail LED flashes, it indicates that a fan has failed. When the Overheat/Fan Fail LED is on continuously, it indicates that an overheat condition has occurred, which may be caused by cables obstructing the airflow in the system or the ambient r...

Page 15: ...thernet configuration using a single Ethernet switch for node-to-node communication. Figure 1-5 on page 9 illustrates a dual-switch cluster configuration with one switch handling MPI traffic and the other used for basic cluster administration and communication. Figure 1-6 on page 10 is an example configuration using one Ethernet switch for general administration and one InfiniBand switch for MPI tra...

Page 16: ... Figure 1-4 Basic Cluster Configuration Example Using a Single Ethernet Switch (diagram: base Gigabit Ethernet switch for admin, standard RJ-45 twisted-pair cable, head node, compute nodes) ...

Page 17: ... Figure 1-5 Dual Ethernet Switch Based Cluster Example (diagram: base Gigabit Ethernet switch for admin, base Gigabit Ethernet switch for MPI, GigE PCI card, standard RJ-45 twisted-pair cable, head node, compute nodes) ...

Page 18: ... Figure 1-6 Single Ethernet and Single InfiniBand Switch Configuration Example (diagram: workstation monitor, customer Ethernet, standard RJ-45 twisted-pair cable, InfiniBand cables, InfiniBand PCI card, head node, compute nodes) ...

Page 19: ... Figure 1-7 Dual Ethernet Plus InfiniBand Switch Cluster Configuration Example (diagram: customer Ethernet, Gigabit Ethernet switch for NAS, NAS, standard RJ-45 twisted-pair cable, InfiniBand cables, InfiniBand PCI card, head node, compute nodes) ...

Page 20: ...t/scali/sbin/power command to manage the system power. -H Usage: /opt/scali/sbin/power option nodelist on|off|cycle|status. Example: Use the following command to power cycle cluster nodes 001 through 032: power cl1n[001-032] cycle. If your cluster uses the Scali Manage administrative software (release 5.x.x), you can power off specific nodes or the entire system using the graphical user interface. Select Manag...
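The node range in the example above can be previewed before any power action is issued. The following is a minimal sketch only: `expand_nodes` is a hypothetical helper, not part of Scali Manage, the three-digit zero padding is assumed from the `cl1n` example, and the `/opt/scali/sbin/power` path is assumed from the usage text. The loop prints commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Scali Manage tool): expand a name prefix and
# a numeric range into individual node names, e.g. cl1n001 ... cl1n032.
# The three-digit padding is an assumption based on the guide's example.
expand_nodes() {
  local prefix=$1 first=$2 last=$3 i
  # 10# forces base 10 so values such as 008 or 032 are not read as octal.
  for ((i = 10#$first; i <= 10#$last; i++)); do
    printf '%s%03d\n' "$prefix" "$i"
  done
}

# Dry run: print one status query per node rather than executing it.
for node in $(expand_nodes cl1n 001 004); do
  echo "/opt/scali/sbin/power $node status"
done
```

Printing the commands first makes it easy to confirm that the range covers only the intended nodes before replacing `echo` with a real invocation.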

Page 21: ...the following command: init 0. 2. Press the power button on the head node(s) that you want to power off. You may have to hold the button down for up to 5 seconds. You may power off the nodes in any order. 3. To power off the compute nodes, press the power button (for up to 5 seconds) on the front panel of each unit (refer to Figure 1-2 on page 4). 4. To power off optional storage units in the cluster rack, press ...

Page 22: ...ry IP address setting. This setting needs to be changed to reflect the customer domain IP address before connection to the LAN. Refer to the section "Changing the NIC1 Customer Domain IP Address" on page 15. On-board network interface 2 (nic2, 10.0.10.1) is always used as the management and administration (internal) network port on the primary head node of the cluster. Note: In the case of a Gigabit Ethernet ...

Page 23: ...ote: A README file covering this process is also available in /usr/local/Factory-Install/Scripts. 1. Open the Scali Manage GUI using the command scalimanage-gui. 2. Log in with password sgisgi. A Scali Manager screen appears. 3. Right-click on the IP Networks icon and select Create New Subnet. 4. Enter the new subnet information and click the Create New Subnet box (lower right), then click OK to confirm the chan...

Page 24: ...ade. Click on the Network Interfaces tab. Click Save (lower right). Click OK to confirm. Click Apply Changes when prompted to "Update configuration files now". Wait for the node configuration task to complete. You may see some errors with the Scali Manage GUI. If this occurs, you can troubleshoot the problem by bringing up a Terminal window and running the following command: /etc/init.d/scance restart. You may ...

Page 25: ... address (nic1); InfiniBand IP address; Gigabit Ethernet solution (nic2); Baseboard Management (BMC or IPMI) address (nic1)
Compute node1: 10.0.1.1, 192.168.1.1, 172.16.1.1, 10.0.40.1
Compute node2: 10.0.1.2, 192.168.1.2, 172.16.1.2, 10.0.40.2
Compute node3: 10.0.1.3, 192.168.1.3, 172.16.1.3, 10.0.40.3
Compute node4: 10.0.1.4, 192.168.1.4, 172.16.1.4, 10.0.40.4
Note: The management (internal cluster administration) port IP add...
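The four rows above follow a simple pattern in which the final octet of each address equals the compute node number. As a sketch only, the pattern can be written as a small function. `node_addresses` is a hypothetical name, the column labels (`mgmt`, `ib`, `gige`, `bmc`) are assumptions because the first table header is truncated in the source, and extending the pattern beyond node 4 is itself an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the factory address pattern for compute node N.
# Assumptions: the pattern shown for nodes 1-4 continues for higher node
# numbers, and the column labels match the (partly truncated) table header.
node_addresses() {
  if ! [[ $1 =~ ^[0-9]+$ ]]; then
    echo "node number must be a positive integer" >&2
    return 1
  fi
  local n=$((10#$1))   # base 10, so an input like 08 is accepted
  # Only the final octet varies, so stay within a usable host range.
  if ((n < 1 || n > 254)); then
    echo "node number must be between 1 and 254" >&2
    return 1
  fi
  printf 'node%d mgmt=10.0.1.%d ib=192.168.1.%d gige=172.16.1.%d bmc=10.0.40.%d\n' \
    "$n" "$n" "$n" "$n" "$n"
}

node_addresses 3
```

Always confirm the real assignments against the cluster's own configuration before relying on a computed address.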

Page 26: ...istration network using telnet. To access the switch via telnet: telnet 10.0.20.1. Log in as the administrator: login admin, passwd admin. Web access would be http://10.0.20.1. Web or Telnet Access to the Compute-Traffic Gigabit Ethernet Switch: The SMC Gigabit Ethernet switch is configured with the IP address shown below when used with a NAS/SAN option or message passing interface (MPI) traffic. The switch can...

Page 27: ... Figure 1-8 SMC Switch Connectors Example (diagram: port status LEDs, 10/100/1000 Mbps RJ-45 ports, stack ID, console port, system indicators, SFP slots). 1. Establish a command line interface (CLI) and list the port connection settings. Port Settings: Bits Per Second 19200; Data bits 8; Parity None; Stop Bits 1; Flow Control ...

Page 28: ...nd Switch: Your InfiniBand switch setup is configured in the factory before shipment and should be accessible via telnet or a web browser. Note: There might be only one managed InfiniBand switch when multiple InfiniBand switches are used in blocking configurations. To access the managed InfiniBand switch via telnet: telnet 10.0.21.1. Log in as the administrator: login admin, passwd 123456. Web access woul...

Page 29: ...e use of the web or telnet access procedure is recommended. Note: For Voltaire switches with 96 ports or more, always use a DB-9 serial cable. To interface with the switch, use the connected laptop or other PC to: 1. List the port connection settings. Default settings are: Port Settings: Bits Per Second 38400; Data bits 8; Parity None; Stop Bits 1; Flow Control xon/xoff. 2. Click OK if the settings are acceptable. In...

Page 30: ...ast interface show. This command lists the IP address. 5. Power cycle the switch by disconnecting its power cable from the power connector and then plugging it back in. Using the 1U Console Option: The SGI optional 1U console is a rackmountable unit that includes a built-in keyboard/touchpad and uses a 17-inch (43-cm) LCD flat panel display of up to 1280x1024 pixels. The 1U console attaches to the head node us...

Page 31: ...ectory. Each compute node mounts this exported filesystem on /cluster. This can be used as a mechanism to install software across the cluster as well. Customers with support contracts needing BIOS or firmware updates should check the SGI Supportfolio Web Page at https://support.sgi.com/login. Accessing BIOS Information: BIOS Setup Utility options are used to change server configuration defaults. You can ru...

Page 32: ...um invalid; Warning: CMOS time and date not set. Under these circumstances you should contact your SGI service representative. Refer to the SGI Altix XE250 User's Guide (P/N 007-5467-00x), SGI Altix XE320 User's Guide (P/N 007-5466-00x), SGI Altix XE240 User's Guide (P/N 007-4873-00x), or SGI Altix XE310 User's Guide (P/N 007-4960-00x) for specific information about BIOS settings. Scali Manage Troubleshooting Tips: T...

Page 33: ...es mount head node /data1 on /cluster. You need to execute the following commands to export a filesystem via NFS from the head node: scalimanage-cli addnfsexport head_node filesystem; /etc/init.d/scance restart. To import this filesystem on a particular compute node: scalimanage-cli addremotefs compute_node nfs head_node filesystem mount_point; scalimanage-cli reconfigure compute_node. If the compute nodes ...

Page 34: ...x XE250 head node(s) within the Altix XE1300 cluster. SGI Altix XE320 User's Guide (P/N 007-5466-00x): This guide covers general operation, configuration, and servicing of the SGI Altix XE320 compute modules within the SGI Altix XE1300 cluster. SGI Altix XE310 User's Guide (P/N 007-4960-00x): This guide covers general operation, configuration, and servicing of the SGI Altix XE310 compute modules within the SGI...

Page 35: ...w to connect the adapter to an Ethernet network and explains how to operate the adapter. The manual also provides information on how to performance-tune this high-speed interface card. SGI ProPack 5 for Linux Start Here (Publication Number 007-4837-00x): This document provides information about the SGI ProPack 5 for Linux release, including the major features of the release, flowcharts of disk partitions...

Page 36: ... ISR 9024S D Installation Manual (Publication Number 399Z00002, Release AAA CAA): This manual covers unpacking, installation, configuration, and power-up information, as well as basic troubleshooting information, for the 24-port InfiniBand Switch Routers. Voltaire ISR 9288 ISR 9096 Installation Manual (Publication Number 399Z40000, Release AAA AAB): This manual covers unpacking, installation, configuration, and po...

Page 37: ...d determine that a replacement part will be needed, please contact your SGI service representative using the information in "Contacting the SGI Customer Service Center". Return postage information is included with replacement parts. Removal and replacement of the hardware components that make up the head and compute nodes within the cluster are fully documented in SGI Altix XE250 User's Guide (P/N 007-5...

Page 38: ...h America: +1 800 800 7441; Latin America: +55 11 5185 2860; Europe: +44 118 912 7500; Japan: +81 3 5488 1811; Asia Pacific: +1 650 933 3000. Cluster Administration Training from SGI: SGI offers customer training classes covering all current systems, including clusters. If you have a maintenance agreement in place with SGI, contact SGI Customer Education at 1-800-361-2621 for information on the time, location, and cos...

Page 39: ... node to the cluster using the following sections and accompanying screen snaps: Start the Scali Manage GUI on page 34; Head Node Information Screen on page 35; Adding a Node Starting from the Main GUI Screen on page 36; Adding a Cluster Compute Node on page 37; Selecting the Server Type on page 38; Network BMC Configuration on page 39; Select Preferred Operating System on page 40; Node Network Configurat...

Page 40: ...ltix XE240 head nodes (run via the Scali Manage head node): ipmitool -I lanplus -o intelplus -H ip_address command. The ipmitool command syntax for SGI Altix XE250 head nodes and SGI Altix XE310 and XE320 compute nodes (run via the Scali Manage head node): ipmitool -I lanplus -o supermicro -H ip_address command. SGI Altix XE systems that run SLES10 release 4 can use the following service: chkconfig ipmi on etc...

Page 41: ...ter integration and many files and scripts that may be helpful, including, under /usr/local/: Factory-Install/Apps (Scali, ibhost, Intel compilers, MPI runtime libraries, ipmitool, etc.); Factory-Install/ISO (CD ISO images of the base OS for installing Scali Cluster Manage software); Factory-Install/Docs (cluster documentation manuals: Scali, PBS Professional, Voltaire, SMC, SGI); Factory-Install/Firmware (Voltaire HCA and...

Page 42: ...ing a Node. Start the Scali Manage GUI: Log in to the Scali Manage interface as root; the factory password is sgisgi. Use your system name and log in as root. Refer to Figure 2-1 for an example. Figure 2-1 Example Starting Screen for the Scali Manage GUI ...

Page 43: ...e Information Screen: You can view and confirm the head node information from the main GUI screen. Click on the node icon (cl1n001 in the example below) for name and subnet information on your cluster head node. Figure 2-2 Head Node Information Screen Example ...

Page 44: ...to upgrade. To add a cluster node, open the Clusters tree by clicking the right mouse button. Move your cursor over the cluster tree (cluster cl1 in the example screen) and click the right mouse button. Then click the left mouse button on New in the popup window. Refer to Figure 2-3. Figure 2-3 Scali Manage Main Screen Selections Example ...

Page 45: ... the cluster needs to be upgraded or re-created. Select the option Extend existing cluster and provide the number of new servers (2 in the example). Then select the Cluster Name (cl1 in the example). Select the server template and click Next to move to the following screen. Figure 2-4 New Cluster Node Selection Example ...

Page 46: ...ing the Server Type: Click on Edit to bring up the Node Hardware Configuration network panel. Scroll down the menu and select the server type you are adding. Then enter the BMC user ID (admin) and the password (admin). Figure 2-5 Node Server Type Selection Screen Example ...

Page 47: ...k BMC Configuration: Click on the Edit button. Assign the new BMC IP address, stepping, and BMC host name. Click OK when the appropriate information is entered. Click Next to move to the following screen. Figure 2-6 BMC Network Configuration Screen Example ...

Page 48: ...red Operating System: Click on the option to select the new node's operating system. Enter the sgisgi factory password or whatever new password may have been assigned. Click Next to move to the following screen. Figure 2-7 Preferred Operating System Screen Selection Example ...

Page 49: ... Node Network Configuration Screen: Use this screen to assign Ethernet 0 (eth0) as your network interface port. Fill in the additional information as it applies to your local network. Click OK to continue. Figure 2-8 Node Network Ethernet 0 Screen Example ...

Page 50: ... Enter the default gateway information (refer to Figure 2-9) and select Next to continue. Figure 2-9 Default Gateway Example Screen ...

Page 51: ... NTP Configuration Screen: This screen extracts the name server numbers for use with the system configuration files. In this example, the domain name is engr.sgi.com with NTP enabled. Click Next when complete. Figure 2-10 DNS and NTP Configuration Screen Example ...

Page 52: ...figuration Screen: This screen allows you to specify, enable, or disable a Network Information Service (NIS) for the new node. Assign your domain name (see Figure 2-11 for an example) and click Next to go to the following screen. Figure 2-11 NIS Configuration Screen Example ...

Page 53: ... Scali Manage Options Screen: This screen provides the options shown, including installation of MPI, your software version, monitor options, and more. Click Next to move to the following screen. Figure 2-12 Scali Manage Options Screen Example ...

Page 54: ...onfiguration Setup Complete Screen: This screen allows you to install the operating system and Scali Manage immediately or store the configuration for later use. Click Finish after you make your selection. Figure 2-13 Configuration Setup Complete Screen Example ...

Page 55: ... Checking the Log File Entries (Optional): You can check the log file entries during configuration of the new node(s) to confirm that a log file has been created and to view the entries. Figure 2-14 Optional Log File Screen Example ...

Page 56: ...UI. Refer to "Start the Scali Manage GUI" on page 34 if needed. 2. Using the mouse, select the Edit Alarms submenu from the Monitoring menu item. 3. Select a node or list of nodes for which you want to define the alarm. 4. Then select Add Alarm to add the alarm. 5. A popup appears offering input for the alarm name and an optional description (refer to Figure 2-15). Figure 2-15 Alarm Description Popup Example ...

Page 57: ...is time you must enter the criteria that trigger the alarm. Click on Add Criteria (refer to Figure 2-16). Figure 2-16 Add Criteria Screen Example. 7. Another popup appears. For this example, we picked a Filter criteria for the node status. See the example in Figure 2-17 ...

Page 58: ...t this alarm to be triggered at most once. Therefore we leave the Re-Trigger value at 0. To enable this alarm, click on Apply Alarm (refer to Figure 2-18 on page 51). An alternative would be to define a re-trigger interval by entering the number of seconds for Re-Trigger. This alarm does not define any action to be taken when the alarm fires. This can be easily done by selecting a predefined...

Page 59: ... email to a system administrator or e-mail alias. You must pick the appropriate action and supply the e-mail address. Figure 2-18 Applying the Alarm Example Screen ...

Page 60: ...rought down the node. A few seconds thereafter, the GUI indicates a node failure by changing the node icon in the cluster tree (refer to Figure 2-19). A few seconds later, the alarm is triggered and shows up in the alarm log (refer to Figure 2-20). Figure 2-19 Node Failure Icon Example Screen. Figure 2-20 Node Down Alarm Screen Example ...

Page 61: ...o oemtype: Select OEM type to support. Note: Use -o intelplus for an SGI Altix XE240 head node. Use -o supermicro for the SGI Altix XE250 head node or the SGI Altix XE310 or XE320 compute nodes. Use -o list to see a list of currently supported OEM types. open: OpenIPMI driver (default). lan: LAN connection (remote connection, requires -H/-U/-P arguments). lanplus: LANplus connection (IPMI 2.0). Requires -H/-U/-P arguments be su...
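The OEM type rule in the note above can be captured in a small lookup so the right `-o` value is chosen per server model. This is a sketch under stated assumptions: `oem_type` and `ipmi_cmd` are hypothetical helper names, the `admin`/`admin` credentials are the factory BMC defaults mentioned elsewhere in this guide, and the function only prints the command line rather than contacting a BMC.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an SGI Altix XE model to the ipmitool OEM type
# named in the guide's note.
oem_type() {
  case $1 in
    XE240) echo intelplus ;;
    XE250|XE310|XE320) echo supermicro ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print (do not run) a full ipmitool command line.
# Assumption: the factory BMC credentials admin/admin are still in place.
ipmi_cmd() {
  local model=$1 bmc_ip=$2 oem
  shift 2
  oem=$(oem_type "$model") || return 1
  echo "ipmitool -I lanplus -o $oem -H $bmc_ip -U admin -P admin $*"
}

# Example: a sensor listing on an XE320 compute node BMC.
ipmi_cmd XE320 10.0.40.1 sensor list
```

Keeping the model-to-OEM mapping in one place avoids mixing `intelplus` and `supermicro` when a cluster combines XE240 head nodes with newer compute nodes.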

Page 62: ...ddress -U admin -P admin. Adding a User to the BMC: ipmitool opts user set name user_id username; ipmitool opts user set password user_id password; ipmitool opts user enable user_id. Configuring a NIC. Display a current LAN Configuration: ipmitool opts lan print 1. Configure a Static IP Address: Static IP addresses are already set in the factory on LAN channel 1 of each node. Refer to Table 1-3 on page 15 and...

Page 63: ...er-send-threshold 50 1; ipmitool opts sol set character-accumulate-level 004 1; ipmitool opts sol set retry-interval 20 1; ipmitool opts sol set retry-count 6 1; ipmitool opts sol set non-volatile-bit-rate 115.2. Note: Some systems were set to a 115.2 baud rate. To see your configuration, enter the following: ipmitool opts sol info. SGI recommends the following parameter settings for the SGI Altix XE310 or ...

Page 64: ...ession: ipmitool opts sol deactivate. Sensor commands: Sensor commands may be used to display objects, individual sensors, or all sensors in a system. Displaying all Objects in SDR: ipmitool opts sdr list; ipmitool opts sdr dump filename (dump SDR contents to a file). Displaying all Sensors in the System: ipmitool opts sensor list. Displaying an Individual Sensor: ipmitool opts sensor get "Temp". Changing sensor t...

Page 65: ...ontrolling System Power: ipmitool opts chassis power status; ipmitool opts chassis power off; ipmitool opts chassis power on; ipmitool opts chassis power cycle; ipmitool opts chassis power soft (performs safe OS shutdown). Changing System Boot Order: ipmitool opts chassis bootdev pxe; ipmitool opts chassis bootdev harddisk; ipmitool opts chassis bootdev cdrom. SEL Commands: The following command displays the d...
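Because chassis power commands act on hardware immediately, it can help to validate and print them before running anything. The following is a dry-run sketch only: `chassis_cmd` is a hypothetical helper, `IPMI_OPTS` is an assumed stand-in for the `opts` placeholder used above, and the default address and OEM type are illustrative values rather than settings taken from a real cluster.

```shell
#!/usr/bin/env bash
# Assumption: IPMI_OPTS stands in for the "opts" placeholder in the guide.
# The default address and OEM type are illustrative only.
IPMI_OPTS=${IPMI_OPTS:-"-I lanplus -o supermicro -H 10.0.40.1 -U admin -P admin"}

# Hypothetical helper: validate a chassis subcommand, then print it.
# Nothing is sent to a BMC; remove "echo" only after reviewing the output.
chassis_cmd() {
  local group=$1 arg=$2
  case "$group $arg" in
    "power status"|"power on"|"power off"|"power cycle"|"power soft") ;;
    "bootdev pxe"|"bootdev cdrom") ;;
    *) echo "unsupported by this sketch: chassis $group $arg" >&2; return 1 ;;
  esac
  echo "ipmitool $IPMI_OPTS chassis $group $arg"
}

# Dry run: request a network boot, then power cycle the node.
chassis_cmd bootdev pxe
chassis_cmd power cycle
```

Rejecting anything outside a short allow-list guards against typos such as a misspelled power action reaching a production node.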
