Sun HPC 3.0 SCI Guide

901 San Antonio Road

Palo Alto, CA 94303-4900

USA 650 960-1300 Fax 650 969-9131

Part No: 805-6263-10

June 1999, Revision A

Contents of Sun HPC 3.0

Page 1: ...Sun HPC 3.0 SCI Guide 901 San Antonio Road Palo Alto, CA 94303-4900 USA 650 960-1300 Fax 650 969-9131 Part No: 805-6263-10 June 1999, Revision A...

Page 2: ...Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, California 94303-4900 U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses that restrict...

Page 3: ...Scrubber Jumpers 7 2 Network Connection Procedure 11 Install SCI Adapter Cards 11 Notes for Scrubber Jumper Settings 11 Notes for Switched Two-Node Network 12 Notes Regarding SBus Slots 12 Connecting...

Page 4: ...e 29 Add Variable to sci conf File 29 Reboot Nodes 29 4 Verify That the Network Is Functional 31 Run get_ci_status 31 Run ifconfig -a 31 Ping the SCI Adapter Cards 31 Do All-to-All Ping 32 Check for RS...

Page 5: ...I Switch Status LED Locations 37 Port Status LEDs 38 General Switch Status LED 39 The get_ci_status Command 39 Client Net Failure 40 Incorrect Software Configuration 40 Incorrect Firmware 41 A Man Pag...

Page 7: ...I network in the various supported topologies. Chapter 3 explains how to configure the network interfaces on the cluster nodes. Chapter 4 provides a set of procedures that can be used to check the basic...

Page 8: ...ur system Typographic Conventions TABLE P-1 Typographic Conventions Typeface or Symbol Meaning Examples AaBbCc123 The names of commands, files, and directories; on-screen computer output Edit your login...

Page 9: ...ated Documentation Application Title Part Number All Sun HPC ClusterTools™ 3.0 Product Notes 805-6262-10 Sun MPI Programming Sun MPI 4.0 User's Guide With LSF 805-7230-10 Sun MPI Programming Sun MPI...

Page 10: ...TABLE P-3 Related Documentation (continued) Application Title Part Number LSF LSF Batch User's Guide 805-6258-10 LSF LSF Batch Programmer's Guide 805-6260-10...

Page 11: ...before installing and configuring an SCI network on your Sun HPC cluster. SCI Adapter Cards Sun HPC 3.0 cluster nodes connect to the SCI network through SCI adapter cards installed in the node's SBus s...

Page 12: ...Figure 1-1 Basic SCI Network Connection Schemes for Sun HPC 3.0 Clusters...

Page 13: ...to support message striping. Chapter 2 describes the procedure for connecting the nodes in the various network topologies described below. Chapter 3 describes the procedure for configuring the SCI driv...

Page 14: ...a third node later on. The chief disadvantage to using a switch in a two-node network is the latency it adds to the communication path between the nodes. This alternate connection scheme is discussed f...

Page 15: ...pported Three-Node SCI Interconnections Four-Node Networks Figure 1-4 shows examples of how four Sun HPC nodes can be connected to an SCI network in both unstriped and striped modes. Preparing for SCI I...

Page 16: ...Figure 1-4 Supported Four-Node SCI Interconnections...

Page 17: ...ation of the Scrubber Jumper Table 1-1 specifies the appropriate scrubber jumper settings for unswitched and switched SCI networks. TABLE 1-1 Scrubber Jumper Settings Topology SCI SBus Card Jumper Sett...

Page 18: ...Figure 1-6 Examples of Scrubber Jumper Settings in Two-Node Networks...

Page 19: ...ault setting. Therefore, examine the setting on each SCI adapter card and adjust it if necessary. If scrubber jumpers are not set correctly when installed, communication between nodes may experience inter...

Page 21: ...scrubber jumper has the correct setting for the network configuration in which it will be used. • Switchless two-node network: If you are creating a two-node network in which the nodes will connect dir...

Page 22: ...rks Note: The chief disadvantage to using a switch when it is not required is that the switch adds some latency that would otherwise not be in the network. Notes Regarding SBus Slots If the SCI adapter...

Page 23: ...of your nodes has two SCI adapter cards, use two SCI cables, one for each pair of adapter cards. See Figure 2-1 for examples. 3. Connect the node power cords to the appropriate power outlets. 4. Turn the nod...

Page 24: ...dapter card to a port on an SCI switch. You can connect any SCI adapter card to any switch port, but you should follow a logical order in making your connections. This will simplify network configuration...

Page 25: ...Figure 2-2 Examples of Three-Node Switched SCI Connections...

Page 26: ...Figure 2-3 Examples of Four-Node SCI Switched Connections...

Page 27: ...m_config to initialize the SCI network interface. See "Propagate the SCI Configuration" on page 28. • Verifying the rank of the SCI interfaces: "Verify the Rank of the SCI Interface" on page 29. • Rebooting t...

Page 28: ...ftware, the installer responded No when asked if SCI packages should be installed. If they are not present, they must be installed. The installation GUI (graphical user interface) includes an option for ins...

Page 29: ...r It is usually a four- or five-digit number. Figure 3-1 shows an example of a temporary map of a four-node configuration without striping, that is, with one SCI network adapter connection per node. This i...

Page 30: ...SCI switch. Each node is connected to the switch by a single station cable. Again, message striping is not supported. • sma2 2stripes hpc Two nodes connected directly by two SCI station cables. There is n...

Page 31: ...all of the nodes in the cluster by replacing host_namen placeholders with the host names of the cluster's nodes. For example, if your cluster contains the nodes node3, node4, node5, and node6, Section 2 sho...

Page 32: ...pology you implement, as follows: • Two-node cluster, nonstriped: set Number of Direct Links in cluster to 1 • Two-node cluster, striped: set Number of Direct Links in cluster to 2 • Three- or four-node cluster ei...
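The topology-to-value mapping on this page can be sketched as a small lookup. This is only an illustration and not part of the Sun tooling: the function name and the topology labels are hypothetical, and only the two-node values stated in the text are included, because the three- and four-node entry is cut off in this excerpt.

```python
# Hypothetical helper; only the two-node values come from the guide text.
DIRECT_LINKS = {
    ("two-node", "nonstriped"): 1,
    ("two-node", "striped"): 2,
}


def direct_links(nodes, striping):
    """Return the 'Number of Direct Links in cluster' value for a topology."""
    try:
        return DIRECT_LINKS[(nodes, striping)]
    except KeyError:
        # The excerpt is truncated before the three/four-node value is given.
        raise ValueError(f"no value known for {nodes!r}, {striping!r}") from None
```

Raising an error for the truncated cases, rather than guessing a value, keeps the sketch limited to what the page actually states.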

Page 33: ...describing unswitched connections is host n adp n is connected to link n endpt n. When no switch is used, an adapter (adp) is connected to a particular endpoint (endpt n) on a particular channel (link n). See...

Page 34: ...describing switched connections is slightly different: host n adp n is connected to switch n port n. Here, an adapter is connected to port n of switch n. Figure 3-4 through Figure 3-7 show examples of th...

Page 35: ...Figure 3-4 Three-Node Nonstriped Configuration Figure 3-5 Three-Node Striped Configuration...

Page 36: ...re 3-7 Four-Node Striped Configuration Adapter ID values are assigned automatically by the device driver. Initially, the device driver assigns ID 0 to the adapter installed in the lowest-numbered SBus s...

Page 37: ...at is, the contents of Section 7, to match the actual adapter connections. This is why you were advised to make the temporary map of the physical network layout. Instructions for ensuring that the sci_con...

Page 38: ...Adp 0 serial no 6148 bus slot 0 Press Return to continue. Do not press Return yet. Instead, go to the next section. Compare sm_config Output With Contents of sci_config hpc Compare the list of serial numb...
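The serial-number comparison described on this page can be sketched as a set difference. This is a minimal illustration, not a Sun utility: the function names are hypothetical, the regular expression assumes adapter lines resemble the "Adp 0 serial no 6148 bus slot 0" fragment shown above (the real punctuation is lost in this excerpt), and the expected serial numbers are assumed to have been read from the configuration file by hand.

```python
import re

# Assumed line shape: "... serial no 6148 ..." (punctuation after "no" optional).
SERIAL_RE = re.compile(r"serial\s+no\W*(\d+)", re.IGNORECASE)


def serials_in(text):
    """Collect every adapter serial number found in the given text."""
    return {int(number) for number in SERIAL_RE.findall(text)}


def compare_serials(reported_text, expected_serials):
    """Return (missing, unexpected) serial numbers as sorted lists.

    missing:    expected in the configuration file but not reported
    unexpected: reported but not present in the configuration file
    """
    reported = serials_in(reported_text)
    expected = set(expected_serials)
    return sorted(expected - reported), sorted(reported - expected)


# Sample text modeled on the fragment above; the second line is made up.
sample_output = """Adp 0 serial no 6148 bus slot 0
Adp 1 serial no 5017 bus slot 1"""
missing, unexpected = compare_serials(sample_output, [6148, 7000])
```

Both lists being empty would indicate that the reported adapters match the configuration file; anything else points at an entry to correct before pressing Return.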

Page 39: ...o 1. If you don't know the hpc.conf file's location, do one of the following: • LSF: If your cluster is running LSF, open the LSF file /etc/lsf.conf. The LSF_CONFDIR entry in lsf.conf identifies the director...
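Looking up the LSF_CONFDIR entry can be sketched as follows. This is a minimal example that assumes lsf.conf uses plain KEY=VALUE lines with "#" comments; the function name and the sample directory are hypothetical, and the excerpt is cut off before it says exactly what the directory contains.

```python
def find_lsf_confdir(lsf_conf_text):
    """Return the LSF_CONFDIR value from lsf.conf text, or None if absent."""
    for raw_line in lsf_conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, separator, value = line.partition("=")
        if separator and key.strip() == "LSF_CONFDIR":
            return value.strip().strip('"')
    return None


# Hypothetical file contents used only to exercise the function.
sample_lsf_conf = """# sample lsf.conf
LSF_ENVDIR=/etc
LSF_CONFDIR=/opt/lsf/conf
"""
confdir = find_lsf_confdir(sample_lsf_conf)
```

On a real node the text would come from reading the lsf.conf file named on this page rather than from a sample string.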

Page 41: ...Execute get_ci_status on all cluster nodes to verify interconnectivity. Run ifconfig -a Execute ifconfig -a to verify that all the nodes are up with the SCI daemons running. Ping the SCI Adapter Cards Pi...
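An all-to-all ping check needs every ordered (source, destination) pair of nodes. The sketch below only enumerates those pairs so that none is missed; it does not run ping itself. The function name is hypothetical, and the host names follow the interconn naming used in the guide's sample output, which may differ from your cluster.

```python
from itertools import permutations


def all_to_all_pairs(hosts):
    """Return every ordered (source, destination) pair of distinct hosts."""
    return list(permutations(hosts, 2))


# Illustrative host names; substitute the SCI interface names of your nodes.
hosts = ["interconn1", "interconn2", "interconn3", "interconn4"]
pairs = all_to_all_pairs(hosts)

for source, destination in pairs:
    # Each line is a reminder of one check to perform by hand.
    print(f"from {source}: ping {destination}")
```

For n nodes this yields n * (n - 1) checks, so a four-node cluster needs twelve pings to cover both directions of every link.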

Page 42: ...ifies that the RSM layer is functional. Verify Basic Communication Functionality Chapter 5 of the Sun HPC ClusterTools 3.0 Installation Guide describes procedures for verifying that the cluster can suc...

Page 43: ...er Card Turn the node's power switch off and disconnect it from the power outlet. Note that this only needs to be done on the node receiving the new adapter card. Check the scrubber jumper setting on th...

Page 44: ...last changed, or create a new temporary network map as described in Chapter 3. Include the new adapter card in the map, identifying its SBus location and serial number. Run sciconf Run the command scicon...

Page 45: ...o incorporate the information provided by the sciconf output. Run sm_config Execute the sm_config command as root on the node that contains the sci_config hpc file. If possible, do this from a console te...

Page 46: ...e, stop execution of sm_config (press Control-C) and correct the configuration file. Then run sm_config again and compare its output with sci_config hpc again. When the contents of the sci_config hpc file...

Page 47: ...CI cables are properly seated. • All SCI switches have power applied. • No SCI status LEDs are red (see Table 6-1 and Table 6-2). SCI Switch Status LED Locations Clusters with three or four nodes can be co...

Page 48: ...6-1 SCI Switch Port Status LEDs Situation Port LED Status No power: All four LEDs not lit. Fatal switch errors (fatal hardware error, temperature too high, fan(s) not operative, power supply problem): All four...

Page 49: ...e 6-2, if the get_ci_status command is used on interconn1, a typical output would be: /opt/SUNWsma/bin/get_ci_status sma sci 0 sbus_slot 1 adapter_id 8 0x08 ip_address 1 switch_id 0 port_id 0 Adapter Stat...

Page 50: ...adapter, the cable, or SCI switch 0 port 0 is faulty. Note that some aspects of the get_ci_status command output, such as host names, will vary according to your configuration. Client Net Failure System co...

Page 51: ...not saved in the message file 1 One SCI card is working in the node rebooting Resetting DOLPHIN SBus to SCI SBus2b Adapter 9029 Serial 5017 FCode 9029 Revision 2 3 d9029_52 Date 1996 10 30 07 47 53 E...

Page 52: ...Run /opt/SUNWsci/bin/sciadm and enter the identify command. This command displays the firmware version (fcode version) and serial number of each adapter board found. 4 Compare the number of cards found by...

Page 53: ...utility for clusters SYNOPSIS sm_config t f filename AVAILABILITY SUNWsma INTERFACE CLASSIFICATION Sun Private DESCRIPTION sm_config is the SCI adapter configuration utility. It acts as a client of sm...

Page 54: ...sections: 1. Cluster configuration section: specifies the type of cluster being configured (PDB or HPC). A sample template for this section: Cluster is configured as PDB 2. Host names section: requires the nam...

Page 55: ...its final form in future, new communication channels (SMA sessions) will be created on the new links (say, through a new switch) on the fly. This eliminates having to run sm_config later and rebooting the ma...

Page 56: ...be run on any host in the cluster, but it should not be run on multiple hosts simultaneously (e.g., via cconsole). If this occurs, the results are unpredictable; in the worst case, the adapter flash memory migh...

Page 57: ...ter status and the SMA session status. It queries the SCI driver for information about the local SCI adapters and tests the connectivity to SCI adapters on other hosts, either via a switch or a direct l...

Page 58: ...erational sma Switch_id 1 sma port_id 1 host_name interconn2 adapter_id 76 active inoperational sma port_id 2 host_name interconn3 adapter_id 140 inactive operational sma port_id 3 host_name interconn...

Page 59: ...SAGE get_ci_status can be run from the command line by any user. However, it can only be run after the adapter cards have been initialized using sm_config(1M). This ensures that all the adapter node ids h...

Page 61: ...entering stand alone mode • SUNWcluster sma smad 1102 smad Cluster • SUNWcluster sma smad 1103 smad Cluster • SUNWcluster sma smad 1104 smad Cluster • SUNWcluster sma smad 1105 smad Cluster • SUNWclus...

Page 62: ...dog 1002 fix msgstr Not Applicable SUNWcluster sma watchdog 2001 child pid exit died status msgid SUNWcluster sma watchdog 2001 message msgstr The SMAD child daemon is dead. If necessary, another SMAD c...

Page 63: ...d adapter is working msgid SUNWcluster sma smak 1051 error msgstr Not Applicable msgid SUNWcluster sma smak 1051 fix msgstr Not Applicable SUNWcluster sma smak 4001 SCI Adapter adp Card not operationa...

Page 64: ...logical adapter msgid SUNWcluster sma smad 5010 error msgstr This adapter was acting as the logical adapter and it is no longer the logical adapter. If recovery happens, some other adapter will be chose...

Page 65: ...ster sma smad 1104 error msgstr Not Available msgid SUNWcluster sma smad 1104 fix msgstr Not Applicable SUNWcluster sma smad 1105 smad Cluster clustname no longer running msgid SUNWcluster sma smad 11...

Page 66: ...adp whose SCI id is from_aid to the SCI adapter with SCI id to_aid has been closed msgid SUNWcluster sma smak 3003 error msgstr This is probably because of a failure or a shutdown of the remote node...

Page 67: ...Continuation msgstr Not Available msgid SUNWcluster sma smactl 4008 fix msgstr Not Applicable...
