Fast Fabric Users Guide

D000006-000 Rev. A

Summary of Contents for Fast Fabric

Page 3: ...table for the specified use without further testing or modification. QLogic Corporation assumes no responsibility for any errors that may appear in this document. No part of this document may be copied...

Page 4: ...2008 QLogic Corporation. All rights reserved worldwide. First Published: March 2007. Printed in U.S.A. QLogic Corporation, 26650 Aliso Viejo Parkway, Aliso Viejo, CA 92656, 800 66...

Page 5: ...nstalling InfiniBand on the Remaining Servers 3-12; 3.8 Verifying InfiniBand on the Remaining Servers 3-16; 3.9 Complete Installation of additional IB Management Nodes 3-18; 3.10 Configure and Initialize...

Page 6: ...is Admin via Fast Fabric 4-11; 4.3.1 Edit the Configuration and Select/Edit Chassis Files 4-11; 4.3.2 Verify Chassis via Ethernet Ping 4-11; 4.3.3 Update Chassis Firmware 4-12; 4.3.4 Show Status of Chassi...

Page 7: ...info 5-25; 5.4.2 showallports 5-26; 5.4.3 iba_report 5-28; 5.4.4 saquery 5-56; 5.5 Advanced Initialization and Verification: ibtest 5-60; 5.5.1 ibtest Host Operations 5-63; 5.5.2 ibtest Chassis Operations 5...

Page 8: ...the Remaining Servers A-2; A.6 Verifying InfiniBand on the Remaining Servers A-3; A.7 Complete Installation of additional IB Management Nodes A-3; A.8 Configure and initialize health check tools A-4; App...

Page 9: ...t tools. Section 6 describes MPI Sample Applications. Appendix A presents the Fast Fabric Quick Install Checklist. Appendix B describes the Fast Fabric Configuration Files. Appendix C provides information...

Page 10: ...and software updates. 1.3.1 Availability: QLogic Technical Support for products under warranty is available during local standard working hours, excluding QLogic Observed Holidays. 1.3.2 Contact Informati...

Page 11: ...t network connectivity. Verify host OS levels. Sets up ssh keys. Performs initial InfiniBand software installation. Configures Internet Protocol over InfiniBand (IPoIB) IP addresses. Performs InfiniBand driv...

Page 12: ...to accelerate common chassis and switch administration tasks Manage firmware levels on switches and chassis Execute commands across many chassis Assists in the initial benchmarking and tuning of High...

Page 13: ...sired setup remote access to the IB Management Node via ssh telnet X windows VNC or any other mechanism which will allow the remote user to access a Linux Command Line shell Typically Fast Fabric is o...

Page 14: ...ser interaction and hence reduce the time to perform operations against many hosts or chassis. After initial installation, Fast Fabric can be configured to use IPoIB instead of the management network. NO...

Page 15: ...ally applicable to all environments and will be marked with All NOTE Some of the Linux steps may be applicable to other Unix like operating systems if it is desired to enable use of non IB specific Fa...

Page 16: ...Install the desired Linux OS version with the same kernel distribution on all hosts. Generally the IB Management node(s) (i.e., the host which will run Fast Fabric) should have a full install and must inclu...

Page 17: ...be accomplished using chkconfig rsh on. Also enable rexec and rlogin using the above steps. b. Execute mv /etc/securetty /etc/securetty.bak. 6. (All) TCP/IP Host Name resolution: Fast Fabric and TCP/IP will nee...

Page 18: ...ronize their clocks with the NTP server. Consult the Linux OS documentation for information on how to configure NTP servers and clients. 8. (All) On the IB Management node, install the Fabric Access Softwar...

Page 19: ...Host Setup Menu 4 1 1 0 15 Fast Fabric Host List etc sysconfig iba hosts 0 Edit Config and Select Edit Hosts Files Perform 1 Verify Hosts via Ethernet ping Perform 2 Verify rsh rcp Configured Skip 3 S...

Page 20: ...itor selected via the EDITOR environment variable. In addition, it will also permit review and editing of the fastfabric.conf file. The fastfabric.conf file guides the overall configuration of Fast Fabri...

Page 21: ...s it's best to place all files at a given firmware level into a single directory whose name indicates the firmware revision number. Once the above steps have been completed, additional setup of the Chass...

Page 22: ...ned above or IP addresses. Use of names is recommended. One entry per line, such as: Chassis1 Chassis2. NOTE: Do not list externally managed switches (such as the SilverStorm 9024FC switches) in this file. Tho...

Page 23: ...are running the firmware level provided and install and/or reboot each chassis as needed. If any chassis fails to be updated, use the View ibtest result files option to review the result files from the...

Page 24: ...les on the CD will be used to upgrade the firmware on each switch. NOTE: When copying files, it's best to place all files at a given firmware level into a single directory whose name indicates the firmwar...

Page 25: ...th internally and externally managed switches, and hence the output must be edited to leave only the SilverStorm externally managed switches: saquery -t sw -o nodeguid. 6. (Switch) Update Switch Firmware will...

Page 26: ...tches listed in the ibnodes file, the previous step may be repeated to review and edit the file as needed. 7. (Switch) If any 9024FC switches were skipped above in steps 5 and 6, these steps should be repeat...

Page 27: ...and operation, also select Verify rsh/rcp Configured. However, it is instead recommended that ssh be used, in which case this step can be skipped. NOTE: If /etc/hosts will be used for name resolution (as opp...

Page 28: ...powered on and booted Is host connected to management network Is host management network IP address and network settings consistent with DNS or etc hosts Is Management node connected to the management...

Page 29: ...can be found When prompted select to do an initial installation as follows Would you like to do an upgrade install y n Would you like to do an initial install load n y NOTE An initial installation wi...

Page 30: ...torm Technologies Inc IB Host Admin Menu 4 1 1 0 15 Fast Fabric Host List etc sysconfig iba allhosts 0 Edit Config and Select Edit Hosts Files Perform 1 Verify Hosts via Ethernet ping Perform 2 Summar...

Page 31: ...he hosts file the previous step may be repeated to review and edit the file as needed 5 All Summary of Fabric Components will provide a brief summary of the counts of components in the fabric includin...

Page 32: ...nge, carefully examine the two hosts involved to verify the PCI slot used, BIOS settings, and any motherboard jumpers related to devices on PCI buses or slot speeds. Also verify HCA and riser cards a...

Page 33: ...r copying the files edit the hosts and allhosts files such that the file on each IB Management Node omits itself from the hosts files but lists all other IB Management Nodes and specifies itself in th...
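
The hosts/allhosts relationship described here can be sketched in a few lines of shell. This is only an illustration: make_hosts_list is a hypothetical helper, not a Fast Fabric command, and it assumes the allhosts file holds one bare host name per line with no trailing comments.

```shell
#!/bin/bash
# Sketch: print every entry of an allhosts-style list except this node.
# make_hosts_list is a hypothetical helper, not a Fast Fabric command.
make_hosts_list() {
    local allhosts_file=$1   # file with one host name per line
    local self=$2            # name of the local IB Management Node
    # -v inverts the match, -x matches whole lines, -F treats the name literally
    grep -vxF -- "$self" "$allhosts_file"
}
```

On an IB Management Node the result could be redirected into that node's hosts file, for example `make_hosts_list /etc/sysconfig/iba/allhosts "$(hostname -s)"`.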

Page 34: ...talled. One of the more popular measures of overall performance is HPL. This is the application used to rate systems on the Top 500 list. The steps allow some initial runs of HPL to be made and provide s...

Page 35: ...review the host configuration and stop these extra processes if possible. HPL is very sensitive to swapping. If a lot of swapping is seen and xhpl is dropping below 97% for long durations, this may indic...

Page 36: ...ct Edit Hosts Files Perform 1 Verify Hosts via Ethernet ping Skip 2 Verify rsh rcp Configured Skip 3 Setup Password less ssh scp Skip 4 Copy etc hosts to all hosts Skip 5 Show uname a for all hosts Sk...

Page 37: ...InfiniServ software installed If any hosts fail to be updated use the View ibtest result files option to review the result files from the update See the section Interpreting the ibtest log files on pa...

Page 39: ...re is being used on the hosts Those will be marked with Host All menu items which are applicable only when SilverStorm IB Switches or Chassis are being used will be marked with Switch All remaining me...

Page 40: ...Previous Menu or ESC The submenus typically present operations in the typical order they would be used during an installation Pressing the keys corresponding to menu items 0 9 a e in the example above...

Page 41: ...able In addition it will also permit review and editing of the fastfabric conf file The fastfabric conf file guides the overall configuration of Fast Fabric and describes cluster specific attributes o...

Page 42: ...n and Select Edit Hosts Files All This will permit the hosts and fastfabric conf files to be edited The hosts file selected and created via this menu should not list the Fast Fabric host itself After...

Page 43: ...hosts Review the results carefully to verify all the hosts have the expected OS version In typical clusters all hosts will be running the same OS and kernel version 4 1 7 Install Upgrade QuickSilver S...

Page 44: ...lting object files to all the hosts This is in preparation for execution of MPI performance tests and benchmarks in a later step 4 1 10 Reboot Hosts Linux This will run the ibtest reboot command to re...

Page 45: ...scpall command A file on the local host may be specified to be copied to all selected hosts 4 1 15 View ibtest result files All This permits viewing of the test log and test res files that reflect th...

Page 46: ...mand on all hosts Skip a View ibtest result files Skip P Perform the selected actions N Select None X Return to Previous Menu or ESC 4 2 1 Edit Config and Select Edit Hosts Files All This will permit...

Page 47: ...run iba_report -i 10 -o errors -o slowlinks on the IB Management node. This will check all the ports in the fabric for any links which have high error rates or are running at a lower speed than expected. A...

Page 48: ...to devices on PCI buses or slot speeds Also verify HCA and riser cards are properly seated 4 2 9 Generate all Hosts Problem Report Info Host This will run the captureall command to collect configurat...

Page 49: ...selected actions N Select None X Return to Previous Menu or ESC 4 3 1 Edit the Configuration and Select Edit Chassis Files Switch This will permit the chassis and fastfabric conf files to be edited Th...

Page 50: ...ch chassis and select it for use on next reboot. run: push firmware to each chassis, select it for use and, if it's not the presently running firmware, reboot the chassis. Additional options prompted for pa...

Page 51: ...le tgz file that can be sent to the Support Representative 4 3 7 Run a command on all chassis Switch This will run the cmdall C command A Chassis CLI command may be specified to be executed against al...

Page 52: ...r ESC. 4.4.1 Edit Config and Select/Edit Chassis Files (Switch): This will permit the ibnodes and fastfabric.conf files to be edited. The ibnodes file selected and created via this menu should not lis...

Page 53: ...emfw files If any switches fail to be updated use the View ibtest result files option to review the result files from the update Refer to the section Interpreting the ibtest log files on page 5 68 for...

Page 55: ...H Most of the tools are installed in /sbin. 5.1 Common Tool Options: There are some common options to the assorted command line tools. These options are applicable to most of the tools. 5.1.1 Will display Usage infor...

Page 56: ...n be ssh Consult the SilverStorm 9000 Users Guide for more information 5 1 4 C Specifies that the given operation should be performed against chassis By default Fast Fabric operations are performed ag...

Page 57: ...ptions are considered in the following order; the first item listed below that is specified is used for the given command: 1. -h option 2. HOSTS environment variable 3. -f option 4. HOSTS_FILE environment var...
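
The precedence order listed here can be sketched as a small shell function. This is not the tools' actual implementation: select_hosts_source is a hypothetical name, and the fifth step (falling back to /etc/sysconfig/iba/hosts) is inferred from the analogous chassis list and from the documented default of the -f option.

```shell
#!/bin/bash
# Sketch of the documented precedence for choosing the host list.
# select_hosts_source is a hypothetical name; opt_h and opt_f stand for the
# values given to the -h and -f options (empty when the option was not used).
select_hosts_source() {
    local opt_h=$1 opt_f=$2
    if [ -n "$opt_h" ]; then
        echo "list:$opt_h"                     # 1. -h option
    elif [ -n "${HOSTS:-}" ]; then
        echo "list:$HOSTS"                     # 2. HOSTS environment variable
    elif [ -n "$opt_f" ]; then
        echo "file:$opt_f"                     # 3. -f option
    elif [ -n "${HOSTS_FILE:-}" ]; then
        echo "file:$HOSTS_FILE"                # 4. HOSTS_FILE environment variable
    else
        echo "file:/etc/sysconfig/iba/hosts"   # 5. default file (inferred)
    fi
}
```

Note the consequence of this ordering: an exported HOSTS variable takes priority over a -f option given on the command line.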

Page 58: ...IP addresses Typically management network hostnames are specified However if desired IPoIB hostnames or IP addresses may be used This can accelerate large file transfers and other operations Files to...

Page 59: ...variable 3. -F option 4. CHASSIS_FILE environment variable 5. /etc/sysconfig/iba/chassis file. For example, if the -H option is used and the CHASSIS_FILE environment variable is also exported, the command will...

Page 60: ...s or include directives, the # must be white space separated from any preceding name, IP address or included file name. 5.1.7.2 Explicit Chassis names: When chassis are explicitly specified via the -H option...
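
The list-file rules above (one entry per line, comments, include directives) can be illustrated with a short shell sketch. It rests on assumptions: the comment character is taken to be "#", an include line is taken to be the word include followed by a file name, and expand_list is a hypothetical helper rather than part of Fast Fabric.

```shell
#!/bin/bash
# Sketch: expand a chassis/host list file into plain entries.
# Assumptions: "#" starts a comment when it begins a white-space separated
# word, and "include FILE" pulls in another list. expand_list is hypothetical.
expand_list() {
    local file=$1 line w
    local -a words entry
    while IFS= read -r line || [ -n "$line" ]; do
        read -r -a words <<< "$line"        # split the line on white space
        entry=()
        for w in "${words[@]}"; do
            case $w in '#'*) break ;; esac  # rest of the line is a comment
            entry+=("$w")
        done
        [ "${#entry[@]}" -eq 0 ] && continue    # blank or comment-only line
        if [ "${entry[0]}" = "include" ]; then
            expand_list "${entry[1]}"           # recurse into the included file
        else
            echo "${entry[0]}"
        fi
    done < "$file"
}
```

Because a "#" only counts as a comment when it starts a word, an entry such as `name#1` is kept intact, which matches the requirement that the comment marker be white space separated.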

Page 61: ...ith all relevant slots as part of that single specification This is important so that parallel operations do not cause conflicting concurrent operations against a given chassis 5 1 8 Selection of Swit...

Page 62: ...a comment 0x00066a00d9000138 i9k138 Node GUID with desired Name 0x00066a00d9000139 i9k139 Node GUID with desired Name include etc sysconfig iba moreswitches included file Each line of the switch list...

Page 63: ...66a00d9000139 i9k139 5.1.9 Selection of local Ports/subnets: Some of the fabric health commands (fabric_analysis, all_analysis) permit a specific set of local HCA ports to be used for fabric analysis. The...

Page 64: ...In some fabrics it may be useful to create multiple files in etc sysconfig iba representing different subsets of the ports from which the user may operate For example etc sysconfig iba ports primary...

Page 65: ...ede the comment. On lines with a port or include directive, the # must be white space separated from any preceding port or included filename. 5.1.9.2 Explicit ports: When ports are explicitly specified via...

Page 66: ...s (RCP) and commands (rsh) to be run from this host to all the other hosts and to itself (via localhost) as a specific user (default is root). Additionally, this command can be used to verify rsh is set up to al...

Page 67: ...f hostfile h hosts u user S C only perform connect to enter in local hosts knownhosts When run in this mode S and s options are ignored s use ssh scp to transfer files default is rsh rcp i ipoib_suff...

Page 68: ...etup_ssh for initial key exchange is with the s and S options This requires all hosts have been configured with the same password for the specified user typically root In this mode the password will b...

Page 69: ...nstallation and booting of IB software, setup_ssh will need to be rerun with the -C option to update the knownhosts file. 5.2.4 cmdall (Linux and Switch): Executes a command on all hosts or SilverStorm IB chassis. This...

Page 70: ...or example when running host commands such as rm the i option interactively prompt before removal should not be used Note that this option is sometimes part of a standard bash alias list Similarly whe...

Page 71: ...capture of -d upload_dir: directory to upload to, default is uploads. If not specified, the environment variable UPLOADS_DIR will be used. If that is not exported, the default (uploads) will be used. -S: securel...
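
The fallback chain for the upload directory maps directly onto nested shell parameter defaults. A minimal sketch, assuming resolve_upload_dir as a hypothetical helper whose first argument is the value passed with the -d option (empty when the option is absent):

```shell
#!/bin/bash
# Sketch: resolve the upload directory the way the -d description reads.
# resolve_upload_dir is a hypothetical helper; $1 is the -d value, if any.
resolve_upload_dir() {
    local opt_d=${1:-}
    # -d wins, then an exported UPLOADS_DIR, then the default "uploads"
    echo "${opt_d:-${UPLOADS_DIR:-uploads}}"
}
```

With nothing set, `resolve_upload_dir` prints `uploads`, a relative path.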

Page 72: ...sis then creates mycapture all tgz captureall C H chassis1 chassis2 030127capture Environment Variables The following environment variables are also used by this command HOSTS HOSTS_FILE see discussio...

Page 73: ...P and require that password-less SSH/SCP be set up between the host running Fast Fabric and the hosts which files are being transferred to and from. The setup_ssh Fast Fabric tool can aid in setting up passwo...

Page 74: ...ot specified it defaults to the present directory name If both the source and destination directory names are omitted they both default to the current directory name Example copy a single file scpall...

Page 75: ...in cluster default is etc sysconfig iba hosts h hosts list of hosts to upload from u user user to perform copy to default is current user code d upload_dir directory to upload to default is uploads If...

Page 76: ...ot be used in the arguments to uploadall To copy files from this host to hosts in the cluster use scpall or downloadall Environment Variables The following environment variables are also used by this...

Page 77: ...of the file or directory on the destination hosts to copy to If more than one source file is specified dest_file will be treated as a directory name The given directory must already exist on the desti...

Page 78: ...le was changed for some or all of the hosts it can then be downloaded to all the hosts downloadall d uploads ifcfg ib1 etc sysconfig network scripts ifcfg ib1 Alternatively if there was no need to dow...

Page 79: ...umber of Switch Chips: 6 Number of Links: 29 Number of 1x Ports: 2. The output is as follows: SM: each subnet manager (SM) running in the fabric is listed along with its node name, port GUID and present SM stat...

Page 80: ...tches NOTE iba_report is a newer and more powerful Fast Fabric command For general fabric analysis use iba_report with options such as o errors and or o slowlinks to perform a more efficient analysis...

Page 81: ...his command HOSTS HOSTS_FILE see discussion on selection of hosts above CHASSIS CHASSIS_FILE see discussion on selection of chassis above IBNODES IBNODES_FILE see discussion on selection of switches a...

Page 82: ...ounters (IBTA mandatory counters). Also, any end nodes which report support of an IBTA device management agent must implement the IOU Info, IOC Profile and Service Entry queries as outlined in the IBTA 1 1...

Page 83: ...il 0 n for output default is 2 P persist only include data persistent across reboots H hard only include permanent hardware data N noname omit node and IOC names x xml output in XML s stats get perfor...

Page 84: ...summary of links configured to run slower than supported includes slowlinks slowconnlinks summary of links connected with mismatched speed potential includes slowconfiglinks misconfiglinks summary of...

Page 85: ...OC Profile ID String IOC Name iocpat value1 port value2 value1 is global pattern for IOC Profile ID String IOC Name value2 is port ioctype value value is IOC type VNIC or SRP ioctype value1 port value...

Page 86: ...a portguid iba_report o brnodes F portguid 0x00066a00a0000380 Find all the connections to a server iba_report o links F node duster Find all the connections to a switch chip iba_report o links F node...

Page 87: ...clear them, then recheck: iba_report -o errors -C; sleep 10; iba_report -o errors. Clear all port counters, wait 10 seconds, then check: iba_report -i 10 -o errors. Check all port counters on a server: iba_report -o...
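
The clear, wait, recheck sequence quoted above can be wrapped in a small function. This is a sketch under assumptions: the option dashes (-o errors, -C) are restored from the flattened text, and recheck_errors and the IBA_REPORT override variable are hypothetical names introduced only for illustration.

```shell
#!/bin/bash
# Sketch: clear port counters, wait, then report errors again.
# IBA_REPORT and recheck_errors are hypothetical names; the flags are the
# ones quoted in the text, with dashes restored (an assumption).
IBA_REPORT=${IBA_REPORT:-iba_report}

recheck_errors() {
    local wait_secs=${1:-10}
    "$IBA_REPORT" -o errors -C || return 1   # report current errors, clear counters
    sleep "$wait_secs"                       # let the counters accumulate again
    "$IBA_REPORT" -o errors                  # errors seen since the clear
}
```

Calling `recheck_errors 10` on the IB Management node would then show only the errors accumulated during the 10 second window, which helps separate old counts from links that are still failing.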

Page 88: ...00b 0x00066a00a00001b8 4x 2 5Gb 0x00066a0098000380 CA goblin 1 0x000a 0x00066a00a0000380 4x 2 5Gb 0x00066a0098000384 CA cuda 1 0x0005 0x00066a00a0000384 1x 2 5Gb 2 0x0006 0x00066a01a0000384 4x 2 5Gb 0...

Page 89: ...0066a00280002cd SW InfiniCon Systems InfiniFabric Sw A Dev A 0 0x0013 0x00066a00280002cd Noop Noop 3 4x 2 5Gb 5 4x 2 5Gb 0x00066a00d8000123 SW InfiniCon Systems InfinIO9024 0 0x0001 0x00066a00d8000123...

Page 90: ...2 5Gb 4 4x 2 5Gb 1 Connected SMs in Fabric State GUID Name Master 0x00066a00d8000123 InfiniCon Systems InfinIO9024 Each iba_report allows for various levels of detail Increasing detail is shown as fur...

Page 91: ...get a report with a little more detail root duster root iba_report d 1 Node Type Brief Summary 14 Connected CAs in Fabric NodeGUID Type Name 0x0002c9020020e0d4 CA coyote1 0x00066a00580001e0 CA VEx in...

Page 92: ...ID and will therefore be properly grouped However some third party devices do not implement the system image GUID and may report a value of 0 In such a case iba_report will treat each component as an...

Page 93: ...ver it does not include links which are running slower than expected misconnlinks this is similar to slowconnlinks in that it reports links which have been connected between ports of different speed p...

Page 94: ...further limit the report to only include hardware information This is a superset of P and omits more information A related but independent option is N This will omit all the node and IOC names from th...

Page 95: ...formance or error situation which is being reported between 2 specific points in the fabric Such as a StatusTimeoutRetry that MPI may be reporting between 2 processes in its run Focus can use glob sty...

Page 96: ...ssis 0x00066A005000010C Slot 2 IOC 1 iba_report o nodes F ioc Chassis 0x00066A005000010C Slot 2 IOC 1 port 2 iba_report o nodes F iocpat Slot 2 iba_report o nodes F iocpat Slot 2 port 2 iba_report o n...

Page 97: ...the initial design bin bash specify some filenames to use expected_config usr local report master master copy of config previously created config tmp report where we will generate new report diffs tmp...
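
The idea behind this script fragment, comparing a newly generated report with a saved master copy, can be sketched as follows. The function name compare_reports and its three file arguments are hypothetical; how the two reports are generated is left out because the original script is truncated here.

```shell
#!/bin/bash
# Sketch: compare a freshly generated report with a saved master copy.
# compare_reports and its three file arguments are hypothetical.
compare_reports() {
    local expected=$1 current=$2 diffs=$3
    if diff "$expected" "$current" > "$diffs" 2>&1; then
        rm -f "$diffs"            # identical: nothing worth keeping
        echo "no change"
    else
        # diff exits non-zero both for differences and for read errors
        echo "changed: see $diffs"
        return 1
    fi
}
```

The non-zero return status makes the function easy to use from cron or another script that should alert an administrator only when the configuration has drifted.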

Page 98: ...r than supported Rate NodeGUID Port Type Name Enabled Supported 2 5g 0x00066a0098000384 1 CA cuda 1x 2 5Gb 1 4x 2 5Gb 0x00066a00d8000123 2 SW InfiniCon Systems InfinIO9024 1 4x 2 5Gb 1 4x 2 5Gb 20 of...

Page 99: ...shold 3 10g 0x00066a0098000380 1 CA goblin SymbolErrorCounter 65535 Exceeds Threshold 100 LinkErrorRecoveryCounter 255 Exceeds Threshold 3 PortRcvErrors 65535 Exceeds Threshold 100 0x00066a00d8000123...

Page 100: ...ports 0x00066a00980001b8 1 CA orc and 0x00066a0098000001 1 CA julio 1 Paths SGID 0xfe80000000000000 00066a00a00001b8 DGID 0xfe80000000000000 00066a00a0000001 SLID 0x000b DLID 0x000c Reversible Y PKey...

Page 101: ...Links Checked 0 Errors found Links with errors threshold Summary Focused on 4 Ports 1 0x00066a00a00001b8 in Node 0x00066a00980001b8 CA orc 10 in Node 0x00066a00d8000123 SW InfiniCon Systems InfinIO90...

Page 102: ...s M_Key 0 P_Key 0 Q_Key 0 ErrorLimits Overrun 15 LocalPhys 15 DiagCode 0x0000 P_Key Enforcement In Off Out Off FilterRaw In Off Out Off Performance Transmit Xmit Data 16383 MB 4294967295 Quads Xmit Pk...

Page 103: ...Y 0x0000000000000000 Lease 0 s Protect Readonly MTU Active 2048 Supported 2048 VL Stall 0 LinkWidth Active 4x Supported 1 4x Enabled 1 4x LinkSpeed Active 2 5Gb Supported 2 5Gb Enabled 2 5Gb VLs Activ...

Page 104: ...Enabled 1 4x LinkSpeed Active 2 5Gb Supported 2 5Gb Enabled 2 5Gb VLs Active 1 1 Supported 4 1 HOQLife 4096 ns Capability 0x02090048 CR DM CM SL Trap Violations M_Key 0 P_Key 0 Q_Key 0 ErrorLimits Ove...

Page 105: ...49 2 CA rockaway 0x00066a00d8000123 3 SW InfiniCon Systems InfinIO9024 10g 0x00066a0098002813 1 CA brady 0x00066a00d8000123 19 SW InfiniCon Systems InfinIO9024 10g 0x00066a0098002813 2 CA brady 0x0006...

Page 106: ...in Node 0x00066a0098000384 CA cuda 13 Connected CAs in Fabric Name cuda NodeGUID 0x00066a0098000384 Type CA Ports 2 PartitionCap 64 SystemImageGuid 0x00066a0098000384 BaseVer 1 SmaVer 1 VendorID 0x66a...

Page 107: ...ts PortNum 1 LID 0x0015 GUID 0x00066a00a00003a6 Neighbor 0x00066a00d8000123 9 SW InfiniCon Systems InfinIO9024 Width 4x Speed 2 5Gb PortNum 2 LID 0x0016 GUID 0x00066a01a00003a6 Neighbor 0x00066a00d800...

Page 108: ...Gb Supported 2 5Gb Enabled 2 5Gb VLs Active 4 1 Supported 4 1 HOQLife 4096 ns Capability 0x02010048 CR CM SL Trap Violations M_Key xxxxx P_Key xxxxx Q_Key xxxxx ErrorLimits Overrun 15 LocalPhys 15 Dia...

Page 109: ...LinkWidth Active 4x Supported 1 4x Enabled 1 4x LinkSpeed Active 2 5Gb Supported 2 5Gb Enabled 2 5Gb VLs Active 4 1 Supported 4 1 HOQLife 4096 ns Capability 0x02010048 CR CM SL Trap Violations M_Key x...

Page 110: ...he IB management node is connected to more than one fabric i e a subnet the HCA and port may be specified to select the fabric whose SA is to be queried Usage saquery v h hca p port o type l lid t typ...

Page 111: ...ortguid list of port guids lid list of lids gid list of gids desc list of node descriptions names path list of path records node list of node records portinfo list of port info records sminfo list of...

Page 112: ...portinfo sminfo swinfo link slvl vlarb pkey guids service mcmember inform linfdb ranfdb mcfdb trace l lid systemguid nodeguid portguid lid desc path node portinfo swinfo slvl vlarb pkey guids service...

Page 113: ...mcfdb trace P port_guid_pair path trace systemguid nodeguid portguid lid gid desc node portinfo sminfo swinfo link slvl vlarb pkey guids service mcmember inform linfdb ranfdb mcfdb G gid_pair path tra...

Page 114: ...assis default is hosts n perform operation against IB node default is hosts i ipoib_suffix suffix to apply to host names to create ipoib host names The default is ib f hostfile file with hosts in clus...

Page 115: ...firmware is in primary and running The default is push S securely prompt for password for user on remote system chassis test test to run Host Test can be one or more of load initial install of all hos...

Page 116: ...t is executing Test log will contain detailed information about what was performed This will include the specific commands executed and the resulting output The test_tmp directories will contain tempo...

Page 117: ...g host load Used in absence of I option FF_UPGRADE_OPTIONS upgrade options for host IB software INSTALL during host upgrade Used in absence of U option FF_PACKAGES host packages to load during host lo...

Page 118: ...working directory and will be copied to all the end nodes and installed. NOTE: Only those Fabric Access components that are currently installed will be upgraded. This operation will fail for nodes that d...

Page 119: ...2 3 4 5 6 This can be used to verify switch latency hops PCI bandwidth and overall MPI performance The test res file will have the results of each pair of nodes tested To obtain accurate results this...

Page 120: ...keys to be configured within the chassis for secure password less login In this case there is no need to configure a FF_CHASSIS_ADMIN_PASSWORD and FF_CHASSIS_LOGIN_METHOD can be SSH Refer to the Silve...

Page 121: ...lished In most cases a parallel upgrade is recommended for expediency 5 5 3 2 upgrade Upgrades the firmware on each specified switch The P option selects a directory containing a emfw file or provides...

Page 122: ...om that host and or chassis For example test log may contain lines such as scp InfiniServPerf 4 1 1 0 15 tgz root n001a TEST CASE FAILURE scp InfiniServPerf 4 1 1 0 15 tgz root n001a failed ssh n001a...
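
Since failures are flagged in test.log with the marker shown in the sample lines, a quick first pass over a log can be sketched with grep. list_failures is a hypothetical helper, and the exact layout of the surrounding log lines is not assumed.

```shell
#!/bin/bash
# Sketch: list the failure lines recorded in an ibtest log.
# list_failures is a hypothetical helper; the marker string is taken from the
# sample log lines quoted in the text.
list_failures() {
    local log=${1:-test.log}
    # -n prefixes each match with its line number for easier lookup
    grep -n "TEST CASE FAILURE" "$log"
}
```

The reported line numbers give a starting point for reading the commands and output that precede each failure in the log.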

Page 123: ...mand is recommended as the primary tool for general analysis. When it's desired to restrict the analysis to a specific subset of components, use one of the commands below: fabric_analysis performs fabric...

Page 124: ...ors reported in the files indicated by the tools Once all the errors are corrected perform a baseline of the configuration using the b option The baseline configuration will be saved to files in FF_AN...

Page 125: ...diff would be from the new snapshot Another command which can be useful is the Linux sdiff command For more information about the diff output format consult the Linux man page for diff If the configur...

Page 126: ...analysis b e s d dir c file t portsfile p ports b baseline mode default is compare check mode e evaluate health only default is compare check mode s save history of failures errors differences d dir t...

Page 127: ...l be used to analyze the fabric However in more complex fabrics the Fast Fabric host may be connected to more than one fabric e g an IB subnet In this case the specific ports and or HCAs to use for fa...

Page 128: ...ed during fabric error analysis latest fabric 0 0 errors stderr stderr of iba_report during fabric error analysis Baseline baseline fabric 0 0 comps iba_report summary of fabric components and basic S...

Page 129: ...in fabric replacement of HCA or IB Switch hardware Adding Removing IB Nodes CA Virtual CAs Virtual Switches Physical Switches Physical Switch internal switching cards leaf spine Changes to server or...

Page 130: ...bles or bad ports or poor connections Side effect is the verification of SA health 5 6 4 chassis_analysis Switch The chassis_analysis command has the following usage chassis_analysis b e s d dir F cha...

Page 131: ...howInventory fwVersion showIBNodeDesc ismShowPStatThresh ismChassisSet12x timeZoneConf timeDSTConf snmpCommunityConf snmpTargetAddr showChassisIpAddr showDefaultRoute The commands specified in FF_CHAS...

Page 132: ...selected chassis baseline chassis showChassisIpAddr the output of the showChassisIpAddr command for all selected chassis baseline chassis showDefaultRoute the output of the showDefaultRoute command fo...

Page 133: ...of baseline and latest showChassisIpAddr latest chassis showDefaultRoute the output of the showDefaultRoute command for all selected chassis latest chassis showDefaultRoute diff the diff of the basel...

Page 134: ...baseline Based upon showInventory addition removal of Chassis FRUs Replacement is only checked for FRUs that showInventory displays the serial number For the 9000 series the fan and power supply repla...

Page 135: ...of Fans in chassis Status of Power Supplies in chassis Temp Voltage for each card Presence of adequate power cooling of FRUs Presence of N 1 power cooling of FRUs Presence of Redundant AC input 5 6 5...

Page 136: ...nfigured via the FF_CHASSIS_HEALTH and FF_CHASSIS_CMDS parameters Health Check latest hostsm smstatus the output of the sm_query smShowStatus command Baseline baseline hostsm smver host SM version bas...

Page 137: ...istory of failed checks. The default is /var/opt/iba/analysis. -G esmchassisfile: the file with SM chassis within the cluster. The default is /etc/sysconfig/iba/esm_chassis. -E esmchassis: the list of the SM chass...

Page 138: ...list. The esm_analysis command performs analysis against one or more chassis in the fabric. As such, it permits a chassis to be specified via -E, -G, ESM_CHASSIS, ESM_CHASSIS_FILE or fastfabric.conf. The...

Page 139: ...rms. The diff files are only created if differences are detected. If the -s option is used and failures are detected, files related to the checks that have failed are also copied to a time stamped directo...
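
Keeping a history of failed checks in time stamped directories can be sketched as below. This only illustrates the idea: save_failure_history is a hypothetical helper, and the directory naming format is an assumption, since the tools' actual naming is not shown in this excerpt.

```shell
#!/bin/bash
# Sketch: copy the files from a failed check into a time stamped directory.
# save_failure_history and the stamp format are assumptions for illustration.
save_failure_history() {
    local base=$1; shift          # history root; remaining args are files
    local stamp dir
    stamp=$(date +%Y%m%d-%H%M%S)
    dir="$base/$stamp"
    # create the directory, copy the files, then print where they went
    mkdir -p "$dir" && cp -- "$@" "$dir"/ && echo "$dir"
}
```

Each run that detects a failure then leaves its own directory behind, so earlier evidence is not overwritten by the next analysis.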

Page 140: ...config iba ports p ports a list of local HCA ports used to access fabric s for analysis The default is the first active port This is specified as hca port 0 0 1st active port in system 0 y port y with...

Page 141: ...PORTS_FILE: a file containing a list of ports, used in absence of -t and -p. FF_TIMEOUT_MULT: multiplier for response timeouts. The default is 2. This typically does not need to be set, but in the event of un...

Page 142: ...lysis tool to the appropriate administrators for further analysis and corrective action as needed NOTE Running these tools too often can have negative impacts Among the potential risks Each run adds a...

Page 143: ...natives to full include: quick (builds just OSU, Pallas and HPL), all (builds just OSU, Pallas, HPL and NAS benchmarks). In order to run the applications, an mpi_hosts file must be created in /opt/iba/src/mpi_app...

Page 144: ...ecifying the maximum message size can be provided This benchmark will only use the first two nodes listed in mpi_hosts During this benchmark the opt iba src mpi_apps mpi param pallas config file is us...

Page 145: ...param pallas config file is used 6 5 OSU Bidirectional Bandwidth This is a simple benchmark of maximum bidirectional bandwidth A script is provided to run this application that will execute an assortm...

Page 146: ...m sizes m a medium problem size l a large problem size These can be selected using config_hpl The following command displays the preconfigured problem sizes available config_hpl For example to quickly...

Page 147: ...luster approximately 256 processors or greater it is rather large at 2 5GB As such it is recommended that Pallas be used for smaller runs 2 32 processes or that it be recognized that the benchmark is...

Page 149: ...following options a For root user command prompt ends in or NOTE There must be a space after or b Tcl and Expect packages installed on all IB Management Nodes 4 Remote login as root enabled a If using...

Page 150: ...uration completed Such as a configuration of NTP b configuration of timezone c configuration of a syslog server A 3 Installing and Configuring the Subnet Manager 1 Subnet Manager installed enabled 2 S...

Page 151: ...n the Remaining Servers 1 Fastfabric conf file reviewed 2 etc sysconfig iba allhosts file created listing all hosts including IB management nodes 3 Verify hosts via Ethernet ping 4 Summary of fabric c...

Page 152: ...each IB management node from which the health check tools will be used 1 Edit fastfabric conf and review the health check tools parameters 2 If using embedded SM s create etc sysconfig iba esm_chassi...

Page 153: ...c tools For a given release consult etc sysconfig fastfabric conf sample for a sample file with the defaults of the given release If fastfabric conf does not assign a value to a given configuration va...

Page 154: ...ysconfig then CONFIG_DIR etc sysconfig else CONFIG_DIR etc fi export CONFIG_DIR fi Override default location for HOSTS_FILE export HOSTS_FILE HOSTS_FILE CONFIG_DIR iba hosts Override default location...
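
The excerpt above selects a configuration directory and then lets the environment override the default hosts file. A minimal sketch of that pattern follows. The outer "only if CONFIG_DIR is not already set" test and the exact quoting are assumptions, since the excerpt is truncated; the shipped fastfabric.conf may differ.

```shell
#!/bin/sh
# Sketch only: mirrors the pattern visible in the fastfabric.conf excerpt.
# The outer test on CONFIG_DIR is an assumption (the excerpt is truncated).

# Pick the configuration directory unless the caller already set one.
if [ -z "${CONFIG_DIR:-}" ]; then
    if [ -d /etc/sysconfig ]; then
        CONFIG_DIR=/etc/sysconfig
    else
        CONFIG_DIR=/etc
    fi
    export CONFIG_DIR
fi

# Keep a HOSTS_FILE supplied by the environment; otherwise use the default.
export HOSTS_FILE="${HOSTS_FILE:-$CONFIG_DIR/iba/hosts}"

echo "CONFIG_DIR=$CONFIG_DIR"
echo "HOSTS_FILE=$HOSTS_FILE"
```

With this form, running a tool as `HOSTS_FILE=/tmp/myhosts sometool` (a hypothetical invocation) would take precedence over the default without editing the configuration file.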

Page 155: ...ost_basename_to_ipoib 1 hostname provided echo 1 FF_IPOIB_SUFFIX fi shell Function to convert a hostname into a basic hostname eg remove IPoIB suffix etc should match result of hostname s on host if F...
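
The two shell functions described above map between a management-network hostname and its IPoIB name. The sketch below illustrates the idea under stated assumptions: the function names `to_ipoib_name` and `to_base_name` are hypothetical, the default suffix `-ib` is assumed, and the bodies are not copied from fastfabric.conf.

```shell
#!/bin/sh
# Sketch only: names and bodies are illustrative, not the shipped functions.
FF_IPOIB_SUFFIX="${FF_IPOIB_SUFFIX:--ib}"   # assumed default suffix

# Convert a management-network hostname into its IPoIB name.
to_ipoib_name() {
    echo "$1$FF_IPOIB_SUFFIX"
}

# Convert a hostname back to the basic hostname:
# drop any domain part, then drop a trailing IPoIB suffix if present.
to_base_name() {
    short="${1%%.*}"
    echo "${short%"$FF_IPOIB_SUFFIX"}"
}

to_ipoib_name node01                  # prints node01-ib
to_base_name node01-ib.example.com    # prints node01
```

If IPoIB names follow a different convention at a site, only these two functions would need to change, which appears to be why the configuration file exposes them as overridable shell functions.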

Page 156: ..._UPGRADE_OPTIONS FF_UPGRADE_OPTIONS where to upload server specific files to during uploadall captureall d option export UPLOADS_DIR UPLOADS_DIR uploads where to download server specific files from du...

Page 157: ...ib1 ib2 On OFED stack it will be ib0 ib1 export FF_IPOIB_BASE_DEV_NUM FF_IPOIB_BASE_DEV_NUM 1 shell Function to return the base IPoIB device number for this stack type For Silverstorm stack installat...

Page 158: ...export FF_ESM_CMDS FF_ESM_CMDS smShowSMParms smShowDefBcGroup list of analysis to perform during all_analysis pick appropriate type of SM to analyze export FF_ALL_ANALYSIS FF_ALL_ANALYSIS fabric chass...

Page 159: ...NOTE: Do not edit etc sysconfig iba iba_mon conf sample...

Page 160: ...ied will be cleared by iba_mon and may impact any remote Performance Managers which are monitoring the given Counter Interval 10 monitoring interval in seconds SyslogFacility local6 syslog facility co...

Page 161: ...names are generally easier to translate than numeric IP addresses Typically management network host names are specified However if desired IPoIB hostnames or IP addresses may be used This can accelera...

Page 162: ...operations are performed against the management card in the chassis. For operations such as cmdall, the command is executed against the management interface for the given chassis. For more sophisticated opera...

Page 163: ...Node GUID with desired Name include etc sysconfig iba moreswitches included file Each line of the switch list file may specify a single switch a comment or another switch list file to include Switche...

Page 164: ...e command line Refer to the section Selection of local Ports subnets on page 5 9 for more information Below is a sample port list file this is a comment 1 1 first port on 1st HCA 1 2 second port on 1s...

Page 165: ...Comments may be placed on any line by using a # to precede the comment. On lines with a port or include directive, the # must be white space separated from any preceding po...
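
The comment rule for port list files can be sketched as a small filter. This is an illustration only: it assumes `#` is the comment character and that entries take an `hca:port` form such as `1:1`, and the function name `list_ports` is hypothetical. Include directives are printed unchanged rather than followed.

```shell
#!/bin/sh
# Sketch only: strips comments and blank lines from a port list file.
# Assumes '#' starts a comment and entries look like hca:port (e.g. 1:1).
list_ports() {
    sed -e 's/^#.*$//' \
        -e 's/[[:space:]]\{1,\}#.*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}

# Build a small sample file and show the entries that remain.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# this is a comment
1:1   # first port on 1st HCA
1:2   # second port on 1st HCA

2:1
EOF
list_ports "$sample"
rm -f "$sample"
```

A `#` is treated as a comment only at the start of a line or after white space, which matches the requirement that it be white space separated from a preceding entry.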

Page 167: ...rmed by adding a -ib suffix to the management network name. If a different suffix is desired, FF_IPOIB_SUFFIX can be changed. If IPoIB is also being used as the management network, FF_IPOIB_SUFFIX can be s...

Page 169: ...sibly SM nodes and no servers are installed in more than one subnet consult the instructions below for Primarily Independent Subnets 3 The subnets are overlapping If multiple IB components are common...

Page 170: ...configure a host SM node to manage more than one IB subnet Installing and Verifying Firmware on the IB Switches on page 3 10 At this time this operation is not supported for IB management nodes conne...

Page 171: ...e same set of subnets the files copied to each management node may need to be slightly different For example configuration files for fabric_analysis may indicate different port numbers or host files u...

Page 172: ...can be performed as per the instructions When creating the chassis file list all SilverStorm 9000 series internally managed IB switches in all subnets If desired additional files may also be created...

Page 173: ...nux Refresh SSH Known Hosts on page 4-9 may be run per the instructions. Host Check MPI Performance on page 4-10 can be run for each subnet by using the allhosts files specific to each subnet, i.e., those...

Page 174: ...all subnets are checked. Similarly, the esm_chassis and chassis files used should list all relevant SilverStorm IB chassis in all subnets. Running HPL on page 3-20 can be run for each subnet by creatin...
