3 – Getting Started
Running HPL

3-20

D000006-000 Rev A

Create the file with a list of the chassis names (the TCP/IP Ethernet management 
port names assigned above) or IP addresses. Use of names is recommended.  
List one entry per line, for example:

Chassis1

Chassis2

For further details about the file format, refer to the section “Selection of Chassis” 
on page 5-4.

3. (All) Perform a health check using: all_analysis -e.  If any errors are 
encountered, resolve the errors and rerun all_analysis -e until a clean run 
occurs.

4. (All) Create a cluster configuration baseline using: all_analysis -b

5. (All) If desired, schedule regular runs of all_analysis via cron or other 
mechanisms.  Consult the Linux OS documentation for more information on 
cron.  Also consult the section “Health Check and Baselining Tools” on 
page 5-69 for more information about all_analysis and its automated use.
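
Steps 3 and 4 can be chained so that a baseline is only recorded after a clean health check. The following is a minimal sketch, not part of the Fast Fabric tools. It assumes that all_analysis exits with a non-zero status when it finds errors, which this guide does not state, so confirm that on your installation. The function name and the HEALTH_CMD and BASELINE_CMD variables are hypothetical.

```shell
#!/bin/sh
# Sketch: take a baseline only when the health check is clean.
# ASSUMPTION: all_analysis returns non-zero when it reports errors.
# HEALTH_CMD / BASELINE_CMD are hypothetical overrides for dry runs.
HEALTH_CMD=${HEALTH_CMD:-"all_analysis -e"}
BASELINE_CMD=${BASELINE_CMD:-"all_analysis -b"}

# check_then_baseline: run the health check (step 3); on success,
# create the configuration baseline (step 4); otherwise return 1.
check_then_baseline() {
    if $HEALTH_CMD; then
        $BASELINE_CMD
    else
        echo "health check reported errors; resolve them and rerun" >&2
        return 1
    fi
}

# Only touch the fabric when the Fast Fabric tools are installed.
if command -v all_analysis >/dev/null 2>&1; then
    check_then_baseline
fi
```

For step 5, a crontab entry along the lines of `0 2 * * * all_analysis` would run the default compare/check mode nightly. The schedule is only a placeholder, and cron's PATH may require the full path to the tool.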

3.11 Running HPL

As part of the installation process, a set of common MPI benchmarks has been 
installed.  One of the more popular measures of overall performance is HPL, the 
application used to rate systems on the Top 500 list.  The steps below allow some 
initial runs of HPL to be made and provide initial baseline numbers.  The 
defaults provided should perform within 10%-20% of optimal HPL results for the 
cluster.  Tuning for that additional 10%-20% is beyond the scope of this document.

1. (Host) To run HPL, first select a configuration file appropriate to your cluster.  
It is best to start with a small configuration to verify that HPL has been properly 
compiled:

a. cd /opt/iba/src/mpi_apps

b. ./config_hpl 2t

This configures a two-process test run of HPL.

2. (Host) Now create the file /opt/iba/src/mpi_apps/mpi_hosts listing the host 
names of all the hosts.  Depending on your selection of 
VIADEV_PATH_METHOD in /opt/iba/src/mpi_apps/mpi.param.hpl, you 
can specify Ethernet or IPoIB host names.  The default configuration allows either.

3. (Host) Now run HPL:

./run_hpl 2

Because this is a very small problem size, the performance of the run will be much 
lower than the potential of the machine.  At this stage, do not worry about 
performance, only whether the run was successful.
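
The three steps above can be collected into one short script. This is a sketch under stated assumptions: the host names node1 and node2 are hypothetical, the mpi_hosts format is assumed to be one host name per line (the guide only says the file lists the host names), config_hpl is assumed to be invoked from the mpi_apps directory, and the argument to run_hpl is presumed to be the process count.

```shell
#!/bin/sh
# Sketch of the HPL smoke test (steps 1-3). The directory is the one
# named in this guide; the helper and host names are illustrative only.
MPI_APPS=${MPI_APPS:-/opt/iba/src/mpi_apps}

# write_mpi_hosts FILE HOST...
# Writes each HOST on its own line (ASSUMED mpi_hosts format).
write_mpi_hosts() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Run only where the MPI sample applications are installed.
if [ -x "$MPI_APPS/config_hpl" ]; then
    cd "$MPI_APPS" || exit 1
    ./config_hpl 2t                         # step 1: two-process test config
    write_mpi_hosts mpi_hosts node1 node2   # step 2: hypothetical host names
    ./run_hpl 2                             # step 3: small verification run
fi
```

Replace node1 and node2 with the Ethernet or IPoIB names that match your VIADEV_PATH_METHOD setting.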

Содержание Fast Fabric

Страница 1: ...D000006 000 Rev A Page i Q S i m p l i f y Fast Fabric Users Guide...

Страница 2: ...Fast Fabric Users Guide Q Page ii D000006 000 Rev A...

Страница 3: ...table for the specified use without further testing or modification QLogic Corporation assumes no responsibility for any errors that may appear in this document No part of this document may be copied...

Страница 4: ...e Q Page iv D000006 000 Rev A 2008 QLogic Corporation All rights reserved worldwide First Published March 2007 Printed in U S A QLogic Corporation 26650 Aliso Viejo Parkway Aliso Viejo CA 92656 800 66...

Страница 5: ...nstalling InfiniBand on the Remaining Servers 3 12 3 8 Verifying InfiniBand on the Remaining Servers 3 16 3 9 Complete Installation of additional IB Management Nodes 3 18 3 10 Configure and Initialize...

Страница 6: ...is Admin via Fast Fabric 4 11 4 3 1 Edit the Configuration and Select Edit Chassis Files 4 11 4 3 2 Verify Chassis via Ethernet Ping 4 11 4 3 3 Update Chassis Firmware 4 12 4 3 4 Show Status of Chassi...

Страница 7: ...info 5 25 5 4 2 showallports 5 26 5 4 3 iba_report 5 28 5 4 4 saquery 5 56 5 5 Advanced Initialization and Verification ibtest 5 60 5 5 1 ibtest Host Operations 5 63 5 5 2 ibtest Chassis Operations 5...

Страница 8: ...the Remaining Servers A 2 A 6 Verifying Infiniband on the Remaining Servers A 3 A 7 Complete Installation of additional IB Management Nodes A 3 A 8 Configure and initialize health check tools A 4 App...

Страница 9: ...t tools Section 6 describes MPI Sample Applications Appendix A presents the Fast Fabric Quick Install Checklist Appendix B describes the Fast Fabric Configuration Files Appendix C provides information...

Страница 10: ...and software updates 1 3 1 Availability QLogic Technical Support for products under warranty is available during local standard working hours excluding QLogic Observed Holidays 1 3 2 Contact Informati...

Страница 11: ...t network connectivity Verify host OS levels Sets up ssh keys Performs initial InfiniBand software installation Configures Internet Protocol over InfiniBand IPoIB IP addresses Performs InfiniBand driv...

Страница 12: ...to accelerate common chassis and switch administration tasks Manage firmware levels on switches and chassis Execute commands across many chassis Assists in the initial benchmarking and tuning of High...

Страница 13: ...sired setup remote access to the IB Management Node via ssh telnet X windows VNC or any other mechanism which will allow the remote user to access a Linux Command Line shell Typically Fast Fabric is o...

Страница 14: ...ser interaction and hence reduce the time to perform operations against many hosts or chassis After initial installation Fast Fabric can be configured to use IPoIB instead of the management network NO...

Страница 15: ...ally applicable to all environments and will be marked with All NOTE Some of the Linux steps may be applicable to other Unix like operating systems if it is desired to enable use of non IB specific Fa...

Страница 16: ...Install the desired Linux OS version with the same kernel distribution on all hosts Generally the IB Management node s i e the host which will run Fast Fabric should have a full install and must inclu...

Страница 17: ...be accomplished using chkconfig rsh on Also enable rexec and rlogin using the above steps b Execute mv etc securetty etc securetty bak 6 All TCP IP Host Name resolution Fast Fabric and TCP IP will nee...

Страница 18: ...ronize their clocks with the NTP server Consult the Linux OS documentation for information on how to configure NTP servers and clients 8 All On the IB Management node install the Fabric Access Softwar...

Страница 19: ...Host Setup Menu 4 1 1 0 15 Fast Fabric Host List etc sysconfig iba hosts 0 Edit Config and Select Edit Hosts Files Perform 1 Verify Hosts via Ethernet ping Perform 2 Verify rsh rcp Configured Skip 3 S...

Страница 20: ...itor selected via the EDITOR environment variable In addition it will also permit review and editing of the fastfabric conf file The fastfabric conf file guides the overall configuration of Fast Fabri...

Страница 21: ...s its best to place all files at a given firmware level into a single directory whose name indicates the firmware revision number Once the above steps have been completed additional setup of the Chass...

Страница 22: ...ned above or IP addresses Use of names is recommended One entry per line Such as Chassis1 Chassis2 NOTE Do not list externally managed switches such as the SilverStorm 9024FC switches in this file Tho...

Страница 23: ...are running the firmware level provided and install and or reboot each chassis as needed If any chassis fails to be updated use the View ibtest result files option to review the result files from the...

Страница 24: ...les on the CD will be used to upgrade the firmware on each switch NOTE When copying files its best to place all files at a given firmware level into a single directory whose name indicates the firmwar...

Страница 25: ...th internally and externally managed switches and hence the output must be edited to leave only the SilverStorm externally managed switches saquery t sw o nodeguid 6 Switch Update Switch Firmware will...

Страница 26: ...tches listed in the ibnodes file the previous step may be repeated to review and edit the file as needed 7 Switch If any 9024FC switches were skipped above in step 5 and 6 these steps should be repeat...

Страница 27: ...and operation also select Verify rsh rcp Configured However it is instead recommended that ssh be used in which case this step can be skipped NOTE If etc hosts will be used for name resolution as opp...

Страница 28: ...powered on and booted Is host connected to management network Is host management network IP address and network settings consistent with DNS or etc hosts Is Management node connected to the management...

Страница 29: ...can be found When prompted select to do an initial installation as follows Would you like to do an upgrade install y n Would you like to do an initial install load n y NOTE An initial installation wi...

Страница 30: ...torm Technologies Inc IB Host Admin Menu 4 1 1 0 15 Fast Fabric Host List etc sysconfig iba allhosts 0 Edit Config and Select Edit Hosts Files Perform 1 Verify Hosts via Ethernet ping Perform 2 Summar...

Страница 31: ...he hosts file the previous step may be repeated to review and edit the file as needed 5 All Summary of Fabric Components will provide a brief summary of the counts of components in the fabric includin...

Страница 32: ...nge carefully examine the two hosts involved to verify that the PCI slot used BIOS settings and any motherboard jumpers related to devices on PCI buses or slot speeds Also verify HCA and riser cards a...

Страница 33: ...r copying the files edit the hosts and allhosts files such that the file on each IB Management Node omits itself from the hosts files but lists all other IB Management Nodes and specifies itself in th...

Страница 34: ...talled One of the more popular measures of overall performance is HPL This is the application used to rate systems on the Top 500 list The steps allow some initial runs of HPL to be made and provide s...

Страница 35: ...review the host configuration and stop these extra processes if possible HPL is very sensitive to swapping If a lot of swapping is seen and xhpl is dropping below 97 for long durations this may indic...

Страница 36: ...ct Edit Hosts Files Perform 1 Verify Hosts via Ethernet ping Skip 2 Verify rsh rcp Configured Skip 3 Setup Password less ssh scp Skip 4 Copy etc hosts to all hosts Skip 5 Show uname a for all hosts Sk...

Страница 37: ...InfiniServ software installed If any hosts fail to be updated use the View ibtest result files option to review the result files from the update See the section Interpreting the ibtest log files on pa...

Страница 38: ...3 Getting Started Upgrading IB software 3 24 D000006 000 Rev A Q...

Страница 39: ...re is being used on the hosts Those will be marked with Host All menu items which are applicable only when SilverStorm IB Switches or Chassis are being used will be marked with Switch All remaining me...

Страница 40: ...Previous Menu or ESC The submenus typically present operations in the typical order they would be used during an installation Pressing the keys corresponding to menu items 0 9 a e in the example above...

Страница 41: ...able In addition it will also permit review and editing of the fastfabric conf file The fastfabric conf file guides the overall configuration of Fast Fabric and describes cluster specific attributes o...

Страница 42: ...n and Select Edit Hosts Files All This will permit the hosts and fastfabric conf files to be edited The hosts file selected and created via this menu should not list the Fast Fabric host itself After...

Страница 43: ...hosts Review the results carefully to verify all the hosts have the expected OS version In typical clusters all hosts will be running the same OS and kernel version 4 1 7 Install Upgrade QuickSilver S...

Страница 44: ...lting object files to all the hosts This is in preparation for execution of MPI performance tests and benchmarks in a later step 4 1 10 Reboot Hosts Linux This will run the ibtest reboot command to re...

Страница 45: ...scpall command A file on the local host may be specified to be copied to all selected hosts 4 1 15 View ibtest result files All This permits viewing of the test log and test res files that reflect th...

Страница 46: ...mand on all hosts Skip a View ibtest result files Skip P Perform the selected actions N Select None X Return to Previous Menu or ESC 4 2 1 Edit Config and Select Edit Hosts Files All This will permit...

Страница 47: ...run iba_report i 10 o errors o slowlinks on the IB Management node This will check all the ports in the fabric for any links which have high error rates or are running at a lower speed than expected A...

Страница 48: ...to devices on PCI buses or slot speeds Also verify HCA and riser cards are properly seated 4 2 9 Generate all Hosts Problem Report Info Host This will run the captureall command to collect configurat...

Страница 49: ...selected actions N Select None X Return to Previous Menu or ESC 4 3 1 Edit the Configuration and Select Edit Chassis Files Switch This will permit the chassis and fastfabric conf files to be edited Th...

Страница 50: ...ch chassis and select it for use on next reboot run push firmwarew to each chassis select it for use and if its not the presently running firmware reboot the chassis Additional options prompted for pa...

Страница 51: ...le tgz file that can be sent to the Support Representative 4 3 7 Run a command on all chassis Switch This will run the cmdall C command A Chassis CLI command may be specified to be executed against al...

Страница 52: ...r ESC 4 4 1 3 4 1Edit Config and Select Edit Chassis Files Switch This will permit the ibnodes and fastfabric conf files to be edited The ibnodes file selected and created via this menu should not lis...

Страница 53: ...emfw files If any switches fail to be updated use the View ibtest result files option to review the result files from the update Refer to the section Interpreting the ibtest log files on page 5 68 for...

Страница 54: ...4 Fast Fabric TUI Menu SilverStorm Externally Managed IB Switch Administration via Fast Fabric 4 16 D000006 000 Rev A Q...

Страница 55: ...H Most of the tools are installed in sbin 5 1 Common Tool Options Therearesomecommonoptionstotheassortedcommandlinetools Theseoptions are applicable to most of the tools 5 1 1 Will display Usage infor...

Страница 56: ...n be ssh Consult the SilverStorm 9000 Users Guide for more information 5 1 4 C Specifies that the given operation should be performed against chassis By default Fast Fabric operations are performed ag...

Страница 57: ...ptions are considered in the following order the first item listed below that is specified is used for the given command 1 h option 2 HOSTS environment variable 3 f option 4 HOSTS_FILE environment var...

Страница 58: ...IP addresses Typically management network hostnames are specified However if desired IPoIB hostnames or IP addresses may be used This can accelerate large file transfers and other operations Files to...

Страница 59: ...variable 3 F option 4 CHASSIS_FILE environment variable 5 etc sysconfig iba chassis file For example if the H option is used and the CHASSIS_FILE environment variable is also exported the command will...

Страница 60: ...s or include directives the must be white space separated from any preceding name IP address or included file name 5 1 7 2 Explicit Chassis names When chassis are explicitly specified via the H option...

Страница 61: ...ith all relevant slots as part of that single specification This is important so that parallel operations do not cause conflicting concurrent operations against a given chassis 5 1 8 Selection of Swit...

Страница 62: ...a comment 0x00066a00d9000138 i9k138 Node GUID with desired Name 0x00066a00d9000139 i9k139 Node GUID with desired Name include etc sysconfig iba moreswitches included file Each line of the switch list...

Страница 63: ...66a00d9000139 i9k139 5 1 9 Selection of local Ports subnets Some of the fabric health commands fabric_analysis all_analysis permits a specific set of local HCA ports to be used for fabric analysis The...

Страница 64: ...In some fabrics it may be useful to create multiple files in etc sysconfig iba representing different subsets of the ports from which the user may operate For example etc sysconfig iba ports primary...

Страница 65: ...ede the comment On lines with a port or include directive the must be white space separated from any preceding port or included filename 5 1 9 2 Explicit ports When ports are explicitly specified via...

Страница 66: ...s RCP and commands rsh to be run from this host to all the other hosts and to itself via localhost as a specific user default is root Additionally this command can be used to verify rsh is setup to al...

Страница 67: ...f hostfile h hosts u user S C only perform connect to enter in local hosts knownhosts When run in this mode S and s options are ignored s use ssh scp to transfer files default is rsh rcp i ipoib_suff...

Страница 68: ...etup_ssh for initial key exchange is with the s and S options This requires all hosts have been configured with the same password for the specified user typically root In this mode the password will b...

Страница 69: ...nstallation and booting of IB software setup_ssh will need to be rerun with the C option to update the knownhosts file 5 2 4 cmdall LinuxandSwitch ExecutesacommandonallhostsorSilverStormIBchassis This...

Страница 70: ...or example when running host commands such as rm the i option interactively prompt before removal should not be used Note that this option is sometimes part of a standard bash alias list Similarly whe...

Страница 71: ...capture of d upload_dir directory to upload to default is uploads If not specified the environment variable UPLOADS_DIR will be used If that is not exported the default uploads will be used S securel...

Страница 72: ...sis then creates mycapture all tgz captureall C H chassis1 chassis2 030127capture Environment Variables The following environment variables are also used by this command HOSTS HOSTS_FILE see discussio...

Страница 73: ...Pandrequirethatpassword lessSSH SCP be setup between the host running Fast Fabric and the hosts files that are being transferred to and from The setup_ssh Fast Fabric tool can aid in setting up passwo...

Страница 74: ...ot specified it defaults to the present directory name If both the source and destination directory names are omitted they both default to the current directory name Example copy a single file scpall...

Страница 75: ...in cluster default is etc sysconfig iba hosts h hosts list of hosts to upload from u user user to perform copy to default is current user code d upload_dir directory to upload to default is uploads If...

Страница 76: ...ot be used in the arguments to uploadall To copy files from this host to hosts in the cluster use scpall or downloadall Environment Variables The following environment variables are also used by this...

Страница 77: ...of the file or directory on the destination hosts to copy to If more than one source file is specified dest_file will be treated as a directory name The given directory must already exist on the desti...

Страница 78: ...le was changed for some or all of the hosts it can then be downloaded to all the hosts downloadall d uploads ifcfg ib1 etc sysconfig network scripts ifcfg ib1 Alternatively if there was no need to dow...

Страница 79: ...umber of Switch Chips 6 Number of Links 29 Number of 1x Ports 2 The output is as follows SM each subnet manger SM running in the fabric is listed along with its node name port GUID and present SM stat...

Страница 80: ...tches NOTE iba_report is a newer and more powerful Fast Fabric command For general fabric analysis use iba_report with options such as o errors and or o slowlinks to perform a more efficient analysis...

Страница 81: ...his command HOSTS HOSTS_FILE see discussion on selection of hosts above CHASSIS CHASSIS_FILE see discussion on selection of chassis above IBNODES IBNODES_FILE see discussion on selection of switches a...

Страница 82: ...ounters IBTA mandatory counters Also any end nodes which report support of a IBTA device management agent must implement the IOU Info IOC Profile and Service Entry queries as outlined in the IBTA 1 1...

Страница 83: ...il 0 n for output default is 2 P persist only include data persistent across reboots H hard only include permanent hardware data N noname omit node and IOC names x xml output in XML s stats get perfor...

Страница 84: ...summary of links configured to run slower than supported includes slowlinks slowconnlinks summary of links connected with mismatched speed potential includes slowconfiglinks misconfiglinks summary of...

Страница 85: ...OC Profile ID String IOC Name iocpat value1 port value2 value1 is global pattern for IOC Profile ID String IOC Name value2 is port ioctype value value is IOC type VNIC or SRP ioctype value1 port value...

Страница 86: ...a portguid iba_report o brnodes F portguid 0x00066a00a0000380 Find all the connections to a server iba_report o links F node duster Find all the connections to a switch chip iba_report o links F node...

Страница 87: ...clear them then recheck iba_report o errors C sleep 10 iba_report o errors Clear all port counters wait 10 seconds then check Iba_report i 10 o errors Check all port counters on a server iba_report o...

Страница 88: ...00b 0x00066a00a00001b8 4x 2 5Gb 0x00066a0098000380 CA goblin 1 0x000a 0x00066a00a0000380 4x 2 5Gb 0x00066a0098000384 CA cuda 1 0x0005 0x00066a00a0000384 1x 2 5Gb 2 0x0006 0x00066a01a0000384 4x 2 5Gb 0...

Страница 89: ...0066a00280002cd SW InfiniCon Systems InfiniFabric Sw A Dev A 0 0x0013 0x00066a00280002cd Noop Noop 3 4x 2 5Gb 5 4x 2 5Gb 0x00066a00d8000123 SW InfiniCon Systems InfinIO9024 0 0x0001 0x00066a00d8000123...

Страница 90: ...2 5Gb 4 4x 2 5Gb 1 Connected SMs in Fabric State GUID Name Master 0x00066a00d8000123 InfiniCon Systems InfinIO9024 Each iba_report allows for various levels of detail Increasing detail is shown as fur...

Страница 91: ...get a report with a little more detail root duster root iba_report d 1 Node Type Brief Summary 14 Connected CAs in Fabric NodeGUID Type Name 0x0002c9020020e0d4 CA coyote1 0x00066a00580001e0 CA VEx in...

Страница 92: ...ID and will therefore be properly grouped However some third party devices do not implement the system image GUID and may report a value of 0 In such a case iba_report will treat each component as an...

Страница 93: ...ver it does not include links which are running slower than expected misconnlinks this is similar to slowconnlinks in that it reports links which have been connected between ports of different speed p...

Страница 94: ...further limit the report to only include hardware information This is a superset of P and omits more information A related but independent option is N This will omit all the node and IOC names from th...

Страница 95: ...formance or error situation which is being reported between 2 specific points in the fabric Such as a StatusTimeoutRetry that MPI may be reporting between 2 processes in its run Focus can use glob sty...

Страница 96: ...ssis 0x00066A005000010C Slot 2 IOC 1 iba_report o nodes F ioc Chassis 0x00066A005000010C Slot 2 IOC 1 port 2 iba_report o nodes F iocpat Slot 2 iba_report o nodes F iocpat Slot 2 port 2 iba_report o n...

Страница 97: ...the initial design bin bash specify some filenames to use expected_config usr local report master master copy of config previously created config tmp report where we will generate new report diffs tmp...

Страница 98: ...r than supported Rate NodeGUID Port Type Name Enabled Supported 2 5g 0x00066a0098000384 1 CA cuda 1x 2 5Gb 1 4x 2 5Gb 0x00066a00d8000123 2 SW InfiniCon Systems InfinIO9024 1 4x 2 5Gb 1 4x 2 5Gb 20 of...

Страница 99: ...shold 3 10g 0x00066a0098000380 1 CA goblin SymbolErrorCounter 65535 Exceeds Threshold 100 LinkErrorRecoveryCounter 255 Exceeds Threshold 3 PortRcvErrors 65535 Exceeds Threshold 100 0x00066a00d8000123...

Страница 100: ...ports 0x00066a00980001b8 1 CA orc and 0x00066a0098000001 1 CA julio 1 Paths SGID 0xfe80000000000000 00066a00a00001b8 DGID 0xfe80000000000000 00066a00a0000001 SLID 0x000b DLID 0x000c Reversible Y PKey...

Страница 101: ...Links Checked 0 Errors found Links with errors threshold Summary Focused on 4 Ports 1 0x00066a00a00001b8 in Node 0x00066a00980001b8 CA orc 10 in Node 0x00066a00d8000123 SW InfiniCon Systems InfinIO90...

Страница 102: ...s M_Key 0 P_Key 0 Q_Key 0 ErrorLimits Overrun 15 LocalPhys 15 DiagCode 0x0000 P_Key Enforcement In Off Out Off FilterRaw In Off Out Off Performance Transmit Xmit Data 16383 MB 4294967295 Quads Xmit Pk...

Страница 103: ...Y 0x0000000000000000 Lease 0 s Protect Readonly MTU Active 2048 Supported 2048 VL Stall 0 LinkWidth Active 4x Supported 1 4x Enabled 1 4x LinkSpeed Active 2 5Gb Supported 2 5Gb Enabled 2 5Gb VLs Activ...

Страница 104: ...Enabled 1 4x LinkSpeed Active 2 5Gb Supported 2 5Gb Enabled 2 5Gb VLs Active 1 1 Supported 4 1 HOQLife 4096 ns Capability 0x02090048 CR DM CM SL Trap Violations M_Key 0 P_Key 0 Q_Key 0 ErrorLimits Ove...

Страница 105: ...49 2 CA rockaway 0x00066a00d8000123 3 SW InfiniCon Systems InfinIO9024 10g 0x00066a0098002813 1 CA brady 0x00066a00d8000123 19 SW InfiniCon Systems InfinIO9024 10g 0x00066a0098002813 2 CA brady 0x0006...

Страница 106: ...in Node 0x00066a0098000384 CA cuda 13 Connected CAs in Fabric Name cuda NodeGUID 0x00066a0098000384 Type CA Ports 2 PartitionCap 64 SystemImageGuid 0x00066a0098000384 BaseVer 1 SmaVer 1 VendorID 0x66a...

Страница 107: ...ts PortNum 1 LID 0x0015 GUID 0x00066a00a00003a6 Neighbor 0x00066a00d8000123 9 SW InfiniCon Systems InfinIO9024 Width 4x Speed 2 5Gb PortNum 2 LID 0x0016 GUID 0x00066a01a00003a6 Neighbor 0x00066a00d800...

Страница 108: ...Gb Supported 2 5Gb Enabled 2 5Gb VLs Active 4 1 Supported 4 1 HOQLife 4096 ns Capability 0x02010048 CR CM SL Trap Violations M_Key xxxxx P_Key xxxxx Q_Key xxxxx ErrorLimits Overrun 15 LocalPhys 15 Dia...

Страница 109: ...LinkWidth Active 4x Supported 1 4x Enabled 1 4x LinkSpeed Active 2 5Gb Supported 2 5Gb Enabled 2 5Gb VLs Active 4 1 Supported 4 1 HOQLife 4096 ns Capability 0x02010048 CR CM SL Trap Violations M_Key x...

Страница 110: ...he IB management node is connected to more than one fabric i e a subnet the HCA and port may be specified to select the fabric whose SA is to be queried Usage saquery v h hca p port o type l lid t typ...

Страница 111: ...ortguid list of port guids lid list of lids gid list of gids desc list of node descriptions names path list of path records node list of node records portinfo list of port info records sminfo list of...

Страница 112: ...portinfo sminfo swinfo link slvl vlarb pkey guids service mcmember inform linfdb ranfdb mcfdb trace l lid systemguid nodeguid portguid lid desc path node portinfo swinfo slvl vlarb pkey guids service...

Страница 113: ...mcfdb trace P port_guid_pair path trace systemguid nodeguid portguid lid gid desc node portinfo sminfo swinfo link slvl vlarb pkey guids service mcmember inform linfdb ranfdb mcfdb G gid_pair path tra...

Страница 114: ...assis default is hosts n perform operation against IB node default is hosts i ipoib_suffix suffix to apply to host names to create ipoib host names The default is ib f hostfile file with hosts in clus...

Страница 115: ...firmware is in primary and running The default is push S securely prompt for password for user on remote system chassis test test to run Host Test can be one or more of load initial install of all hos...

Страница 116: ...t is executing Test log will contain detailed information about what was performed This will include the specific commands executed and the resulting output The test_tmp directories will contain tempo...

Страница 117: ...g host load Used in absence of I option FF_UPGRADE_OPTIONS upgrade options for host IB software INSTALL during host upgrade Used in absence of U option FF_PACKAGES host packages to load during host lo...

Страница 118: ...working directory and will be copied to all the end nodes and installed NOTE Only those Fabric Access components that are currenly installed will be upgraded This operation will fail for nodes that d...

Страница 119: ...2 3 4 5 6 This can be used to verify switch latency hops PCI bandwidth and overall MPI performance The test res file will have the results of each pair of nodes tested To obtain accurate results this...

Страница 120: ...keys to be configured within the chassis for secure password less login In this case there is no need to configure a FF_CHASSIS_ADMIN_PASSWORD and FF_CHASSIS_LOGIN_METHOD can be SSH Refer to the Silve...

Страница 121: ...lished In most cases a parallel upgrade is recommended for expediency 5 5 3 2 upgrade Upgrades the firmware on each specified switch The P option selects a directory containing a emfw file or provides...

Страница 122: ...om that host and or chassis For example test log may contain lines such as scp InfiniServPerf 4 1 1 0 15 tgz root n001a TEST CASE FAILURE scp InfiniServPerf 4 1 1 0 15 tgz root n001a failed ssh n001a...

Страница 123: ...mand is recommended as the primary tool for general analysis When its desired to restrict the analysis to a specific subset of components use one of the commands below fabric_analysis performs fabric...

Страница 124: ...ors reported in the files indicated by the tools Once all the errors are corrected perform a baseline of the configuration using the b option The baseline configuration will be saved to files in FF_AN...

Страница 125: ...diff would be from the new snapshot Another command which can be useful is the Linux sdiff command For more information about the diff output format consult the Linux man page for diff If the configur...

Страница 126: ...analysis b e s d dir c file t portsfile p ports b baseline mode default is compare check mode e evaluate health only default is compare check mode s save history of failures errors differences d dir t...

Страница 127: ...l be used to analyze the fabric However in more complex fabrics the Fast Fabric host may be connected to more than one fabric e g an IB subnet In this case the specific ports and or HCAs to use for fa...

Страница 128: ...ed during fabric error analysis latest fabric 0 0 errors stderr stderr of iba_report during fabric error analysis Baseline baseline fabric 0 0 comps iba_report summary of fabric components and basic S...

Страница 129: ...in fabric replacement of HCA or IB Switch hardware Adding Removing IB Nodes CA Virtual CAs Virtual Switches Physical Switches Physical Switch internal switching cards leaf spine Changes to server or...

Страница 130: ...bles or bad ports or poor connections Side effect is the verification of SA health 5 6 4 chassis_analysis Switch The chassis_analysis command has the following usage chassis_analysis b e s d dir F cha...

Страница 131: ...howInventory fwVersion showIBNodeDesc ismShowPStatThresh ismChassisSet12x timeZoneConf timeDSTConf snmpCommunityConf snmpTargetAddr showChassisIpAddr showDefaultRoute The commands specified in FF_CHAS...

Страница 132: ...selected chassis baseline chassis showChassisIpAddr the output of the showChassisIpAddr command for all selected chassis baseline chassis showDefaultRoute the output of the showDefaultRoute command fo...

Страница 133: ...of baseline and latest showChassisIpAddr latest chassis showDefaultRoute the output of the showDefaultRoute command for all selected chassis latest chassis showDefaultRoute diff the diff of the basel...

Страница 134: ...baseline Based upon showInventory addition removal of Chassis FRUs Replacement is only checked for FRUs that showInventory displays the serial number For the 9000 series the fan and power supply repla...

Страница 135: ...of Fans in chassis Status of Power Supplies in chassis Temp Voltage for each card Presence of adequate power cooling of FRUs Presence of N 1 power cooling of FRUs Presence of Redundant AC input 5 6 5...

Страница 136: ...nfigured via the FF_CHASSIS_HEALTH and FF_CHASSIS_CMDS parameters Health Check latest hostsm smstatus the output of the sm_query smShowStatus command Baseline baseline hostsm smver host SM version bas...

Страница 137: ...istoryoffailedchecks The default is var opt iba analysis G esmchassisfile the file with SM chassis within the cluster The default is etc sysconfig iba esm_chassis E esmchassis the list of the SM chass...

Page 138: ...list The esm_analysis variable performs analysis against one or more chassis in the fabric As such it permits a chassis to be specified via the E G ESM_CHASSIS ESM_CHASSIS_FILE or fastfabric conf The...

Page 139: ...rms The diff files are only created if differences are detected If the s option is used and failures are detected files related to the checks that have failed are also copied to a time stamped directo...
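The "diff only when something changed" behaviour described in this snippet can be illustrated with a small Python sketch. It is not the analysis tools' own implementation: the function name `write_diff_if_changed` and the file paths passed to it are hypothetical, and the diff format (unified) is an assumption.

```python
import difflib
from pathlib import Path


def write_diff_if_changed(baseline: Path, latest: Path, diff_out: Path) -> bool:
    """Compare two snapshot files; write a unified diff only when they differ.

    Returns True if a diff file was written, False if the snapshots match.
    All three paths are hypothetical names chosen by the caller.
    """
    old = baseline.read_text().splitlines(keepends=True)
    new = latest.read_text().splitlines(keepends=True)
    diff = list(
        difflib.unified_diff(old, new, fromfile=str(baseline), tofile=str(latest))
    )
    if not diff:
        # No differences detected, so no diff file is created.
        return False
    diff_out.write_text("".join(diff))
    return True
```

A caller would typically treat a `True` return value as a failed check and keep the diff file for later review.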

Page 140: ...config iba ports p ports a list of local HCA ports used to access fabric s for analysis The default is the first active port This is specified as hca port 0 0 1st active port in system 0 y port y with...

Page 141: ...PORTS_FILE a file containing a list of ports used in absence of t and p FF_TIMEOUT_MULT multiplier for response timeouts The default is 2 This typically does not need to be set but in the event of un...

Page 142: ...lysis tool to the appropriate administrators for further analysis and corrective action as needed NOTE Running these tools too often can have negative impacts Among the potential risks Each run adds a...

Page 143: ...natives to full include quick builds just OSU Pallas and HPL all builds just OSU Pallas HPL and NAS benchmarks In order to run the applications an mpi_hosts file must be created in opt iba src mpi_app...

Page 144: ...ecifying the maximum message size can be provided This benchmark will only use the first two nodes listed in mpi_hosts During this benchmark the opt iba src mpi_apps mpi param pallas config file is us...

Page 145: ...param pallas config file is used 6 5 OSU Bidirectional Bandwidth This is a simple benchmark of maximum bidirectional bandwidth A script is provided to run this application that will execute an assortm...

Page 146: ...m sizes m a medium problem size l a large problem size These can be selected using config_hpl The following command displays the preconfigured problem sizes available config_hpl For example to quickly...

Page 147: ...luster approximately 256 processors or greater it is rather large at 2 5GB As such it is recommended that Pallas be used for smaller runs 2 32 processes or that it be recognized that the benchmark is...

Page 148: ...6 MPI Sample Applications Pallas 6 6 D000006 000 Rev A Q...

Page 149: ...following options a For root user command prompt ends in or NOTE There must be a space after or b Tcl and Expect packages installed on all IB Management Nodes 4 Remote login as root enabled a If using...

Page 150: ...uration completed Such as a configuration of NTP b configuration of timezone c configuration of a syslog server A 3 Installing and Configuring the Subnet Manager 1 Subnet Manager installed enabled 2 S...

Page 151: ...n the Remaining Servers 1 Fastfabric conf file reviewed 2 etc sysconfig iba allhosts file created listing all hosts including IB management nodes 3 Verify hosts via Ethernet ping 4 Summary of fabric c...

Page 152: ...each IB management node from which the health check tools will be used 1 Edit fastfabric conf and review the health check tools parameters 2 If using embedded SM s create etc sysconfig iba esm_chassi...

Page 153: ...c tools For a given release consult etc sysconfig fastfabric conf sample for a sample file with the defaults of the given release If fastfabric conf does not assign a value to a given configuration va...

Page 154: ...ysconfig then CONFIG_DIR etc sysconfig else CONFIG_DIR etc fi export CONFIG_DIR fi Override default location for HOSTS_FILE export HOSTS_FILE HOSTS_FILE CONFIG_DIR iba hosts Override default location...

Page 155: ...ost_basename_to_ipoib 1 hostname provided echo 1 FF_IPOIB_SUFFIX fi shell Function to convert a hostname into a basic hostname eg remove IPoIB suffix etc should match result of hostname s on host if F...

Page 156: ..._UPGRADE_OPTIONS FF_UPGRADE_OPTIONS where to upload server specific files to during uploadall captureall d option export UPLOADS_DIR UPLOADS_DIR uploads where to download server specific files from du...

Page 157: ...ib1 ib2 On OFED stack it will be ib0 ib1 export FF_IPOIB_BASE_DEV_NUM FF_IPOIB_BASE_DEV_NUM 1 shell Function to return the base IPoIB device number for this stack type For Silverstorm stack installat...

Page 158: ...export FF_ESM_CMDS FF_ESM_CMDS smShowSMParms smShowDefBcGroup list of analysis to perform during all_analysis pick appropriate type of SM to analyze export FF_ALL_ANALYSIS FF_ALL_ANALYSIS fabric chass...

Page 159: ...B Fast Fabric Configuration Files D000006 000 Rev A B 7 Q NOTE Do not edit etc sysconfig iba iba_mon conf sample...

Page 160: ...ied will be cleared by iba_mon and may impact any remote Performance Managers which are monitoring the given Counter Interval 10 monitoring interval in seconds SyslogFacility local6 syslog facility co...

Page 161: ...names are generally easier to translate than numeric IP addresses Typically management network host names are specified However if desired IPoIB hostnames or IP addresses may be used This can accelera...

Page 162: ...operations are performed against the management card in the chassis For operations such as cmdall the command is executed against the management interface for the given chassis For more sophisticated opera...

Page 163: ...Node GUID with desired Name include etc sysconfig iba moreswitches included file Each line of the switch list file may specify a single switch a comment or another switch list file to include Switche...

Page 164: ...e command line Refer to the section Selection of local Ports subnets on page 5 9 for more information Below is a sample port list file this is a comment 1 1 first port on 1st HCA 1 2 second port on 1s...

Page 165: ...on Files D000006 000 Rev A B 13 Q Comments may be placed on any line By using a to precede the comment On lines with a port or include directive the must be white space separated from any preceding po...
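The port list file rules quoted in the two snippets above (one port or include directive per line, optional trailing comments) can be sketched as a small Python parser. The punctuation has been stripped from the snippets, so two details here are assumptions rather than confirmed syntax: that ports are written as `hca:port` and that comments start with `#`. The function name `parse_port_list` is hypothetical.

```python
def parse_port_list(text, comment_char="#"):
    """Parse port-list text into (ports, includes).

    ports    -> list of (hca, port) integer pairs, assuming 'hca:port' syntax
    includes -> file names given by 'include' directives (not opened here)

    This sketch is more lenient than the manual: it accepts a comment
    character that is not separated from the entry by white space.
    """
    ports, includes = [], []
    for raw in text.splitlines():
        # Drop everything from the comment character to the end of the line.
        line = raw.split(comment_char, 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "include":
            if len(fields) != 2:
                raise ValueError(f"malformed include directive: {raw!r}")
            includes.append(fields[1])
            continue
        hca, port = fields[0].split(":")  # raises ValueError if not 'x:y'
        ports.append((int(hca), int(port)))
    return ports, includes
```

Included files are only collected, not read, so a caller can decide how to resolve their paths.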

Page 166: ...B Fast Fabric Configuration Files Port List Files B 14 D000006 000 Rev A Q...

Page 167: ...rmed by adding a ib suffix to the management network name If a different suffix is desired FF_IPOIB_SUFFIX can be changed If IPoIB is also being used as the management network FF_IPOIB_SUFFIX can be s...
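The suffix-based name mapping described here can be sketched in Python as follows. This is an illustration only, not the shell functions shipped in fastfabric.conf: the helper names are hypothetical, and the default suffix `-ib` is an assumption, since the snippet's punctuation has been stripped.

```python
import os

# Assumed default; the real default is whatever fastfabric.conf defines.
DEFAULT_IPOIB_SUFFIX = "-ib"


def ipoib_suffix():
    """Return the suffix from FF_IPOIB_SUFFIX, or the assumed default."""
    return os.environ.get("FF_IPOIB_SUFFIX", DEFAULT_IPOIB_SUFFIX)


def to_ipoib_name(host, suffix):
    """Map a management network host name to its IPoIB name."""
    return host + suffix


def to_base_name(name, suffix):
    """Map an IPoIB name back to the management network host name."""
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name
```

In this sketch an empty suffix leaves names unchanged in both directions.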

Page 168: ...C Configuration of IPoIB Name Mapping C 2 D000006 000 Rev A Q...

Page 169: ...sibly SM nodes and no servers are installed in more than one subnet consult the instructions below for Primarily Independent Subnets 3 The subnets are overlapping If multiple IB components are common...

Page 170: ...configure a host SM node to manage more than one IB subnet Installing and Verifying Firmware on the IB Switches on page 3 10 At this time this operation is not supported for IB management nodes conne...

Page 171: ...e same set of subnets the files copied to each management node may need to be slightly different For example configuration files for fabric_analysis may indicate different port numbers or host files u...

Page 172: ...can be performed as per the instructions When creating the chassis file list all SilverStorm 9000 series internally managed IB switches in all subnets If desired additional files may also be created...

Page 173: ...nux Refresh SSH Known Hosts on page 4 9 may be run per the instructions Host Check MPI Performance on page 4 10 can be run for each subnet by using the allhosts files specific to each subnet i e those...

Page 174: ...all subnets are checked Similarly the esm_chassis and chassis files used should list all relevant SilverStorm IB chassis in all subnets Running HPL on page 3 20 can be run for each subnet by creatin...
