3 InfiniBand® Cluster Setup and Administration

Performance Settings and Management Tips

IB0054606-02  A

3-35

The values picked for the various checks and tests may depend on the type of 
node being configured. The tool is aware of two types of nodes: compute nodes 
and storage nodes.

Compute Nodes

Compute nodes are nodes that should be optimized for faster computation and 
communication with other compute nodes.

Storage (Client or Server) Nodes

Storage nodes are nodes that serve as clients or servers in a parallel filesystem 
network. Storage nodes (especially clients) typically perform computation 
and use MPI, in addition to sending and receiving storage network traffic. The 
objective is to improve IB verbs communications while maintaining good MPI 
performance.

OPTIONS

Table 3-4 lists the options for the ipath_perf_tuning tool and describes each 
option.

Table 3-4. ipath_perf_tuning Tool Options

Option       Description
-h           Display a short multi-line help message.
-T test      Limit the tests/checks that the tool performs to only those
             specified by the option. Multiple tests can be specified as a
             comma-separated list.
-I           Run the tool in interactive mode. In this mode, the tool prompts
             the user for input on certain tests.

Table 3-3. Checks Performed by ipath_perf_tuning Tool (continued)

Check Type   Description
cstates      Check whether (and which) C-States are enabled. C-States should
             be turned off for best performance.
services     Check whether certain system services (daemons) are enabled.
             These services should be turned off for best performance.
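
As an illustration of how the -T option from Table 3-4 might be combined with the check types from Table 3-3, the following sketch assembles a command line that limits the run to the cstates and services checks. It assumes that the test names accepted by -T are the same as the check type names in Table 3-3, which this page does not state explicitly. The helper build_perf_tuning_cmd is hypothetical and is not part of the tool. If ipath_perf_tuning is not installed on the host, the script only prints the command it would have run.

```shell
#!/bin/sh
# Sketch only: build an ipath_perf_tuning command line restricted to
# selected checks. Assumption: -T test names match the check types in
# Table 3-3 (for example cstates and services).

# build_perf_tuning_cmd is a hypothetical helper, not part of the tool.
build_perf_tuning_cmd() {
    tests=""
    for t in "$@"; do
        # Join the requested tests into the comma-separated list -T expects.
        if [ -z "$tests" ]; then
            tests="$t"
        else
            tests="$tests,$t"
        fi
    done
    echo "ipath_perf_tuning -T $tests"
}

cmd=$(build_perf_tuning_cmd cstates services)

if command -v ipath_perf_tuning >/dev/null 2>&1; then
    # Run only the selected checks.
    $cmd
else
    # Tool not installed on this host: show the command instead of running it.
    echo "would run: $cmd"
fi
```

Adding -I to the same command line would presumably make the tool prompt for input on certain tests, as described in Table 3-4. Whether -I and -T can be combined is an assumption and should be confirmed with the -h help output.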

Summary of Contents for OFED+ Host

Page 1: ...IB0054606 02 A OFED Host Software Release 1 5 4 User Guide...

Page 2: ...orporation reserves the right to change product specifications at any time without notice Applications described in this document for any of these products are for illustrative purposes only QLogic Co...

Page 3: ...PI Usage Checklists Cluster Setup 2 1 Using MPI 2 2 3 InfiniBand Cluster Setup and Administration Introduction 3 1 Installed Layout 3 2 IB and OpenFabrics Driver Overview 3 3 IPoIB Network Interface C...

Page 4: ...igure the ib_qib Driver State 3 22 Start Stop or Restart ib_qib Driver 3 22 Unload the Driver Modules Manually 3 23 ib_qib Driver Filesystem 3 23 More Information on Configuring and Loading Drivers 3...

Page 5: ...ations 4 3 Further Information on Open MPI 4 4 Configuring MPI Programs for Open MPI 4 5 To Use Another Compiler 4 5 Compiler and Linker Variables 4 7 Process Allocation 4 7 IB Hardware Contexts on th...

Page 6: ...on MVAPICH2 5 5 Managing MVAPICH and MVAPICH2 with the mpi selector Utility 5 5 Platform MPI 8 5 6 Installation 5 6 Setup 5 6 Compiling Platform MPI 8 Applications 5 7 Running Platform MPI 8 Applicati...

Page 7: ...6 13 Environment Variables 6 13 Implementation Behavior 6 15 Application Programming Interface 6 17 SHMEM Benchmark Programs 6 27 7 Virtual Fabric support in PSM Introduction 7 1 Virtual Fabric Suppor...

Page 8: ...SRP Target Port of a Session by IOCGUID B 10 Specifying a SRP Target Port of a Session by Profile String B 10 Specifying an Adapter B 10 Restarting the SRP Module B 11 Configuring an Adapter with Mult...

Page 9: ...Failure D 5 MPI Job Failures Due to Initialization Problems D 6 OpenFabrics and InfiniPath Issues D 6 Stop Infinipath Services Before Stopping Restarting InfiniPath D 6 Manual Shutdown or Restart May...

Page 10: ...hardware and the Ethernet network E 7 Troubleshooting SRP Issues E 9 ib_qlgc_srp_stats showing session in disconnected state E 9 Session in Connection Rejected state E 11 Attempts to read or write to...

Page 11: ...mod G 30 modprobe G 30 mpirun G 31 mpi_stress G 31 rpm G 32 strings G 32 Common Tasks and Commands G 32 Summary and Descriptions of Useful Files G 34 boardversion G 34 status_str G 35 version G 36 Sum...

Page 12: ...3 Distributed SA Multiple Virtual Fabrics Example 3 14 3 4 Distributed SA Multiple Virtual Fabrics Configured Example 3 15 3 5 Virtual Fabrics with Overlapping Definitions 3 15 3 6 Virtual Fabrics wi...

Page 13: ...APICH Wrapper Scripts 5 3 5 3 MVAPICH Wrapper Scripts 5 4 5 4 Platform MPI 8 Wrapper Scripts 5 7 5 5 Intel MPI Wrapper Scripts 5 10 6 1 SHMEM Run Time Library Environment Variables 6 13 6 2 shmemrun E...

Page 14: ...xiv IB0054606 02 A OFED Host Software Release 1 5 4 User Guide...

Page 15: ...nce This guide is intended for end users responsible for administration of a cluster network as well as for end users who want to use that cluster This guide assumes that all users are familiar with c...

Page 16: ...or command line text For example To return to the root directory from anywhere in the file structure Type cd root and press ENTER Enter the following command sh install bin Key names and key strokes...

Page 17: ...ication on the left The QLogic Global Training portal offers online courses certification exams and scheduling of in person training Technical Certification courses include installation maintenance an...

Page 18: ...s an extensive collection of QLogic product information that you can search for specific solutions We are constantly adding to the collection of information in our database to provide answers to your...

Page 19: ...ples for compiling and running MPI programs with other MPI implementations Section 7 describes QLogic Performance Scaled Messaging PSM that provides support for full Virtual Fabric vFabric integration...

Page 20: ...dware installation and the QLogic InfiniBand Fabric Software Installation Guide contains information on QLogic software installation Overview The material in this documentation pertains to a QLogic OF...

Page 21: ...teroperability QLogic OFED participates in the standard IB subnet management protocols for configuration and monitoring Note that QLogic OFED including Internet Protocol over InfiniBand IPoIB is inter...

Page 22: ...1 Introduction Interoperability 1 4 IB0054606 02 A...

Page 23: ...gement problems the compute nodes of the cluster must have very similar hardware configurations and identical software installations See Homogeneous Nodes on page 3 37 for more information 2 Check tha...

Page 24: ...en MPI Applications on page 4 2 4 Create an mpihosts file that lists the nodes where your programs will run See Create the mpihosts File on page 4 3 5 Run Open MPI applications See Running Open MPI Ap...

Page 25: ...are This software provides the foundation that supports the MPI implementation Figure 3 1 illustrates these relationships Note that HP MPI Platform MPI Intel MPI MVAPICH MVAPICH2 and Open MPI can run...

Page 26: ...bin opt iba Documentation is found in usr share man usr share doc infinipath License information is found only in usr share doc infinipath QLogic OFED Host Software user documentation can be found on...

Page 27: ...SRP devices on the fabric have been discovered MPI over uDAPL can be used by Intel MPI IPoIB must be configured before MPI over uDAPL can be set up Other optional drivers can now be configured and en...

Page 28: ...RX packets 0 errors 0 dropped 0 overruns 0 frame 0 TX packets 0 errors 0 dropped 0 overruns 0 carrier 0 collisions 0 txqueuelen 128 RX bytes 0 0 0 b TX bytes 0 0 0 b 3 Type ping c 2 b 10 1 17 255 The...

Page 29: ...QLogic recommends using the QLogic IFS Installer TUI FastFabric or iba_config command to configure the boot time and autostart of the IPoIB driver Refer to the QLogic InfiniBand Fabric Software Instal...

Page 30: ...Logic supports bonding across HCA ports and bonding port 1 and port 2 on the same HCA Interface Configuration Scripts Create interface configuration scripts for the ibX and bondX interfaces Once the c...

Page 31: ...0 downdelay 0 The following is an example for ib0 slave The file is named etc sysconfig network scripts ifcfg ib0 DEVICE ib0 USERCTL no ONBOOT yes MASTER bond0 SLAVE yes BOOTPROTO none TYPE InfiniBan...

Page 32: ...boot BONDING_MASTER yes BONDING_MODULE_OPTS mode active backup miimon 100 primary ib0 updelay 0 downdelay 0 BONDING_SLAVE0 ib0 BONDING_SLAVE1 ib1 MTU 65520 The following is an example for ib0 slave Th...

Page 33: ...ify that IB bonding is configured cat proc net bonding bond0 ifconfig Example of cat proc net bonding bond0 output cat proc net bonding bond0 Ethernet Channel Bonding Driver v3 2 3 December 6 2007 Bon...

Page 34: ...FE 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 UP BROADCAST RUNNING SLAVE MULTICAST MTU 65520 Metric 1 RX packets 118938033 errors 0 dropped 0 overruns 0 frame 0 TX packets 118938027 errors 0 dropped...

Page 35: ...node that acts as the subnet manager Toenable OpenSM the iba_config command can be used or the chkconfig command as a root user can be used on the node where it will be run The chkconfig command to e...

Page 36: ...n to correctly build a path record between two nodes The Distributed Subnet Administration SA solves this problem by allowing each node to locally replicate the path records needed to reach the other...

Page 37: ...do not match SIDs in the Distributed SA s database will be ignored Configuring the Distributed SA In order to absolutely minimize the number of queries made by the Distributed SA it is important to co...

Page 38: ...to limit how much IB bandwidth MPI applications are permitted to consume In that case they may re configure the QLogic Fabric Manager turning off the Default Virtual Fabric and replacing it with seve...

Page 39: ...l Fabric As a result the Distributed SA sees two different virtual fabrics that match its configuration file In Figure 3 6 the person administering the fabric has created two different Virtual Fabrics...

Page 40: ...a last resort Stored SIDs are only mapped to the default virtual fabric if they do not match any other Virtual Fabrics Thus in the first example Figure 3 6 the Distributed SA will assign all the SIDs...

Page 41: ...rtual Fabrics with Unique Numeric Indexes In Figure 3 8 the Distributed SA assigns all overlapping SIDs to the PSM_MPI fabric because it has the lowest Index Distributed SA Configuration File The Dist...

Page 42: ...0x0a3 SID 0x1a1 SID 0x1a2 SID 0x1a3 SID 0x2a1 SID 0x2a2 SID 0x2a3 ScanFrequency Periodically the Distributed SA will completely re synchronize its database This also occurs if the Fabric Manager is re...

Page 43: ...Errors Errors will be reported but nothing else Includes Dbg 1 and Dbg 2 Dbg 4 Warnings Errors and warnings will be reported Includes Dbg 3 Dbg 5 Normal Some normal events will be reported along with...

Page 44: ...need to change any file on the hosts To ensure that the driver on this host uses 2K MTU add the following options line as a root user in to the configuration file options ib_qib ibmtu 4 Table 3 1 show...

Page 45: ...TU_firmware emfw for the 9024 EM This has the 4K MTU default for use on fabrics where 4K MTU is required If 4K MTU support is not required then use the 4 2 2 0 2 DDR emfw file for DDR externally manag...

Page 46: ...e release change driver options or do manual testing QLogic recommends using etc init d openibd to stop stat and restart the ib_qib driver For using the command line to stop start and restart as a roo...

Page 47: ...ma_ iw_ xargs modprobe r ib_qib Driver Filesystem The ib_qib driver supplies a filesystem for exporting certain binary statistics to user applications By default this filesystem is mounted in the ipat...

Page 48: ...The flash file is an interface for internal diagnostic commands The file counter_names provides the names associated with each of the counters in the binary port counters files and the file driver_st...

Page 49: ...vices using the information provided in the following sections Systems in General With Either Intel or AMD CPUs For best performance on dual port HCAs on which only one port is active the module param...

Page 50: ...are not necessary On all systems the qib driver behaves as if the following parameters were set rcvhdrcnt 4096 If you run a script such as the following for x in sys module ib_qib parameters do echo...

Page 51: ...ore node then 13 is more than enough PSM contexts to run an MPI process on each core without making use of context sharing An example ib_qib options line in the modprobe conf file for this 12 core nod...

Page 52: ...bytes AMD Interlagos CPU Systems With AMD Interlagos Opteron 6200 Series CPU systems better performance will be obtained if on single HCA systems the HCA is put in a PCIe slot closest to Socket number...

Page 53: ...utput will read MaxPayload 256 bytes MaxReadReq 4096 bytes If you run a script such as the following for x in sys module ib_qib parameters do echo basename x cat x done Then in the list of qib paramet...

Page 54: ...for unknown reason 3d on CPU 0 After this happens you may also see the following message in the syslog Mth dd hh mm ss st2019 kernel ib_qib 0000 0a 00 0 infinipath0 Fatal Hardware Error no longer usab...

Page 55: ...th the new syntax are listed below Per unit parameters singleport Use only IB port 1 more per port buffer space cfgctxts Set max number of contexts to use pcie_caps Max PCIe tuning MaxPayload MaxReadR...

Page 56: ...nit 0 and 16 on unit 1 cfgctxts 0 10 1 16 A user can identify HCAs and correlate them to system unit numbers by using the b option beacon mode option to the ipath_control script Issuing the following...

Page 57: ...s this feature with the driver allocating memory on the NUMA node closest to the HCA recv_queue_size Tuning Related to NAKs The Receiver Not Ready Negative Acknowledgement RNR NAKs can slow IPoIB down...

Page 58: ...will prompt the user for input on some of the settings and actions Table 3 3 list the checks the tool performs on the system on which it is run Table 3 3 Checks Preformed by ipath_perf_tuning Tool Che...

Page 59: ...objective is to improve IB verbs communications while maintaining good MPI performance OPTIONS Table 3 4 list the options for the ipath_perf_tuning tool and describes each option cstates Check whether...

Page 60: ...ese test do not provide a guaranteed universal performance gain and therefore changing driver parameters associated with them requires user approval Other tests where the tool can make a safe determin...

Page 61: ...r etc modprobe d ib_qib conf RHEL prior to 6 0 etc modprobe conf SLES etc modprobe conf local Homogeneous Nodes To minimize management problems the compute nodes of the cluster should have very simila...

Page 62: ...mechanism See Appendix F Write Combining for more information Check the PCIe bus width If slots have a smaller electrical width than mechanical width lower than expected performance may occur Use thi...

Page 63: ...y running on a general Linux computer Following are several groups constituting a minimal necessary set of services These are all services controlled by chkconfig To see the list of services that are...

Page 64: ...keys must be distributed and stored on all the compute nodes so that connections to the remote machines can be established without supplying a password You or your administrator must set up the ssh k...

Page 65: ...p fe Root or superuser access is required on ip fe and on each node to configure ssh ssh including the host s key has already been configured on the system ip fe See the sshd and ssh keygen man pages...

Page 66: ...bGPcrVlSjuVps fWEju64FTqKEetA8l8QEgAAAIBNtPDDwdmXRvDyc0gvAm6lPOIsRLmgmdgKXT GOZUZ0zwxSL7GP1nEyFk9wAxCrXv3xPKxQaezQKs KL95FouJvJ4qrSxxHdd1 NYNR0DavEBVQgCaspgWvWQ8cL 0aUQmTbggLrtD9zETVU5PCgRlQL6I3Y5sCCH...

Page 67: ...en t rsa 2 Enter a passphrase for your key pair when prompted Note that the key agent does not survive X11 logout or system reboot ssh add 3 The following command tells ssh that your key pair should l...

Page 68: ...described in the following paragraph MPI jobs that use more than 10 processes per node may encounter an ssh throttling mechanism that limits the amount of concurrent per node connections to 10 If you...

Page 69: ...orrectly See iba_opp_query on page G 4 for detailed usage information iba_opp_query slid 0x31 dlid 0x75 sid 0x107 Query Parameters resv1 0x0000000000000107 dgid sgid dlid 0x75 slid 0x31 hop 0x0 flow 0...

Page 70: ...x0 resv2 0x0 resv3 0x0 ibstatus Another useful program is ibstatus that reports on the status of the local HCAs Sample usage and output are as follows ibstatus Infiniband device qib0 port 1 status def...

Page 71: ...4096 5 active_mtu 4096 5 sm_lid 1 port_lid 31 port_lmc 0x00 ipath_checkout ipath_checkout is a bash script that verifies that the installation is correct and that all the nodes of the network are func...

Page 72: ...3 InfiniBand Cluster Setup and Administration Checking Cluster and Software Status 3 48 IB0054606 02 A...

Page 73: ...and MVAPICH2 version 1 7 These MPIs are offered in versions built with the high performance Performance Scaled Messaging PSM interface and versions built run over IB Verbs There are also the commercia...

Page 74: ...installed Setup When using the mpi selector tool the necessary PATH and LD_LIBRARY_PATH setup is done When not using the mpi selector tool put the Open MPI installation directory in the PATH by adding...

Page 75: ...ains the host names of the nodes in your cluster that run the examples with one host name per line Name this file mpihosts The contents can be in the following format More details on the mpihosts file...

Page 76: ...munication to self mca btl openib self The following command disables PSM transport mca mtl psm In these commands btl stands for byte transport layer and mtl for matching transport layer PSM transport...

Page 77: ...dit a Makefile to achieve this result adding lines similar to CC mpicc F77 mpif77 F90 mpif90 CXX mpicxx In some cases the configuration process may specify the linker QLogic recommends that the linker...

Page 78: ...emaining options to the mpicxx script the options to the compiler in question and the names of the files that it operates Also use mpif77 mpif90 or mpif95 for linking otherwise true may have the wrong...

Page 79: ...command line options are used cc gcc the command line variable is used When both the compiler and linker variables are set and they do not match for the compiler you are using the MPI program may fai...

Page 80: ...r Messages on page 4 11 There are multiple ways of specifying how processes are allocated You can use the mpihosts file the np and ppn options with mpirun and the MPI_NPROCS and PSM_SHAREDCONTEXTS_MAX...

Page 81: ...to satisfy the job requirement and try to give a context to each process When context sharing is enabled on a system with multiple QLogic IB adapter boards units and the IPATH_UNIT environment variabl...

Page 82: ...a per node setting or some level of coordination with the job scheduler with setting the environment variable should be used The number of contexts can be explicitly configured with the cfgctxts modul...

Page 83: ...tions benchmarks add usr mpi gcc openmpi 1 4 3 qlc tests osu_benchmarks 3 1 1 to your PATH or if you installed the MPI in another location add MPI_HOME tests osu_benchmarks 3 1 1 to your PATH To enabl...

Page 84: ...ehavior than MVAPICH or the no longer supported QLogic MPI In the second format process_count can be different for each host and is normally the number of available processors on the node When not spe...

Page 85: ...e http www open mpi org faq category running mpirun scheduling Using Open MPI s mpirun The script mpirun is a front end program that starts a parallel MPI job on a set of nodes in an IB cluster mpirun...

Page 86: ...pihosts file Typically the number of node programs should not be larger than the number of processor cores at least not for compute bound programs This option specifies the number of processes to spaw...

Page 87: ...d environments batch scheduled environments typically copy the current environment to the execution of remote jobs so if the current environment has PATH and or LD_LIBRARY_PATH set properly the remote...

Page 88: ...es The prefix option is not sufficient if the installation paths on the remote node are different than the local node for example if lib is used on the local node but lib64 is used on the remote node...

Page 89: ...ngle copy of foo an allocated node mpirun mca btl self np 1 foo Tells Open MPI to use the self BTL and to run a single copy of foo an allocated node The mca switch can be used multiple times to specif...

Page 90: ...OpenMP run time library Use this variable to adjust the split between MPI processes and OpenMP threads Usually the number of MPI processes per node times the number of OpenMP threads will be set to ma...

Page 91: ...it By default IPATH_UNIT is unset and contexts from all configured units are made available to MPI jobs in round robin order Default Unset IPATH_HCA_SELECTION_ALG This variable provides user level sup...

Page 92: ...og If the link is down when the job starts and you want the job to continue blocking until the link comes up use the t 1 option LD_LIBRARY_PATH This variable specifies the path to the run time library...

Page 93: ...cutable is executed as usual using mpirun but typically only one MPI process is run per node and the OpenMP library will create additional threads to utilize all CPUs on that node If there are suffici...

Page 94: ...error codes Using Debuggers See http www open mpi org faq category debugging for details on debugging with Open MPI NOTE With Open MPI and other PSM enabled MPIs you will typically want to turn off PS...

Page 95: ...bugging MPI Programs IB0054606 02 A 4 23 NOTE The TotalView debugger can be used with the Open MPI supplied in this release Consult the TotalView documentation for more information http www open mpi o...

Page 96: ...4 Running MPI on QLogic Adapters Debugging MPI Programs 4 24 IB0054606 02 A...

Page 97: ...5 5 Table 5 1 Other Supported MPI Implementations MPI Implementation Runs Over Compiled With Comments Open MPI 1 4 3 PSM Verbs GCC Intel PGI Provides some MPI 2 functionality one sided operations and...

Page 98: ...s will also have qlc appended after the MPI version number For example usr mpi gcc openmpi VERSION qlc If a prefixed installation location is used usr is replaced by prefix The following examples assu...

Page 99: ...er is also available MVAPICH can be managed with the mpi selector utility as described in Managing MVAPICH and MVAPICH2 with the mpi selector Utility on page 5 5 Compiling MVAPICH Applications As with...

Page 100: ...VAPICH2 that runs over Verbs and is pre compiled with the GNU compiler is also available MVAPICH2 can be managed with the mpi selector utility as described in Managing MVAPICH and MVAPICH2 with the mp...

Page 101: ...pdf Managing MVAPICH and MVAPICH2 with the mpi selector Utility When multiple MPI implementations have been installed on the cluster you can use the mpi selector to switch between them The MPIs that...

Page 102: ...r information on setting the run time library path Platform MPI 8 Platform MPI 8 formerly HP MPI is a high performance production quality implementation of the Message Passing Interface MPI with full...

Page 103: ...e mpirun command running with four processes over PSM mpirun np 4 hostfile mpihosts PSM mpi_app_name To run over IB Verbs type mpirun np 4 hostfile mpihosts IBV mpi_app_name To run over TCP which coul...

Page 104: ...r to psm X X libtmip_psm so Comments OK Intel MPI can also be run over uDAPL which uses IB Verbs uDAPL is the user mode version of the Direct Access Provider Library DAPL and is provided as a part of...

Page 105: ...and ofa v2 ib0 u2 0 nonthreadsafe default libdaplofa so 2 dapl 2 0 ib0 0 3 On every node type the following command as a root user modprobe rdma_ucm To ensure that the module is loaded when the drive...

Page 106: ...ifort the Intel compilers must be installed and resolvable from the user s environment Running Intel MPI Applications Here is an example of a simple mpirun command running with four processes mpirun n...

Page 107: ...rdma OpenIB cma uDAPL 2 0 genv I_MPI_DEVICE rdma ofa v2 ib To help with debugging you can add this option to the Intel mpirun command TMI genv TMI_DEBUG 1 uDAPL genv I_MPI_DEBUG 2 Further Information...

Page 108: ...e MVAPICH defaults to an IB MTU size of 1024 bytes This can be over ridden by setting an environment variable export VIADEV_DEFAULT_MTU MTU2048 Valid values are MTU256 MTU512 MTU1024 MTU2048 and MTU40...

Page 109: ...unrelated to the standard System V Shared Memory API provided by UNIX operating systems Interoperability QLogic SHMEM depends on the Performance Scaled Messaging PSM protocol layer implemented as a u...

Page 110: ...intel mvapich2 1 7 qlc usr mpi pgi mvapich2 1 7 qlc The qlc suffix denotes that this is the QLogic PSM version It is recommended that you match the compiler used to build the MPI implementation with t...

Page 111: ...pi usr shmem qlogic include QLogic recommends that usr shmem qlogic bin is added onto your PATH If it is not on your PATH then you will need to give full pathnamescd to find the shmemrun and shmemcc w...

Page 112: ...to specify the SHMEM include directory the SHMEM library directory and to appropriately link in the SHMEM library The shmemcc script automatically determines the correct directories by finding them r...

Page 113: ...ication binaries will be portable across different implementations of the QLogic SHMEM library including portability over different underlying MPIs Running SHMEM Programs Using shmemrun The shmemrun s...

Page 114: ...nd the options will automatically be remapped as required for the actual mpirun This makes it possible to write scripts that call shmemrun without exposing these details of the underlying mpirun comma...

Page 115: ...N environment variable Alternatively it is possible to write hybrid SHMEM MPI programs that use features from both the SHMEM and MPI libraries These programs must call shmem_init to initialize the SHM...

Page 116: ...elow are various options for integration of the QLogic SHMEM and slurm Full Integration This approach fully integrates QLogic SHMEM start up into slurm and is available when running over MVAPICH2 The...

Page 117: ...te options Note that ssh rsh will be used for starting processes not slurm Sizing Global Shared Memory SHMEM provides shmalloc shrealloc and shfree calls to allocate and release memory using a symmetr...

Page 118: ...y for example in actual use If a SHMEM application program runs out of global shared memory increase the value of SHMEM_SHMALLOC_MAX_SIZE The value of SHMEM_SHMALLOC_INIT_SIZE can also be changed to p...

Page 119: ...rations As long as there is sufficient physical memory for the program the following steps can be used to solve local shared memory allocation problems Check for low ulimits on memory ulimit l max loc...

Page 120: ...The progress thread is provided by PSM and is scheduled at a relatively low frequency typically 10 to 100 times a second This thread will cause independent SHMEM progress where required both on the i...

Page 121: ...ve progress mode will typically be used in the following circumstances For applications that use a polling idiom that is incompatible with the active progress mode and where the application programmer...

Page 122: ...oint for the long get protocol 0 means unlimited SHMEM_PUT_FRAG_LIMIT 4096 Maximum number of outstanding put fragments for this end point for the short put protocol 0 means unlimited Each short put fr...

Page 123: ...ehavior for the QLogic SHMEM implementation SHMEM_PUT_REPLY_COMBINING_COUNT 8 Number of consecutive put replies on a flow to combine together into a single reply Table 6 2 shmemrun Environment Variabl...

Page 124: ...this ordering is guaranteed shmem_quiet This function waits for remote completion of all puts issued by this PE prior to the quiet operation Therefore once the quiet operation returns it is guarantee...

Page 125: ...ry call However performance will typically be substantially improved by using the SHMEM wait operation instead shmem_stack is implemented as a no op since this is a distributed memory cluster architec...

Page 126: ...em_init start_pes my_pe _my_pe shmem_my_pe num_pes _num_pes shmem_n_pes Symmetric heap shmalloc shmemalign shfree shrealloc Contiguous Put Operations shmem_short_p shmem_int_p shmem_long_p shmem_float...

Page 127: ..._int_put_nb shmem_long_put_nb shmem_longdouble_put_nb shmem_longlong_put_nb shmem_put_nb shmem_put32_nb shmem_put64_nb shmem_put128_nb shmem_putmem_nb shmem_short_put_nb Strided Put Operations shmem_d...

Page 128: ..._quiet shmem_wait_nb shmem_test_nb shmem_poll_nb same as shmem_test_nb provided for compatibility Contiguous Get Operations shmem_short_g shmem_int_g shmem_long_g shmem_float_g shmem_double_g shmem_lo...

Page 129: ...hmem_long_get_nb shmem_longdouble_get_nb shmem_longlong_get_nb shmem_short_get_nb shmem_get_nb shmem_get32_nb shmem_get64_nb shmem_get128_nb shmem_getmem_nb Strided Get Operations shmem_double_iget sh...

Page 130: ...adcast64 Concatenation shmem_collect shmem_collect32 shmem_collect64 shmem_fcollect shmem_fcollect32 shmem_fcollect64 Synchronization operations shmem_int_wait shmem_long_wait shmem_longlong_wait shme...

Page 131: ...mem_long_cswap shmem_longlong_cswap shmem_short_mswap shmem_int_mswap shmem_long_mswap shmem_longlong_mswap shmem_short_inc shmem_int_inc shmem_long_inc shmem_longlong_inc shmem_short_add shmem_int_ad...

Page 132: ...shmem_short_or_to_all shmem_int_xor_to_all shmem_long_xor_to_all shmem_longlong_xor_to_all shmem_short_xor_to_all shmem_double_min_to_all shmem_float_min_to_all shmem_int_min_to_all shmem_long_min_to_...

Page 133: ...sum_to_all shmem_longlong_sum_to_all shmem_short_sum_to_all shmem_complexd_prod_to_all complex collectives are not implemented shmem_complexf_prod_to_all complex collectives are not implemented shmem_...

Page 134: ...ts PE for accessibility shmem_addr_accessible test address on PE for accessibility Cache Operations for compatibility shmem_clear_cache_inv implemented as a no op shmem_clear_cache_line_inv implemente...

Page 135: ...e processes equally divided between them The processes are split up into pairs with one from each pair on either host and each pair is loaded with the desired traffic pattern The benchmark automatical...

Page 136: ...specified in bytes default 8 Options See Table 6 5 b INT batch size number of concurrent operations default 64 f force order for bifurcation of PEs based on rank order h displays the help page l INT...

Page 137: ...ndow this is the default q for blocking puts use quiet every window r use ring pattern default is random s enable communication to self t FLOAT if the loop count is not given run the test for this man...

Page 138: ...e non pipelined mode for NB ops default pipelined o OP choose OP from put or putnb p INTEGER offset for all to all schedule default 1 usually set to ppn r randomize all to all schedule s enable commun...

Page 139: ...rograms IB0054606 02 A 6 31 Table 6 8 QLogic SHMEM reduce benchmark options Option Description b INTEGER number of barriers between reduces default 0 h displays the help page i INTEGER K outer iterati...

Page 140: ...6 SHMEM Description and Configuration SHMEM Benchmark Programs 6 32 IB0054606 02 A...

Page 141: ...ate deactivate these features Other MPIs will require use of environment variables to leverage these capabilities With MPI applications the environment variables need to be propagated across all nodes...

Page 142: ...will automatically obtain the SL and Pkey to use for the vFabric from the QLogic Fabric Manager via path record queries Using SL and PKeys SL and Pkeys can be specified natively for Open MPI For othe...

Page 143: ...SA The SIDs configured in the QLogic Fabric Manager configuration file should also be provided to the Distributed SA for correct operation Service ID can be specified natively for Open MPI For other M...

Page 144: ...mapping for any given port however QLogic 7300 series adapters exports the SL2VL mapping via sysfs files These files are used by PSM to implement the SL2VL tables automatically The SL2VL tables are pe...

Page 145: ...le multiple DLID entries in the port forwarding table that could map to different egress ports Dispersive routing as implemented in the PSM attempts to avoid congestion hotspots described above by spr...

Page 146: ...described above and a 16 process MPI application that spans these nodes 8 processes per node Then Each MPI process is automatically bound to a given CPU core numbered between 0 7 PSM does this at startu...

Page 147: ...a single process on Node B only one path will be used across all processes Static_Base The only path that is used is the base path SLID DLID between nodes regardless of the LMC of the fabric or the n...

Page 148: ...8 Dispersive Routing 8 4 IB0054606 02 A...

Page 149: ...LE7342 adapter for the node The following software is included with the QLogic OFED installation software package gPXE boot image patch for DHCP server tool to install gPXE boot image in EPROM of card...

Page 150: ...ter GUID The dhcpd on the existing DHCP server may need to be patched This patch will be provided via the gPXE rpm installation 3 Write the ROM image to the IB adapter This only needs to be done once...

Page 151: ...client identifier value such that the DHCP server will grant the same IP address to any client that conveys this client identifier 2 Unpack the latest downloaded DHCP server tar zxf dhcp release tar...

Page 152: ...DHCP server The following is the sample etc dhcpd conf file that specifies the HCA GUID for the hardware address DHCP Server Configuration file see usr share doc dhcp dhcpd conf sample ddns update st...

Page 153: ...diskless booting with an http boot server Boot Server Setup Configure the boot server for your site NOTE The dhcpd and apache configuration files referenced in this example are included as examples an...

Page 154: ...of the images conf file Alias images vault images Directory vault images AllowOverride All Options Indexes FollowSymLinks Order allow deny Allow from all Directory The following is an example of the...

Page 155: ...tp driverdownloads qlogic com QLogicDriverDownloads_UI default aspx a If vault images initrd img file is already present on the server machine back it up For example cp a vault images initrd img vault...

Page 156: ...machine of the same type hardware configuration and BIOS settings start with a known path to get the system commands PATH sbin usr sbin bin usr bin PATH start from a copy of the current initrd image mk...

Page 157: ...e qib driver will require the dca module if modinfo F depends ib_qib grep q dca then cp find lib modules uname r name dca ko lib ib dcacmd sbin insmod lib ib dca ko else dcacmd fi IB requires loading...

Page 158: ...em in order to use it for NFS etc n Now build the commands to load the additional modules We add them just after the last existing insmod command so all other dependences will be resolved You can chan...

Page 159: ...ho finished loading IB modules End of IB module block EOF first get line number where we append after last insmod if any otherwise at start line egrep n insmod init sed n s p if line then line 1 fi sed...

Page 160: ...ges initrd kern img c Run the usr share infinipath gPXE gpxe qib modify initrd script to create the initrd img file At this stage the initrd img file is ready and located at the location where the DHC...

Page 161: ...RVER SERVER_NAME port baseurl baseURL selfurl baseurl _SERVER REQUEST_URI dirurl baseurl dirname _SERVER SCRIPT_NAME kver 2 6 18 164 11 1 el5 echo EOF gpxe initrd images initrd img kernel kernels vmli...

Page 162: ...HCA is finished and the boot image is ready 3 Verify system boots off of the kernel image on the boot server The best way to do this is to boot into a different kernel from the one installed on the ha...

Page 163: ...ing the examples in Step 2 of Boot Server Setup and place them in the etc httpd conf d directory 6 Edit etc dhcpd conf file to boot the clients using HTTP filename http 172 26 32 9 images uniboot unib...

Page 164: ...9 gPXE HTTP Boot Setup 9 16 IB0054606 02 A...

Page 165: ...lc and that mpi selector is used to choose this Open MPI version as the MPI to be used The following examples are intended to show only the syntax for invoking these programs and the meaning of the ou...

Page 166: ...for all operations Half the time interval observed by the rank zero process for each exchange is a measure of the latency for messages of that size as previously defined The program uses a loop execut...

Page 167: ...Latency Test v3 1 1 Size Latency us 0 1 67 1 1 68 2 1 69 4 1 68 8 1 68 16 1 93 32 1 92 64 1 92 128 1 99 256 2 12 512 2 38 1024 2 74 2048 3 52 4096 4 59 8192 6 52 16384 9 98 32768 17 65 65536 52 11 131...

Page 168: ...the osu_latency code except in this case the originator of the messages pumps a number of them 64 in the installed version in succession using the non blocking MPI_Isend function while the receiving...

Page 169: ...34 55 32 68 89 64 137 87 128 265 80 256 480 19 512 843 70 1024 1353 48 2048 1984 11 4096 2152 61 8192 2249 00 16384 2680 75 32768 2905 83 65536 3170 05 131072 3224 15 262144 3241 35 524288 3270 21 10...

Page 170: ...esses Each of the sending processes sends a fixed number of messages the window size back to back to the paired receiving process before waiting for a reply from the receiver This process is repeated...

Page 171: ...79 96 256 2110 22 8243066 36 512 2353 17 4596038 46 1024 2495 88 2437386 38 2048 2573 99 1256833 08 4096 2567 88 626923 21 8192 2757 54 336613 42 16384 3283 94 200435 90 32768 3291 54 100449 84 65536...

Page 172: ...ging Rate Microbenchmarks A 8 IB0054606 02 A N 2 is dynamically calculated at the end of the run You can use the b option to get a bidirectional message rate and bandwidth results Scalability has been...

Page 173: ...888 4 100 075479 25018869 818990 8 200 115037 25014379 610716 16 284 475601 17779725 040265 32 568 950239 17779694 953511 64 1137 899392 17779677 998115 128 1758 183987 13735812 394705 256 2116 159352...

Page 174: ...Bandwidth MB s Messages s 1 34 572819 34572819 324348 2 68 984920 34492459 942272 4 137 870850 34467712 532016 8 274 914966 34364370 730843 16 438 182185 27386386 585309 32 871 077525 27221172 671073...

Page 175: ...nchmark 3 Messaging Rate Microbenchmarks IB0054606 02 A A 11 The higher peak bi directional messaging rate of 34 6 million messages per second at the 1 byte size compared to 25 million messages sec wh...

Page 176: ...A Benchmark Programs Benchmark 3 Messaging Rate Microbenchmarks A 12 IB0054606 02 A...

Page 177: ...adapter port through which the host communicates with a SRP target device e g a Fibre Channel disk array via a SRP target port A SRP Target Port is an IOC of the VIO hardware In the context of VIO ha...

Page 178: ...C only IB attached storage will use their own mechanism as maps are not necessary A SRP Adapter is a collection of SRP sessions This collection is then presented to the Linux kernel as if those sessio...

Page 179: ...nsion 2 end The session command has two parts the part that specifies the SRP initiator and the part that specifies the SRP target port The SRP initiator contains two parts the SRP initiator port and...

Page 180: ...e in this manner other devices have their own naming method To specify the host IB port to use the user can either specify the port GUID of the local IB port or simply use the index numbers of the car...

Page 181: ...0 name SRP T10 0000000000000001 id 0x0000494353535250 service 1 name SRP T10 0000000000000002 id 0x0000494353535250 service 2 name SRP T10 0000000000000003 id 0x0000494353535250 service 3 name SRP T1...

Page 182: ...ZE 320 SRP IU SG SIZE 15 SRP IO CLASS 0xff00 service 0 name SRP T10 0000000000000001 id 0x0000494353535250 service 1 name SRP T10 0000000000000002 id 0x0000494353535250 service 2 name SRP T10 00000000...

Page 183: ...SRP T10 0000000000000001 id 0x0000494353535250 session begin card 0 port 1 portGuid 0x0002c9030000110d initiatorExtension 1 targetIOCGuid 0x00066a01e0000149 targetIOCProfileIdString FVIC in Chassis 0x...

Page 184: ...r side by card index card 0 Specifies first HCA port 1 Specifies first port targetIOCGuid 0x00066013800016C end Specifying an SRP Initiator Port of Session by Port GUID The following example specifies...

Page 185: ...method if the port GUIDs are changed they must also be changed in the configuration file NOTE When specifying the targetIOCProfileIdString the string is case and format sensitive The easiest way to g...

Page 186: ...targetIOCProfileIdString FVIC in Chassis 0x00066A005000010E Slot 1 IOC 1 end Specifying an Adapter An adapter is a collection of sessions This collection is presented to the Linux kernel as if the co...

Page 187: ...er is configured with only one session and that session fails all SCSI I Os on that session will fail and access to SCSI target devices will be lost While the qlgc_srp module will attempt to recover t...

Page 188: ...e configuration that uses multiple sessions and adapters session begin card 0 port 2 targetIOCProfileIdString FVIC in Chassis 0x00066A005000011D Slot 1 IOC 1 initiatorExtension 3 end adapter begin des...

Page 189: ...pter Following is a list of the different types of failover scenarios Failing over from one SRP initiator port to another Failing over from a port on the VIO hardware card to another port on the VIO ha...

Page 190: ...x0000494353535250 session begin card 0 port 1 portGuid 0x0002c903000010f1 initiatorExtension 1 targetIOCGuid 0x00066a01e0000149 targetIOCProfileIdString BC2FC in Chassis 0x0000000000000000 Slot 6 Ioc...

Page 191: ...om a port on the VIO hardware card to another port on the VIO hardware card session begin card 0 InfiniServ HCA card number port 1 InfiniServ HCA port number targetIOCProfileIdString FVIC in Chassis F...

Page 192: ...ion File 3 Failing over from a port on a VIO hardware card to a port on a different VIO hardware card within the same Virtual I O chassis session begin card 0 InfiniServ HCA card number port 1 InfiniS...

Page 193: ...erent Virtual I O chassis session begin card 0 InfiniServ HCA card number port 1 InfiniServ HCA port number targetIOCProfileIdString FVIC in Chassis FRUChassisGUID1 Slot1 IOC initiatorExtension 1 end...

Page 194: ...first example traffic going to any Fibre Channel Target Device where both ports of the VIO hardware card have a valid map are split between the two ports of the VIO hardware card If one of the VIO har...

Page 195: ...f one of the sessions goes down due to an IB cable failure or an FC cable failure all traffic will begin using the other session session begin card 0 port 2 targetIOCProfileIdString FVIC in Chassis 0x...

Page 196: ...sions If there is a failure in one of the sessions e g one of the VIO hardware cards is rebooted traffic will begin using the other session session begin card 0 port 2 targetIOCProfileIdString FVIC in...

Page 197: ...P IOC Profile Native IB Storage SRP Driver SRP IOC GUID 0x00066a01dd000021 SRP IU SIZE 320 SRP IU SG SIZE 15 SRP IO CLASS 0xff00 service 0 name SRP T10 0000000000000001 id 0x0000494353535250 service 1...

Page 198: ...00000066a01e0000149 targetExtension 0x0000000000000001 SID 0x0000494353535250 IOClass 0x0100 end session begin card 0 port 2 portGuid 0x0002c903000010f2 initiatorExtension 1 targetIOCGuid 0x00066a01e0...

Page 199: ...a11dd000021 HCA 0 Port 2 0x0002c9020026041e Target Port GID 0xfe8000000000000000066a11dd000021 qlgc_srp cfg session begin targetIOCGuid 0x0002C90200400098 targetExtension 0x0002C90200400098 end adapte...

Page 200: ...e it automatically loaded 2 Discover the SRP devices on your fabric by running this command as a root user ibsrpdm In the output look for lines similar to these GUID 0002c90200402c04 ID LSI Storage Sy...

Page 201: ...the target you want and echo it into the add_target file echo id_ext 21000001ff040bf6 ioc_guid 21000001ff040bf6 dgid fe8000000000000021000001ff040bf6 pkey ffff service_id f60b04ff01000021 initiator...

Page 202: ...B SRP Configuration OFED SRP Configuration B 26 IB0054606 02 A Notes...

Page 203: ...rocess and file clean up after batch MPI PSM jobs have completed Clean Termination of MPI Processes and Clean up PSM Shared Memory Files Clean Termination of MPI Processes The InfiniPath software norm...

Page 204: ...irectory The file is owned by the user and in permission rwx it can be removed either by the user or by root PSM relies on the MPI implementation to cleanup after abnormal job termination In cases whe...

Page 205: ...sh files bin ls dev shm psm_shm 2 dev null for file in files do sbin fuser file dev null 2 1 if ne 0 then bin rm file dev null 2 1 fi done When the system is idle the administrators can remove all of...

Page 206: ...C Integration with a Batch Queuing System Clean up PSM Shared Memory Files C 4 IB0054606 02 A...

Page 207: ...Software Installation Guide Using LEDs to Check the State of the Adapter The LEDs function as link and data indicators once the InfiniPath software has been installed the driver has been loaded and t...

Page 208: ...ted and the physical link is up Ready to talk to SM to bring the link fully up If this state persists the SM may be missing or the link may not be configured Use ipath_control i to verify the software...

Page 209: ...efer to the QLogic Fabric Software Installation Guide for more information InfiniPath Interrupts Not Working The InfiniPath driver cannot configure the InfiniPath link to a usable state unless interru...

Page 210: ...depending on your distribution If no output is displayed check that ACPI is enabled in your BIOS settings To track down other initialization failures see InfiniPath ib_qib Initialization Failure on pa...

Page 211: ...g 1 node processes If this error appears check to see if the InfiniPath driver is loaded by typing lsmod grep ib_qib If no output is displayed the driver did not load for some reason In this case try...

Page 212: ...nected the switch is down SM is not running or that a hardware error occurred OpenFabrics and InfiniPath Issues The following sections cover issues related to OpenFabrics including Subnet Managers and...

Page 213: ...tes Connection Refused errors if it is loaded before IPoIB has been loaded and configured To solve the problem load and configure IPoIB first Set IBPATH for OpenFabrics Scripts The environment variabl...

Page 214: ...ffff service_id f60b04ff01000021 sys class infiniband_srp srp ipath0 1 add_target Outdated ipath_ether Configuration Setup Generates Error Ethernet emulation ipath_ether has been removed in this rele...

Page 215: ...ebugging See your switch vendor for more information QLogic recommends using FastFabric to help diagnose this problem If FastFabric is not installed in the fabric there are two diagnostic tools ibhost...

Page 216: ...onfig irqbalance off etc init d irqbalance stop Next find the IRQ number and bind it to a CPU The IRQ number can be found in one of two ways depending on the system used Both methods are described in...

Page 217: ...ook at the stats in proc interrupts while the adapter is active to observe which CPU is fielding ib_qib interrupts Immediately change the processor affinity of an IRQ To immediately change the process...

Page 218: ...ms are described in the following sections Invalid Configuration Warning Open MPI warns about an invalid configuration every time it is run with the following warning WARNING There are more than one ac...

Page 219: ...are To determine if the logical connection between the IB host and the VIO hardware is correct check the following The correct VirtualNIC driver is running The etc infiniband qlgc_vnic cfg file contai...

Page 220: ...ng that the qlgc_vnic cfg file contains the correct information Use the following scenarios to verify that the qlgc_vnic cfg file contains a definition for the applicable virtual interface Issue the c...

Page 221: ...09 max controllers 0x03 controller 1 GUID 00066a0130000001 vendor ID 00066a device ID 000030 IO class 2000 ID Chassis 0x00066A00010003F2 Slot 1 IOC 1 service entries 2 service 0 1000066a00000001 Infin...

Page 222: ...66a0130000001 dgid fe8000000000000000066a0258000001 pkey ffff ioc_guid 00066a0230000001 dgid fe8000000000000000066a0258000001 pkey ffff ioc_guid 00066a0330000001 dgid fe8000000000000000066a02580000...

Page 223: ...ed to an IB switch For example st139 ibv_devinfo hca_id mlx4_0 fw_ver 2 2 000 node_guid 0002 c903 0000 0f80 sys_image_guid 0002 c903 0000 0f83 vendor_id 0x02c9 vendor_part_id 25418 hw_ver 0xA0 board_i...

Page 224: ...e for each EIOC configuration file in the following list ls etc sysconfig network scripts ifcfg eioc1 ifcfg eioc2 ifcfg eioc3 ifcfg eioc4 ifcfg eioc5 ifcfg eioc6 Interface does not show up in output o...

Page 225: ...ADCAST 172 26 63 255 NETMASK 255 255 240 0 NETWORK 172 26 48 0 STARTMODE hotplug TYPE Ethernet Verify the physical connection between the VIO hardware and the Ethernet network If the interface is disp...

Page 226: ...thernet port If a VIO hardware module can be seen from a host the ib_qlgc_vnic_query s file displays information similar to EVIC in Chassis 0x00066a000300012a Slot 19 Ioc 1 EVIC in Chassis 0x00066a000...

Page 227: ...UN Depth 16 Max LUN Scan 512 Max IO 131072 128 KB Max Sectors 256 Max SG Depth 33 Session Count 2 No Connect T O 60 Second s Register In Order ON Dev Reqst T O 2 Second s Description SRP Virtual HBA 1...

Page 228: ...D 0xfe8000000000000000066a0260000165 SRP IOC Profile Chassis 0x00066A0001000481 Slot 1 IOC 2 SRP Target IOClass 0xFF00 SRP Target SID 0x0000494353535250 SRP IPI Guid 0x00066a000100d052 SRP IPI Extnsn...

Page 229: ...on Rejected state according to var log messages If the session is part of a multi session adapter ib_qlgc_srp_stats shows it in the Connection Rejected state A host displays Connection Failed for Sess...

Page 230: ...HBA 1 Session Session 1 State Disconnected Source GID 0xfe8000000000000000066a000100d051 Destination GID 0xfe8000000000000000066a0260000165 SRP IOC Profile Chassis 0x00066A0001000481 Slot 1 IOC 1 SRP...

Page 231: ...d 0x00066a0238000165 SRP TPI Extnsn 0x0000000000000001 Source LID 0x000c Dest LID 0x0004 Completed Sends 0x00000000000001c8 Send Errors 0x0000000000000000 Completed Receives 0x00000000000001c8 Receive...

Page 232: ...initiatorExtension in the failing Session block of the qlgc_srp cfg file or the adapter port GUID specified in the failing Session block of the qlgc_srp cfg file Additionally make certain that the map...

Page 233: ...the cable between the storage device and the Fibre Channel switch was pulled the VIO hardware log will display a Connection Lost to NPort Id message The next time the host tries to do an input output...

Page 234: ...VFx Port GUID is of the form 00066app38iiiiii where pp gives the IOC number 1 or 2 and iiiiiii gives the individual ID number of the VIO hardware so 00066a0138iiiiiii is the port guid of IOC 1 of VIO...

Page 235: ...rest will be Connected The transition of a session from Connected to Active will not be attempted until that session needs to become Active due to the failure of the previously Active session How doe...

Page 236: ...r_part_id 25218 hw_ver 0xA0 board_id SS_0000000005 phys_port_cnt 2 port 1 state PORT_ACTIVE 4 max_mtu 2048 4 active_mtu 2048 4 sm_lid 71 port_lid 60 port_lmc 0x00 st106 ibv_devinfo i 2 hca_id mthca0 f...

Page 237: ...19 Need to determine the SRP driver version Solution To determine the SRP driver version number enter the command modinfo d qlgc srp which returns information similar to the following st159 modinfo d...

Page 238: ...E ULP Troubleshooting Troubleshooting SRP Issues E 20 IB0054606 02 A...

Page 239: ...ions for enabling and disabling WC using PAT and MTRR and for verifying that write combining is working PAT and Write Combining The wc_pat parameter is set in etc modprobe conf on Red Hat systems or e...

Page 240: ...ws the memory to be mapped with fewer MTRRs so that there will be one or more unused MTRRs for the InfiniPath driver Some BIOS do not have the MTRR mapping option It may have a different name dependin...

Page 241: ...he following command ipath_pkt_test B With write combining enabled the QLE7140 and QLE7240 report in the range of 1150 1500 MBps The QLE7280 reports in the range of 1950 3000 MBps You can also use ipa...

Page 242: ...F Write Combining Verify Write Combining is Working F 4 IB0054606 02 A Notes...

Page 243: ...uster environment Use the following items as a checklist for verifying homogeneity A difference in any one of these items in your cluster may cause problems Kernels Distributions Versions of the QLogi...

Page 244: ...d enables disables services including drivers Can be useful for checking homogeneity dmesg Prints out bootup messages Useful for checking for initialization problems iba_opp_query Retrieves path re...

Page 245: ...adapters or using an IB loopback connector tests within a single QLogic IB adapter ipathstatsc Displays driver statistics and hardware counters including performance and error including status count...

Page 246: ...queries that can be done are much more limited than with iba_saquery In particular it can only find paths that start on the machine where the command is run In other words the source LID or source GI...

Page 247: ...pkey pkey Partition Key i sid sid Service ID h hca hca The HCA to use Defaults to the first HCA The HCA can be identified by name mthca0 qib1 et cetera or by number 1 2 3 et cetera p port port The por...

Page 248: ...0x75 slid 0x31 hop 0x0 flow 0x0 tclass 0x0 num_path 0x0 pkey 0x0 qos_class 0x0 sl 0x0 mtu 0x0 rate 0x0 pkt_life 0x0 preference 0x0 resv2 0x0 resv3 0x0 Using HCA qib0 Result resv1 0x0000000000000107 d...

Page 249: ...will never have to specify which HCA to use This is only relevant in the case where a single node is connected to multiple physical IB fabrics Finally the bottom half of the output shows the result o...

Page 250: ...id 263 using decimal numbers Note that these queries are the same as the first two only the base of the numbers has changed Query by LID and PKEY iba_opp_query slid 0x31 dlid 0x75 pkey 0x8002 Query by...

Page 251: ...CIe Gen2 x8 V1 N A YA N A FW 2 9 1000 Image type ConnectX FW Version 2 9 1000 Device ID 26428 Description Node Port1 Port2 Sys image GUIDs 0002c903000ba8e0 0002c903000ba8e1 0002c903000ba8e2 0002c90300...

Page 252: ...en true mdio_en_port1 0 IB phy_type_port1 XFI phy_type_port2 XFI read_cable_params_port1_en true read_cable_params_port2_en true Polarity eth_tx_lane_polarity_port1 0x0 eth_tx_lane_polarity_port2 0x0...

Page 253: ..._main_qdr 0x0 port2_sd0_ob_preemp_main_qdr 0x0 port1_sd1_ob_preemp_main_qdr 0x0 port2_sd1_ob_preemp_main_qdr 0x0 port1_sd2_ob_preemp_main_qdr 0x0 port2_sd2_ob_preemp_main_qdr 0x0 port1_sd3_ob_preemp_m...

Page 254: ...mp_pre 0x8 auto_ddr_option_1 tx_preemp_msb 0x0 auto_ddr_option_1 tx_preemp_post 0x2 auto_ddr_option_1 tx_preemp_main 0x10 auto_ddr_option_1 tx_preemp 0x0 auto_ddr_option_2 tx_preemp_pre 0xa auto_ddr_o...

Page 255: ...option_7 tx_preemp_msb 0x1 auto_ddr_option_7 tx_preemp_post 0x3 auto_ddr_option_7 tx_preemp_main 0x17 auto_ddr_option_7 tx_preemp 0x0 auto_ddr_option_8 tx_preemp_pre 0xf auto_ddr_option_8 tx_preemp_ms...

Page 256: ...auto_ddr_option_13 tx_preemp_main 0x5 auto_ddr_option_13 tx_preemp 0x0 auto_ddr_option_14 tx_preemp_pre 0x0 auto_ddr_option_14 tx_preemp_msb 0x0 auto_ddr_option_14 tx_preemp_post 0x0 auto_ddr_option_1...

Page 257: ...auto_ddr_option_5 rx_equal_offs 0x0 auto_ddr_option_6 rx_equal_offs 0x0 auto_ddr_option_7 rx_equal_offs 0x0 auto_ddr_option_0 rx_muxeq 0x0 auto_ddr_option_1 rx_muxeq 0x0 auto_ddr_option_2 rx_muxeq 0x...

Page 258: ...sigdet_th 0x1 auto_ddr_option_4 rx_sigdet_th 0x1 auto_ddr_option_5 rx_sigdet_th 0x1 auto_ddr_option_6 rx_sigdet_th 0x1 auto_ddr_option_7 rx_sigdet_th 0x1 auto_ddr_option_0 rx_equalization 0x4 auto_ddr...

Page 259: ...on_13 rx_muxeq 0x0 auto_ddr_option_13 rx_muxmain 0x1f auto_ddr_option_13 rx_main 0xf auto_ddr_option_13 rx_extra_hs_gain 0x3 auto_ddr_option_13 rx_equalization 0x0 auto_ddr_option_14 rx_muxeq 0x0 auto...

Page 260: ...OK 0x00001234 0x0000280f 0x0015dc BOOT2 OK 0x00002810 0x000034ef 0x000ce0 Configuration OK 0x000034f0 0x00003533 0x000044 GUID OK 0x00003534 0x0000366b 0x000138 Image Info OK 0x0000366c 0x0000946f 0x...

Page 261: ...e operation Options H help this message v verbose additional output t target guid guid of target switch in hex format for example 0x00066a00e3001234 h hca HCA number default is first HCA p port port n...

Page 262: ...update using fileName parameter must be an emfw file fwVerify perform firmware validation validate firmware in primary secondary EEPROMs report which was booted ping test for switch presence reboot r...

Page 263: ...ides IB packet analysis The snoop_enable variable must be set to 1 enabled in the modprobe conf ib_qib conf file to create snoop devices and capture devices If snoop_enable is set to 0 disabled then no...

Page 264: ...be downloaded from http www wireshark org Intel recommends using version 1 6 2 ibhosts This tool determines if all the hosts in your IB fabric are up and visible to the subnet manager and to each oth...

Page 265: ...The IB LIDs of the two nodes in this example are determined by using the ipath_control i command on each node The ibtracert tool produces output similar to the following when run as a root user from...

Page 266: ...0078 a5d2 sys_image_guid 0011 7500 0078 a5d2 vendor_id 0x1175 vendor_part_id 29474 hw_ver 0x2 board_id InfiniPath_QLE7340 phys_port_cnt 1 port 1 state PORT_ACTIVE 4 max_mtu 4096 5 active_mtu 4096 5 s...

Page 267: ...ident ib_qib ko ib_qib ko ident warning no id keywords in ib_qib ko ipath_checkout The ipath_checkout tool is a bash script that verifies that the installation is correct and that all the nodes of the...

Page 268: ...y test on every pair of nodes and analyzes the results The options available with ipath_checkout are shown in Table G 2 NOTE The hostnames in the nodefile are Ethernet hostnames not IPv4 addresses To...

Page 269: ...1 linkcontrol status_str sys class infiniband qib0 device driver version These files are also documented in Table G 4 and Table G 5 Other than the i option this script must be run with root permission...

Page 270: ...OFED Release x x x Date yyyy mm dd hh mm 0 Version ChipABI 2 0 InfiniPath_QLE7342 InfiniPath1 6 1 SW Compat 2 0 Serial RIB0941C00005 LocalBus PCIe 5000MHz x8 0 1 Status 0xe1 Initted Present IB_link_u...

Page 271: ...ttings using the BIOS Setup utility For specific instructions follow the hardware documentation that came with your system QLogic also provides a script ipath_mtrr which sets the MTRR registers enabli...

Page 272: ...lated It is installed from the infinipath RPM It displays both driver statistics and hardware counters including both performance and error including status counters Running ipathstats c 10 for exampl...

Page 273: ...raffic patterns optionally including oneself and one s local shared memory shm peers It can also be set up with multi dimensional grid traffic patterns this can be parameterized to run rings open 2D g...

Page 274: ...ries The option qa queries all To query a package that has not yet been installed use the qpl option strings Use the strings command to determine the content of and extract text from a binary file The...

Page 275: ...roblems and hardware errors Verify hosts via an Ethernet ping ipath_checkout run 1 hostsfile Verify ssh ipath_checkout run 2 hostsfile Show uname a for all hosts mpirun m hostsfile ppn 1 np numhosts...

Page 276: ...allhosts mpirun m hostsfile ppn 1 np numhosts nonmpi ipath_control i Verify that the hosts see each other ipath_checkout run 5 hostsfile Check MPI performance ipath_checkout run 7 hostsfile Generate a...

Page 277: ...Description Initted The driver has loaded and successfully initialized the IBA6110 or IBA7220 ASIC Present The IBA6110 or IBA7220 ASIC has been detected but not initialized unless Initted is also pre...

Page 278: ...ration template files used by the InfiniPath and OpenFabrics software Table G 6 Status Other Files File Name Contents lid IB LID The address on the IB fabric similar conceptually to an IP address for...

Page 279: ...d by the modprobe command Also used for creating aliases The PAT write combining option is set here For SLES systems etc infiniband openib conf The primary configuration file for InfiniPath OFED modules...

Page 280: ...G Commands and Files Summary of Configuration Files G 38 IB0054606 02 A...

Page 281: ...romio Books for Learning MPI Programming Gropp William Ewing Lusk and Anthony Skjellum Using MPI Second Edition 1999 MIT Press ISBN 0 262 57134 X Gropp William Ewing Lusk and Anthony Skjellum Using MP...

Page 282: ...orking The Internet Frequently Asked Questions FAQ archives contain an extensive Request for Comments RFC section Numerous documents on networking and configuration can be found at http www faqs org rf...

Page 284: ...Logic logo and the Powered by QLogic logo are registered trademarks of QLogic Corporation InfiniBand is a registered trademark of the InfiniBand Trade Association All other brand and product names are...
