IB0054606-02  A

OFED+ Host Software

Release 1.5.4

User Guide

Contents of OFED+ Host

Page 1: ...IB0054606-02 A OFED+ Host Software Release 1.5.4 User Guide...

Page 2: ...orporation reserves the right to change product specifications at any time without notice. Applications described in this document for any of these products are for illustrative purposes only. QLogic Co...

Page 3: ...PI Usage Checklists Cluster Setup 2 1 Using MPI 2 2 3 InfiniBand Cluster Setup and Administration Introduction 3 1 Installed Layout 3 2 IB and OpenFabrics Driver Overview 3 3 IPoIB Network Interface C...

Page 4: ...igure the ib_qib Driver State 3 22 Start Stop or Restart ib_qib Driver 3 22 Unload the Driver Modules Manually 3 23 ib_qib Driver Filesystem 3 23 More Information on Configuring and Loading Drivers 3...

Page 5: ...ations 4 3 Further Information on Open MPI 4 4 Configuring MPI Programs for Open MPI 4 5 To Use Another Compiler 4 5 Compiler and Linker Variables 4 7 Process Allocation 4 7 IB Hardware Contexts on th...

Page 6: ...on MVAPICH2 5 5 Managing MVAPICH and MVAPICH2 with the mpi selector Utility 5 5 Platform MPI 8 5 6 Installation 5 6 Setup 5 6 Compiling Platform MPI 8 Applications 5 7 Running Platform MPI 8 Applicati...

Page 7: ...6 13 Environment Variables 6 13 Implementation Behavior 6 15 Application Programming Interface 6 17 SHMEM Benchmark Programs 6 27 7 Virtual Fabric support in PSM Introduction 7 1 Virtual Fabric Suppor...

Page 8: ...SRP Target Port of a Session by IOCGUID B 10 Specifying a SRP Target Port of a Session by Profile String B 10 Specifying an Adapter B 10 Restarting the SRP Module B 11 Configuring an Adapter with Mult...

Page 9: ...Failure D 5 MPI Job Failures Due to Initialization Problems D 6 OpenFabrics and InfiniPath Issues D 6 Stop Infinipath Services Before Stopping Restarting InfiniPath D 6 Manual Shutdown or Restart May...

Page 10: ...hardware and the Ethernet network E 7 Troubleshooting SRP Issues E 9 ib_qlgc_srp_stats showing session in disconnected state E 9 Session in Connection Rejected state E 11 Attempts to read or write to...

Page 11: ...mod G 30 modprobe G 30 mpirun G 31 mpi_stress G 31 rpm G 32 strings G 32 Common Tasks and Commands G 32 Summary and Descriptions of Useful Files G 34 boardversion G 34 status_str G 35 version G 36 Sum...

Page 12: ...3 Distributed SA Multiple Virtual Fabrics Example 3 14 3 4 Distributed SA Multiple Virtual Fabrics Configured Example 3 15 3 5 Virtual Fabrics with Overlapping Definitions 3 15 3 6 Virtual Fabrics wi...

Page 13: ...APICH Wrapper Scripts 5 3 5 3 MVAPICH Wrapper Scripts 5 4 5 4 Platform MPI 8 Wrapper Scripts 5 7 5 5 Intel MPI Wrapper Scripts 5 10 6 1 SHMEM Run Time Library Environment Variables 6 13 6 2 shmemrun E...

Page 15: ...nce This guide is intended for end users responsible for administration of a cluster network, as well as for end users who want to use that cluster. This guide assumes that all users are familiar with c...

Page 16: ...or command line text. For example: To return to the root directory from anywhere in the file structure: Type cd /root and press ENTER. Enter the following command: sh ./install.bin Key names and key strokes...

Page 17: ...ication on the left. The QLogic Global Training portal offers online courses, certification exams, and scheduling of in-person training. Technical Certification courses include installation, maintenance an...

Page 18: ...s an extensive collection of QLogic product information that you can search for specific solutions. We are constantly adding to the collection of information in our database to provide answers to your...

Page 19: ...ples for compiling and running MPI programs with other MPI implementations. Section 7 describes QLogic Performance Scaled Messaging (PSM) that provides support for full Virtual Fabric (vFabric) integration...

Page 20: ...dware installation and the QLogic InfiniBand Fabric Software Installation Guide contains information on QLogic software installation. Overview The material in this documentation pertains to a QLogic OF...

Page 21: ...teroperability QLogic OFED participates in the standard IB subnet management protocols for configuration and monitoring. Note that QLogic OFED, including Internet Protocol over InfiniBand (IPoIB), is inter...

Page 23: ...gement problems, the compute nodes of the cluster must have very similar hardware configurations and identical software installations. See Homogeneous Nodes on page 3-37 for more information. 2. Check tha...

Page 24: ...en MPI Applications on page 4-2. 4. Create an mpihosts file that lists the nodes where your programs will run. See Create the mpihosts File on page 4-3. 5. Run Open MPI applications. See Running Open MPI Ap...

Page 25: ...are. This software provides the foundation that supports the MPI implementation. Figure 3-1 illustrates these relationships. Note that HP-MPI, Platform MPI, Intel MPI, MVAPICH, MVAPICH2, and Open MPI can run...

Page 26: ...bin /opt/iba Documentation is found in: /usr/share/man /usr/share/doc/infinipath License information is found only in /usr/share/doc/infinipath. QLogic OFED+ Host Software user documentation can be found on...

Page 27: ...SRP devices on the fabric have been discovered. MPI over uDAPL can be used by Intel MPI. IPoIB must be configured before MPI over uDAPL can be set up. Other optional drivers can now be configured and en...

Page 28: ...RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) 3. Type: ping -c 2 -b 10.1.17.255 The...

Page 29: ...QLogic recommends using the QLogic IFS Installer TUI, FastFabric, or the iba_config command to configure the boot time and autostart of the IPoIB driver. Refer to the QLogic InfiniBand Fabric Software Instal...

Page 30: ...Logic supports bonding across HCA ports and bonding port 1 and port 2 on the same HCA. Interface Configuration Scripts Create interface configuration scripts for the ibX and bondX interfaces. Once the c...

Page 31: ...0 downdelay=0 The following is an example for ib0 (slave). The file is named /etc/sysconfig/network-scripts/ifcfg-ib0: DEVICE=ib0 USERCTL=no ONBOOT=yes MASTER=bond0 SLAVE=yes BOOTPROTO=none TYPE=InfiniBan...

Page 32: ...boot BONDING_MASTER=yes BONDING_MODULE_OPTS="mode=active-backup miimon=100 primary=ib0 updelay=0 downdelay=0" BONDING_SLAVE0=ib0 BONDING_SLAVE1=ib1 MTU=65520 The following is an example for ib0 (slave). Th...

Page 33: ...ify that IB bonding is configured: cat /proc/net/bonding/bond0 ifconfig Example of cat /proc/net/bonding/bond0 output: cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.2.3 (December 6, 2007) Bon...

Page 34: ...FE 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 UP BROADCAST RUNNING SLAVE MULTICAST MTU:65520 Metric:1 RX packets:118938033 errors:0 dropped:0 overruns:0 frame:0 TX packets:118938027 errors:0 dropped...

Page 35: ...node that acts as the subnet manager. To enable OpenSM, the iba_config command can be used, or the chkconfig command (as a root user) can be used on the node where it will be run. The chkconfig command to e...

Page 36: ...n to correctly build a path record between two nodes. The Distributed Subnet Administration (SA) solves this problem by allowing each node to locally replicate the path records needed to reach the other...

Page 37: ...do not match SIDs in the Distributed SA's database will be ignored. Configuring the Distributed SA To minimize the number of queries made by the Distributed SA, it is important to co...

Page 38: ...to limit how much IB bandwidth MPI applications are permitted to consume. In that case, they may reconfigure the QLogic Fabric Manager, turning off the Default Virtual Fabric and replacing it with seve...

Page 39: ...l Fabric. As a result, the Distributed SA sees two different virtual fabrics that match its configuration file. In Figure 3-6, the person administering the fabric has created two different Virtual Fabrics...

Page 40: ...a last resort. Stored SIDs are only mapped to the default virtual fabric if they do not match any other Virtual Fabrics. Thus, in the first example (Figure 3-6), the Distributed SA will assign all the SIDs...

Page 41: ...rtual Fabrics with Unique Numeric Indexes In Figure 3-8, the Distributed SA assigns all overlapping SIDs to the PSM_MPI fabric because it has the lowest Index. Distributed SA Configuration File The Dist...

Page 42: ...0x0a3 SID 0x1a1 SID 0x1a2 SID 0x1a3 SID 0x2a1 SID 0x2a2 SID 0x2a3 ScanFrequency Periodically, the Distributed SA will completely resynchronize its database. This also occurs if the Fabric Manager is re...

Page 43: ...Errors: Errors will be reported, but nothing else. (Includes Dbg 1 and Dbg 2.) Dbg 4, Warnings: Errors and warnings will be reported. (Includes Dbg 3.) Dbg 5, Normal: Some normal events will be reported along with...

Page 44: ...need to change any file on the hosts. To ensure that the driver on this host uses 2K MTU, add the following options line (as a root user) to the configuration file: options ib_qib ibmtu=4 Table 3-1 show...
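
The value 4 in the options line for 2K MTU is an MTU code rather than a byte count. As a minimal sketch, assuming the standard InfiniBand MTU encoding (1 = 256 bytes up to 5 = 4096 bytes) and the `ibmtu=<code>` form of the module parameter, the hypothetical helper below (not part of the QLogic tooling) builds the options line for a given MTU size:

```python
# Standard InfiniBand MTU codes; the guide's value 4 for a 2K MTU fits this table.
IB_MTU_CODES = {256: 1, 512: 2, 1024: 3, 2048: 4, 4096: 5}


def ibmtu_options_line(mtu_bytes):
    """Return a modprobe options line for ib_qib (hypothetical helper)."""
    try:
        code = IB_MTU_CODES[mtu_bytes]
    except KeyError:
        raise ValueError("unsupported IB MTU: %r" % (mtu_bytes,)) from None
    # Assumed syntax: "options ib_qib ibmtu=<code>".
    return "options ib_qib ibmtu=%d" % code


if __name__ == "__main__":
    # Print the line that would go into the modprobe configuration file.
    print(ibmtu_options_line(2048))
```

Rejecting sizes outside the table keeps a typo such as 1500 (an Ethernet MTU) from silently producing an invalid driver option.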

Page 45: ...TU_firmware emfw for the 9024 EM. This has the 4K MTU default, for use on fabrics where 4K MTU is required. If 4K MTU support is not required, then use the 4 2 2 0 2 DDR emfw file for DDR externally manag...

Page 46: ...e release, change driver options, or do manual testing. QLogic recommends using /etc/init.d/openibd to stop, start, and restart the ib_qib driver. For using the command line to stop, start, and restart as a roo...

Page 47: ...ma_ iw_ xargs modprobe -r ib_qib Driver Filesystem The ib_qib driver supplies a filesystem for exporting certain binary statistics to user applications. By default, this filesystem is mounted in the ipat...

Page 48: ...The flash file is an interface for internal diagnostic commands. The file counter_names provides the names associated with each of the counters in the binary port counters files, and the file driver_st...

Page 49: ...vices using the information provided in the following sections. Systems in General (With Either Intel or AMD CPUs) For best performance on dual-port HCAs on which only one port is active, the module param...

Page 50: ...are not necessary. On all systems, the qib driver behaves as if the following parameters were set: rcvhdrcnt=4096 If you run a script such as the following: for x in /sys/module/ib_qib/parameters/*; do echo...

Page 51: ...ore node, then 13 is more than enough PSM contexts to run an MPI process on each core without making use of context sharing. An example ib_qib options line in the modprobe.conf file for this 12-core nod...

Page 52: ...bytes AMD Interlagos CPU Systems With AMD Interlagos (Opteron 6200 Series) CPU systems, better performance will be obtained if, on single-HCA systems, the HCA is put in a PCIe slot closest to Socket number...

Page 53: ...utput will read: MaxPayload 256 bytes, MaxReadReq 4096 bytes If you run a script such as the following: for x in /sys/module/ib_qib/parameters/*; do echo $(basename $x) $(cat $x); done Then, in the list of qib paramet...

Page 54: ...for unknown reason 3d on CPU 0. After this happens, you may also see the following message in the syslog: Mth dd hh:mm:ss st2019 kernel: ib_qib 0000:0a:00.0: infinipath0: Fatal Hardware Error no longer usab...

Page 55: ...th the new syntax are listed below. Per-unit parameters: singleport: Use only IB port 1; more per-port buffer space. cfgctxts: Set max number of contexts to use. pcie_caps: Max PCIe tuning: MaxPayload, MaxReadR...

Page 56: ...nit 0 and 16 on unit 1: cfgctxts=0:10,1:16 A user can identify HCAs and correlate them to system unit numbers by using the -b option (beacon mode option) to the ipath_control script. Issuing the following...
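
The per-unit example above pairs each unit number with its own value (10 for unit 0, 16 for unit 1). A minimal sketch of building such a setting, assuming the pairs are written as comma-separated `unit:value` entries; the helper name is hypothetical and is not part of the driver tooling:

```python
def per_unit_param(name, values):
    """Format a per-unit ib_qib module parameter (hypothetical helper).

    Assumption: per-unit values are written as comma-separated
    unit:value pairs, e.g. {0: 10, 1: 16} -> "cfgctxts=0:10,1:16".
    """
    if not values:
        raise ValueError("at least one unit is required")
    # Sort by unit number so the output is stable regardless of dict order.
    pairs = ",".join("%d:%d" % (unit, value) for unit, value in sorted(values.items()))
    return "%s=%s" % (name, pairs)


if __name__ == "__main__":
    print(per_unit_param("cfgctxts", {0: 10, 1: 16}))
```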

Page 57: ...s this feature with the driver allocating memory on the NUMA node closest to the HCA. recv_queue_size Tuning Related to NAKs The Receiver Not Ready Negative Acknowledgement (RNR NAKs) can slow IPoIB down...

Page 58: ...will prompt the user for input on some of the settings and actions. Table 3-3 lists the checks the tool performs on the system on which it is run. Table 3-3. Checks Performed by ipath_perf_tuning Tool Che...

Page 59: ...objective is to improve IB verbs communications while maintaining good MPI performance. OPTIONS Table 3-4 lists the options for the ipath_perf_tuning tool and describes each option. cstates: Check whether...

Page 60: ...ese tests do not provide a guaranteed universal performance gain, and therefore changing driver parameters associated with them requires user approval. Other tests, where the tool can make a safe determin...

Page 61: ...r: /etc/modprobe.d/ib_qib.conf RHEL prior to 6.0: /etc/modprobe.conf SLES: /etc/modprobe.conf.local Homogeneous Nodes To minimize management problems, the compute nodes of the cluster should have very simila...

Page 62: ...mechanism. See Appendix F Write Combining for more information. Check the PCIe bus width. If slots have a smaller electrical width than mechanical width, lower than expected performance may occur. Use thi...

Page 63: ...y running on a general Linux computer. Following are several groups constituting a minimal necessary set of services. These are all services controlled by chkconfig. To see the list of services that are...

Page 64: ...keys must be distributed and stored on all the compute nodes so that connections to the remote machines can be established without supplying a password. You or your administrator must set up the ssh k...

Page 65: ...p-fe. Root or superuser access is required on ip-fe and on each node to configure ssh. ssh, including the host's key, has already been configured on the system ip-fe. See the sshd and ssh-keygen man pages...

Page 66: ...bGPcrVlSjuVps fWEju64FTqKEetA8l8QEgAAAIBNtPDDwdmXRvDyc0gvAm6lPOIsRLmgmdgKXT GOZUZ0zwxSL7GP1nEyFk9wAxCrXv3xPKxQaezQKs KL95FouJvJ4qrSxxHdd1 NYNR0DavEBVQgCaspgWvWQ8cL 0aUQmTbggLrtD9zETVU5PCgRlQL6I3Y5sCCH...

Page 67: ...en -t rsa 2. Enter a passphrase for your key pair when prompted. Note that the key agent does not survive X11 logout or system reboot: ssh-add 3. The following command tells ssh that your key pair should l...

Page 68: ...described in the following paragraph. MPI jobs that use more than 10 processes per node may encounter an ssh throttling mechanism that limits the number of concurrent per-node connections to 10. If you...

Page 69: ...orrectly. See iba_opp_query on page G-4 for detailed usage information. iba_opp_query slid 0x31 dlid 0x75 sid 0x107 Query Parameters resv1 0x0000000000000107 dgid sgid dlid 0x75 slid 0x31 hop 0x0 flow 0...

Page 70: ...x0 resv2 0x0 resv3 0x0 ibstatus Another useful program is ibstatus, which reports on the status of the local HCAs. Sample usage and output are as follows: ibstatus Infiniband device qib0 port 1 status def...

Page 71: ...4096 (5) active_mtu: 4096 (5) sm_lid: 1 port_lid: 31 port_lmc: 0x00 ipath_checkout ipath_checkout is a bash script that verifies that the installation is correct and that all the nodes of the network are func...

Page 73: ...and MVAPICH2 version 1.7. These MPIs are offered in versions built with the high-performance Performance Scaled Messaging (PSM) interface and versions built to run over IB Verbs. There are also the commercia...

Page 74: ...installed. Setup When using the mpi-selector tool, the necessary PATH and LD_LIBRARY_PATH setup is done. When not using the mpi-selector tool, put the Open MPI installation directory in the PATH by adding...

Page 75: ...ains the host names of the nodes in your cluster that run the examples, with one host name per line. Name this file mpihosts. The contents can be in the following format More details on the mpihosts file...

Page 76: ...munication to self: -mca btl openib,self The following command disables PSM transport: -mca mtl ^psm In these commands, btl stands for byte transfer layer and mtl for matching transport layer. PSM transport...

Page 77: ...dit a Makefile to achieve this result, adding lines similar to: CC=mpicc F77=mpif77 F90=mpif90 CXX=mpicxx In some cases, the configuration process may specify the linker. QLogic recommends that the linker...

Page 78: ...emaining options to the mpicxx script the options to the compiler in question and the names of the files that it operates. Also, use mpif77, mpif90, or mpif95 for linking; otherwise, .true. may have the wrong...

Page 79: ...command line options are used cc gcc the command line variable is used. When both the compiler and linker variables are set, and they do not match for the compiler you are using, the MPI program may fai...

Page 80: ...r Messages on page 4-11. There are multiple ways of specifying how processes are allocated. You can use the mpihosts file, the -np and -ppn options with mpirun, and the MPI_NPROCS and PSM_SHAREDCONTEXTS_MAX...

Page 81: ...to satisfy the job requirement and try to give a context to each process. When context sharing is enabled on a system with multiple QLogic IB adapter boards (units) and the IPATH_UNIT environment variabl...

Page 82: ...a per-node setting, or some level of coordination with the job scheduler with setting the environment variable, should be used. The number of contexts can be explicitly configured with the cfgctxts modul...

Page 83: ...tions benchmarks, add /usr/mpi/gcc/openmpi-1.4.3-qlc/tests/osu_benchmarks-3.1.1 to your PATH, or, if you installed the MPI in another location, add $MPI_HOME/tests/osu_benchmarks-3.1.1 to your PATH. To enabl...

Page 84: ...ehavior than MVAPICH or the no-longer-supported QLogic MPI. In the second format, process_count can be different for each host, and is normally the number of available processors on the node. When not spe...

Page 85: ...e http://www.open-mpi.org/faq/?category=running#mpirun-scheduling Using Open MPI's mpirun The script mpirun is a front end program that starts a parallel MPI job on a set of nodes in an IB cluster. mpirun...

Page 86: ...pihosts file. Typically, the number of node programs should not be larger than the number of processor cores, at least not for compute-bound programs. This option specifies the number of processes to spaw...

Page 87: ...d environments batch scheduled environments typically copy the current environment to the execution of remote jobs, so if the current environment has PATH and/or LD_LIBRARY_PATH set properly, the remote...

Page 88: ...es. The --prefix option is not sufficient if the installation paths on the remote node are different from those on the local node, for example, if /lib is used on the local node but /lib64 is used on the remote node...

Page 89: ...ngle copy of foo on an allocated node. mpirun -mca btl self -np 1 foo Tells Open MPI to use the self BTL, and to run a single copy of foo on an allocated node. The -mca switch can be used multiple times to specif...

Page 90: ...OpenMP run time library. Use this variable to adjust the split between MPI processes and OpenMP threads. Usually, the number of MPI processes per node times the number of OpenMP threads will be set to ma...

Page 91: ...it. By default, IPATH_UNIT is unset, and contexts from all configured units are made available to MPI jobs in round-robin order. Default: Unset IPATH_HCA_SELECTION_ALG This variable provides user-level sup...

Page 92: ...og. If the link is down when the job starts and you want the job to continue blocking until the link comes up, use the t 1 option. LD_LIBRARY_PATH This variable specifies the path to the run time library...

Page 93: ...cutable is executed as usual using mpirun, but typically only one MPI process is run per node, and the OpenMP library will create additional threads to utilize all CPUs on that node. If there are suffici...
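
The split between MPI processes and OpenMP threads described in these excerpts comes down to simple arithmetic. The sketch below is illustrative only: it assumes the goal is one thread per CPU core on a node, that the thread count is passed through the standard OpenMP variable OMP_NUM_THREADS, and the function name is hypothetical:

```python
def omp_threads_per_process(cores_per_node, mpi_procs_per_node):
    """Threads each MPI process should start so that
    processes-per-node * threads does not exceed the core count.
    Hypothetical helper; assumes one thread per core is the target."""
    if cores_per_node < 1 or mpi_procs_per_node < 1:
        raise ValueError("core and process counts must be positive")
    # Integer division avoids oversubscribing; never go below one thread.
    return max(1, cores_per_node // mpi_procs_per_node)


if __name__ == "__main__":
    # One MPI process on an 8-core node: OpenMP supplies the other threads.
    threads = omp_threads_per_process(cores_per_node=8, mpi_procs_per_node=1)
    print("export OMP_NUM_THREADS=%d" % threads)
```

With one MPI process per node, the whole node is left to OpenMP; with more processes per node, each process receives a proportionally smaller share of the cores.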

Page 94: ...error codes. Using Debuggers See http://www.open-mpi.org/faq/?category=debugging for details on debugging with Open MPI. NOTE: With Open MPI and other PSM-enabled MPIs, you will typically want to turn off PS...

Page 95: ...NOTE: The TotalView debugger can be used with the Open MPI supplied in this release. Consult the TotalView documentation for more information: http://www.open-mpi.o...

Page 97: ...5-5 Table 5-1. Other Supported MPI Implementations (columns: MPI Implementation, Runs Over, Compiled With, Comments) Open MPI 1.4.3: PSM, Verbs; GCC, Intel, PGI; Provides some MPI-2 functionality, one-sided operations and...

Page 98: ...s will also have -qlc appended after the MPI version number. For example: /usr/mpi/gcc/openmpi-VERSION-qlc If a prefixed installation location is used, /usr is replaced by $prefix. The following examples assu...

Page 99: ...er is also available. MVAPICH can be managed with the mpi-selector utility, as described in Managing MVAPICH and MVAPICH2 with the mpi-selector Utility on page 5-5. Compiling MVAPICH Applications As with...

Page 100: ...VAPICH2 that runs over Verbs and is pre-compiled with the GNU compiler is also available. MVAPICH2 can be managed with the mpi-selector utility, as described in Managing MVAPICH and MVAPICH2 with the mp...

Page 101: ...pdf Managing MVAPICH and MVAPICH2 with the mpi-selector Utility When multiple MPI implementations have been installed on the cluster, you can use the mpi-selector to switch between them. The MPIs that...

Page 102: ...r information on setting the run time library path. Platform MPI 8 Platform MPI 8 (formerly HP-MPI) is a high-performance, production-quality implementation of the Message Passing Interface (MPI), with full...

Page 103: ...e mpirun command running with four processes over PSM: mpirun -np 4 -hostfile mpihosts -PSM mpi_app_name To run over IB Verbs, type: mpirun -np 4 -hostfile mpihosts -IBV mpi_app_name To run over TCP, which coul...

Page 104: ...r to psm X X libtmip_psm.so Comments OK Intel MPI can also be run over uDAPL, which uses IB Verbs. uDAPL is the user-mode version of the Direct Access Programming Library (DAPL), and is provided as a part of...

Page 105: ...and ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" "" 3. On every node, type the following command (as a root user): modprobe rdma_ucm To ensure that the module is loaded when the drive...

Page 106: ...ifort, the Intel compilers must be installed and resolvable from the user's environment. Running Intel MPI Applications Here is an example of a simple mpirun command running with four processes: mpirun -n...

Page 107: ...rdma:OpenIB-cma uDAPL 2.0: -genv I_MPI_DEVICE rdma:ofa-v2-ib To help with debugging, you can add this option to the Intel mpirun command: TMI: -genv TMI_DEBUG 1 uDAPL: -genv I_MPI_DEBUG 2 Further Information...

Page 108: ...e MVAPICH defaults to an IB MTU size of 1024 bytes. This can be overridden by setting an environment variable: export VIADEV_DEFAULT_MTU=MTU2048 Valid values are MTU256, MTU512, MTU1024, MTU2048, and MTU40...

Page 109: ...unrelated to the standard System V Shared Memory API provided by UNIX operating systems. Interoperability QLogic SHMEM depends on the Performance Scaled Messaging (PSM) protocol layer, implemented as a u...

Page 110: ...intel/mvapich2-1.7-qlc /usr/mpi/pgi/mvapich2-1.7-qlc The -qlc suffix denotes that this is the QLogic PSM version. It is recommended that you match the compiler used to build the MPI implementation with t...

Page 111: ...pi /usr/shmem/qlogic/include QLogic recommends that /usr/shmem/qlogic/bin is added onto your PATH. If it is not on your PATH, then you will need to give full pathnames to find the shmemrun and shmemcc w...

Page 112: ...to specify the SHMEM include directory, the SHMEM library directory, and to appropriately link in the SHMEM library. The shmemcc script automatically determines the correct directories by finding them r...

Page 113: ...ication binaries will be portable across different implementations of the QLogic SHMEM library, including portability over different underlying MPIs. Running SHMEM Programs Using shmemrun The shmemrun s...

Page 114: ...nd the options will automatically be remapped as required for the actual mpirun. This makes it possible to write scripts that call shmemrun without exposing these details of the underlying mpirun comma...

Page 115: ...N environment variable. Alternatively, it is possible to write hybrid SHMEM/MPI programs that use features from both the SHMEM and MPI libraries. These programs must call shmem_init to initialize the SHM...

Page 116: ...elow are various options for integration of the QLogic SHMEM and slurm. Full Integration This approach fully integrates QLogic SHMEM start-up into slurm, and is available when running over MVAPICH2. The...

Page 117: ...te options. Note that ssh/rsh will be used for starting processes, not slurm. Sizing Global Shared Memory SHMEM provides shmalloc, shrealloc, and shfree calls to allocate and release memory using a symmetr...

Page 118: ...y for example in actual use. If a SHMEM application program runs out of global shared memory, increase the value of SHMEM_SHMALLOC_MAX_SIZE. The value of SHMEM_SHMALLOC_INIT_SIZE can also be changed to p...

Page 119: ...rations. As long as there is sufficient physical memory for the program, the following steps can be used to solve local shared memory allocation problems. Check for low ulimits on memory: ulimit -l max loc...

Page 120: ...The progress thread is provided by PSM and is scheduled at a relatively low frequency, typically 10 to 100 times a second. This thread will cause independent SHMEM progress where required, both on the i...

Page 121: ...ve progress mode will typically be used in the following circumstances: For applications that use a polling idiom that is incompatible with the active progress mode, and where the application programmer...

Page 122: ...oint for the long get protocol (0 means unlimited). SHMEM_PUT_FRAG_LIMIT 4096 Maximum number of outstanding put fragments for this end point for the short put protocol (0 means unlimited). Each short put fr...

Page 123: ...ehavior for the QLogic SHMEM implementation. SHMEM_PUT_REPLY_COMBINING_COUNT 8 Number of consecutive put replies on a flow to combine together into a single reply. Table 6-2. shmemrun Environment Variabl...

Page 124: ...this ordering is guaranteed. shmem_quiet This function waits for remote completion of all puts issued by this PE prior to the quiet operation. Therefore, once the quiet operation returns, it is guarantee...

Page 125: ...ry call. However, performance will typically be substantially improved by using the SHMEM wait operation instead. shmem_stack is implemented as a no-op, since this is a distributed memory cluster architec...

Page 126: ...em_init start_pes my_pe _my_pe shmem_my_pe num_pes _num_pes shmem_n_pes Symmetric heap shmalloc shmemalign shfree shrealloc Contiguous Put Operations shmem_short_p shmem_int_p shmem_long_p shmem_float...

Page 127: ..._int_put_nb shmem_long_put_nb shmem_longdouble_put_nb shmem_longlong_put_nb shmem_put_nb shmem_put32_nb shmem_put64_nb shmem_put128_nb shmem_putmem_nb shmem_short_put_nb Strided Put Operations shmem_d...

Page 128: ..._quiet shmem_wait_nb shmem_test_nb shmem_poll_nb (same as shmem_test_nb, provided for compatibility) Contiguous Get Operations shmem_short_g shmem_int_g shmem_long_g shmem_float_g shmem_double_g shmem_lo...

Page 129: ...hmem_long_get_nb shmem_longdouble_get_nb shmem_longlong_get_nb shmem_short_get_nb shmem_get_nb shmem_get32_nb shmem_get64_nb shmem_get128_nb shmem_getmem_nb Strided Get Operations shmem_double_iget sh...

Page 130: ...adcast64 Concatenation shmem_collect shmem_collect32 shmem_collect64 shmem_fcollect shmem_fcollect32 shmem_fcollect64 Synchronization operations shmem_int_wait shmem_long_wait shmem_longlong_wait shme...

Page 131: ...mem_long_cswap shmem_longlong_cswap shmem_short_mswap shmem_int_mswap shmem_long_mswap shmem_longlong_mswap shmem_short_inc shmem_int_inc shmem_long_inc shmem_longlong_inc shmem_short_add shmem_int_ad...

Page 132: ...shmem_short_or_to_all shmem_int_xor_to_all shmem_long_xor_to_all shmem_longlong_xor_to_all shmem_short_xor_to_all shmem_double_min_to_all shmem_float_min_to_all shmem_int_min_to_all shmem_long_min_to_...

Page 133: ...sum_to_all shmem_longlong_sum_to_all shmem_short_sum_to_all shmem_complexd_prod_to_all (complex collectives are not implemented) shmem_complexf_prod_to_all (complex collectives are not implemented) shmem_...

Page 134: ...ts PE for accessibility shmem_addr_accessible (test address on PE for accessibility) Cache Operations (for compatibility) shmem_clear_cache_inv (implemented as a no-op) shmem_clear_cache_line_inv implemente...

Page 135: ...e processes equally divided between them. The processes are split up into pairs, with one from each pair on either host, and each pair is loaded with the desired traffic pattern. The benchmark automatical...

Page 136: ...specified in bytes (default: 8). Options: See Table 6-5. -b INT: batch size, number of concurrent operations (default: 64). -f: force order for bifurcation of PEs based on rank order. -h: displays the help page. -l INT...

Page 137: ...ndow (this is the default). -q: for blocking puts, use quiet every window. -r: use ring pattern (default is random). -s: enable communication to self. -t FLOAT: if the loop count is not given, run the test for this man...

Page 138: ...e non-pipelined mode for NB ops (default: pipelined). -o OP: choose OP from put or putnb. -p INTEGER: offset for all-to-all schedule (default: 1, usually set to ppn). -r: randomize all-to-all schedule. -s: enable commun...

Page 139: ...Table 6-8. QLogic SHMEM reduce benchmark options Option Description -b INTEGER: number of barriers between reduces (default: 0). -h: displays the help page. -i INTEGER K: outer iterati...

Page 141: ...ate/deactivate these features. Other MPIs will require use of environment variables to leverage these capabilities. With MPI applications, the environment variables need to be propagated across all nodes...

Page 142: ...will automatically obtain the SL and Pkey to use for the vFabric from the QLogic Fabric Manager via path record queries. Using SL and PKeys SL and Pkeys can be specified natively for Open MPI. For othe...

Page 143: ...SA. The SIDs configured in the QLogic Fabric Manager configuration file should also be provided to the Distributed SA for correct operation. Service ID can be specified natively for Open MPI. For other M...

Page 144: ...mapping for any given port; however, QLogic 7300 series adapters export the SL2VL mapping via sysfs files. These files are used by PSM to implement the SL2VL tables automatically. The SL2VL tables are pe...

Page 145: ...le multiple DLID entries in the port forwarding table that could map to different egress ports. Dispersive routing, as implemented in the PSM, attempts to avoid the congestion hotspots described above by spr...

Page 146: ...described above, and a 16-process MPI application that spans these nodes (8 processes per node). Then: Each MPI process is automatically bound to a given CPU core numbered between 0 and 7. PSM does this at startu...

Page 147: ...a single process on Node B, only one path will be used across all processes. Static_Base: The only path that is used is the base path (SLID, DLID) between nodes, regardless of the LMC of the fabric or the n...

Страница 149: ...LE7342 adapter for the node The following software is included with the QLogic OFED installation software package gPXE boot image patch for DHCP server tool to install gPXE boot image in EPROM of card...

Page 150: ...ter GUID. The dhcpd on the existing DHCP server may need to be patched. This patch will be provided via the gPXE rpm installation. 3. Write the ROM image to the IB adapter. This only needs to be done once...

Page 151: ...client identifier value such that the DHCP server will grant the same IP address to any client that conveys this client identifier. 2. Unpack the latest downloaded DHCP server: tar zxf dhcp release tar...

Page 152: ...DHCP server. The following is the sample /etc/dhcpd.conf file that specifies the HCA GUID for the hardware address: DHCP Server Configuration file see usr share doc dhcp dhcpd conf sample ddns update st...

Page 153: ...diskless booting with an http boot server. Boot Server Setup: Configure the boot server for your site. NOTE: The dhcpd and apache configuration files referenced in this example are included as examples an...

Page 154: ...of the images conf file: Alias images vault images Directory vault images AllowOverride All Options Indexes FollowSymLinks Order allow deny Allow from all Directory The following is an example of the...

Page 155: ...tp://driverdownloads.qlogic.com/QLogicDriverDownloads_UI/default.aspx a. If the /vault/images/initrd.img file is already present on the server machine, back it up. For example: cp -a /vault/images/initrd.img /vault...

Page 156: ...machine of the same type, hardware configuration, and BIOS settings. start with a known path, to get the system commands: PATH=/sbin:/usr/sbin:/bin:/usr/bin:$PATH start from a copy of the current initrd image: mk...

Page 157: ...e qib driver will require the dca module if modinfo F depends ib_qib grep q dca then cp find lib modules uname r name dca ko lib ib dcacmd sbin insmod lib ib dca ko else dcacmd fi IB requires loading...

Page 158: ...em in order to use it for NFS etc n Now build the commands to load the additional modules. We add them just after the last existing insmod command, so all other dependencies will be resolved. You can chan...

Page 159: ...ho finished loading IB modules End of IB module block EOF first get line number where we append (after last insmod if any, otherwise at start) line egrep n insmod init sed n s p if line then line 1 fi sed...
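
The step described in this excerpt (find the line number of the last insmod in the init script so that the IB block can be appended after it, falling back to line 1) can be sketched as follows. This is only an illustration under stated assumptions: the function name last_insmod_line is hypothetical, and it uses grep, tail, and cut instead of the egrep and sed pipeline of the packaged script, whose exact text is not recoverable from the excerpt.

```shell
#!/bin/sh
# Hypothetical helper (not part of the QLogic package): print the line
# number of the last line that mentions insmod in the given init script,
# or 1 if there is none, mirroring the "otherwise at start" fallback.
last_insmod_line() {
    init_file=$1
    # grep -n prefixes each match with "<line>:"; keep only the last one.
    line=$(grep -n 'insmod' "$init_file" | tail -n 1 | cut -d: -f1) || true
    if [ -z "$line" ]; then
        line=1
    fi
    echo "$line"
}
```

A caller could then append a block after that line, for example with sed and its r command, but the exact insertion command used by the packaged script is not shown in this excerpt.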

Page 160: ...ges initrd kern img c. Run the usr share infinipath gPXE gpxe qib modify initrd script to create the initrd.img file. At this stage, the initrd.img file is ready and located at the location where the DHC...

Page 161: ...RVER SERVER_NAME port baseurl baseURL selfurl baseurl _SERVER REQUEST_URI dirurl baseurl dirname _SERVER SCRIPT_NAME kver 2 6 18 164 11 1 el5 echo EOF gpxe initrd images initrd img kernel kernels vmli...

Page 162: ...HCA is finished and the boot image is ready. 3. Verify system boots off of the kernel image on the boot server. The best way to do this is to boot into a different kernel from the one installed on the ha...

Page 163: ...ing the examples in Step 2 of Boot Server Setup and place them in the /etc/httpd/conf.d directory. 6. Edit the /etc/dhcpd.conf file to boot the clients using HTTP: filename http://172.26.32.9/images/uniboot/unib...

Page 165: ...lc and that mpi-selector is used to choose this Open MPI version as the MPI to be used. The following examples are intended to show only the syntax for invoking these programs and the meaning of the ou...

Page 166: ...for all operations. Half the time interval observed by the rank zero process for each exchange is a measure of the latency for messages of that size, as previously defined. The program uses a loop execut...
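
The "half the round trip" definition above amounts to a one-line calculation. The sketch below is illustrative only: the function name one_way_latency_us is hypothetical, and the round-trip value in the usage note is an assumed input, not a measurement from this guide.

```shell
#!/bin/sh
# Hypothetical helper: convert a measured round-trip time in microseconds
# into a one-way latency estimate by halving it, as the text describes.
one_way_latency_us() {
    awk -v rtt="$1" 'BEGIN { printf "%.2f\n", rtt / 2 }'
}
```

For an assumed round trip of 3.34 microseconds, one_way_latency_us 3.34 prints 1.67.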

Page 167: ...Latency Test v3.1.1
Size      Latency (us)
0         1.67
1         1.68
2         1.69
4         1.68
8         1.68
16        1.93
32        1.92
64        1.92
128       1.99
256       2.12
512       2.38
1024      2.74
2048      3.52
4096      4.59
8192      6.52
16384     9.98
32768     17.65
65536     52.11
131...

Page 168: ...the osu_latency code, except in this case the originator of the messages pumps a number of them (64 in the installed version) in succession using the non-blocking MPI_Isend function, while the receiving...

Page 169:
...34.55
32        68.89
64        137.87
128       265.80
256       480.19
512       843.70
1024      1353.48
2048      1984.11
4096      2152.61
8192      2249.00
16384     2680.75
32768     2905.83
65536     3170.05
131072    3224.15
262144    3241.35
524288    3270.21
10...

Page 170: ...esses. Each of the sending processes sends a fixed number of messages (the window size) back to back to the paired receiving process before waiting for a reply from the receiver. This process is repeated...

Page 171:
...79.96
256       2110.22    8243066.36
512       2353.17    4596038.46
1024      2495.88    2437386.38
2048      2573.99    1256833.08
4096      2567.88    626923.21
8192      2757.54    336613.42
16384     3283.94    200435.90
32768     3291.54    100449.84
65536...

Page 172: ...N/2 is dynamically calculated at the end of the run. You can use the -b option to get bidirectional message rate and bandwidth results. Scalability has been...

Page 173:
...888
4         100.075479     25018869.818990
8         200.115037     25014379.610716
16        284.475601     17779725.040265
32        568.950239     17779694.953511
64        1137.899392    17779677.998115
128       1758.183987    13735812.394705
256       2116.159352...

Page 174:
...Bandwidth (MB/s)    Messages/s
1         34.572819      34572819.324348
2         68.984920      34492459.942272
4         137.870850     34467712.532016
8         274.914966     34364370.730843
16        438.182185     27386386.585309
32        871.077525     27221172.671073...

Page 175: ...The higher peak bi-directional messaging rate of 34.6 million messages per second at the 1 byte size, compared to 25 million messages/sec wh...
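
The two result columns in the sample output are related by a simple formula: messages per second is the bandwidth in MB/s times 10^6, divided by the message size in bytes. For the 1-byte row, 34.572819 MB/s corresponds to about 34.57 million messages per second, which is the 34.6 million figure quoted above. A minimal sketch of that conversion follows; the function name messages_per_second is hypothetical, and 1 MB is taken as 10^6 bytes because that is what makes the sample rows agree.

```shell
#!/bin/sh
# Hypothetical helper: derive messages per second from a bandwidth in
# MB/s (1 MB taken as 10^6 bytes) and a message size in bytes.
messages_per_second() {
    awk -v bw="$1" -v size="$2" 'BEGIN { printf "%.0f\n", bw * 1000000 / size }'
}
```

Because the printed bandwidth is itself rounded, the result can differ from the printed message rate in the last digit or so.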

Page 177: ...adapter port through which the host communicates with a SRP target device (e.g., a Fibre Channel disk array) via a SRP target port. A SRP Target Port is an IOC of the VIO hardware. In the context of VIO ha...

Page 178: ...C only. IB attached storage will use their own mechanism, as maps are not necessary. A SRP Adapter is a collection of SRP sessions. This collection is then presented to the Linux kernel as if those sessio...

Page 179: ...nsion 2 end The session command has two parts: the part that specifies the SRP initiator and the part that specifies the SRP target port. The SRP initiator contains two parts: the SRP initiator port and...

Page 180: ...e in this manner; other devices have their own naming method. To specify the host IB port to use, the user can either specify the port GUID of the local IB port or simply use the index numbers of the car...

Page 181: ...0 name SRP T10 0000000000000001 id 0x0000494353535250 service 1 name SRP T10 0000000000000002 id 0x0000494353535250 service 2 name SRP T10 0000000000000003 id 0x0000494353535250 service 3 name SRP T1...

Page 182: ...ZE 320 SRP IU SG SIZE 15 SRP IO CLASS 0xff00 service 0 name SRP T10 0000000000000001 id 0x0000494353535250 service 1 name SRP T10 0000000000000002 id 0x0000494353535250 service 2 name SRP T10 00000000...

Page 183: ...SRP T10 0000000000000001 id 0x0000494353535250 session begin card 0 port 1 portGuid 0x0002c9030000110d initiatorExtension 1 targetIOCGuid 0x00066a01e0000149 targetIOCProfileIdString FVIC in Chassis 0x...

Page 184: ...r side by card index card 0 Specifies first HCA port 1 Specifies first port targetIOCGuid 0x00066013800016C end Specifying an SRP Initiator Port of Session by Port GUID: The following example specifies...

Page 185: ...method, if the port GUIDs are changed, they must also be changed in the configuration file. NOTE: When specifying the targetIOCProfileIdString, the string is case and format sensitive. The easiest way to g...

Page 186: ...targetIOCProfileIdString FVIC in Chassis 0x00066A005000010E Slot 1 IOC 1 end Specifying an Adapter: An adapter is a collection of sessions. This collection is presented to the Linux kernel as if the co...

Page 187: ...er is configured with only one session and that session fails, all SCSI I/Os on that session will fail and access to SCSI target devices will be lost. While the qlgc_srp module will attempt to recover t...

Page 188: ...e configuration that uses multiple sessions and adapters: session begin card 0 port 2 targetIOCProfileIdString FVIC in Chassis 0x00066A005000011D Slot 1 IOC 1 initiatorExtension 3 end adapter begin des...

Page 189: ...pter. Following is a list of the different types of failover scenarios: Failing over from one SRP initiator port to another. Failing over from a port on the VIO hardware card to another port on the VIO ha...

Page 190: ...x0000494353535250 session begin card 0 port 1 portGuid 0x0002c903000010f1 initiatorExtension 1 targetIOCGuid 0x00066a01e0000149 targetIOCProfileIdString BC2FC in Chassis 0x0000000000000000 Slot 6 Ioc...

Page 191: ...om a port on the VIO hardware card to another port on the VIO hardware card: session begin card 0 InfiniServ HCA card number port 1 InfiniServ HCA port number targetIOCProfileIdString FVIC in Chassis F...

Page 192: ...ion File 3. Failing over from a port on a VIO hardware card to a port on a different VIO hardware card within the same Virtual I/O chassis: session begin card 0 InfiniServ HCA card number port 1 InfiniS...

Page 193: ...erent Virtual I/O chassis: session begin card 0 InfiniServ HCA card number port 1 InfiniServ HCA port number targetIOCProfileIdString FVIC in Chassis FRUChassisGUID1 Slot1 IOC initiatorExtension 1 end...

Page 194: ...first example, traffic going to any Fibre Channel Target Device where both ports of the VIO hardware card have a valid map is split between the two ports of the VIO hardware card. If one of the VIO har...

Page 195: ...f one of the sessions goes down due to an IB cable failure or an FC cable failure, all traffic will begin using the other session. session begin card 0 port 2 targetIOCProfileIdString FVIC in Chassis 0x...

Page 196: ...sions. If there is a failure in one of the sessions (e.g., one of the VIO hardware cards is rebooted), traffic will begin using the other session. session begin card 0 port 2 targetIOCProfileIdString FVIC in...

Page 197: ...P IOC Profile Native IB Storage SRP Driver SRP IOC GUID 0x00066a01dd000021 SRP IU SIZE 320 SRP IU SG SIZE 15 SRP IO CLASS 0xff00 service 0 name SRP T10 0000000000000001 id 0x0000494353535250 service 1...

Page 198: ...00000066a01e0000149 targetExtension 0x0000000000000001 SID 0x0000494353535250 IOClass 0x0100 end session begin card 0 port 2 portGuid 0x0002c903000010f2 initiatorExtension 1 targetIOCGuid 0x00066a01e0...

Page 199: ...a11dd000021 HCA 0 Port 2 0x0002c9020026041e Target Port GID 0xfe8000000000000000066a11dd000021 qlgc_srp cfg session begin targetIOCGuid 0x0002C90200400098 targetExtension 0x0002C90200400098 end adapte...

Page 200: ...e it automatically loaded. 2. Discover the SRP devices on your fabric by running this command as a root user: ibsrpdm In the output, look for lines similar to these: GUID 0002c90200402c04 ID LSI Storage Sy...

Page 201: ...the target you want and echo it into the add_target file: echo id_ext 21000001ff040bf6 ioc_guid 21000001ff040bf6 dgid fe8000000000000021000001ff040bf6 pkey ffff service_id f60b04ff01000021 initiator...

Page 203: ...rocess and file clean up after batch MPI PSM jobs have completed. Clean Termination of MPI Processes and Clean up PSM Shared Memory Files. Clean Termination of MPI Processes: The InfiniPath software norm...

Page 204: ...irectory. The file is owned by the user with permission rwx; it can be removed either by the user or by root. PSM relies on the MPI implementation to clean up after abnormal job termination. In cases whe...

Page 205: ...sh files bin ls dev shm psm_shm 2 dev null for file in files do sbin fuser file dev null 2 1 if ne 0 then bin rm file dev null 2 1 fi done When the system is idle, the administrators can remove all of...
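
The cleanup loop in this excerpt has lost its punctuation, so the following is a reconstruction of its apparent logic rather than the script as published: list the PSM shared memory files, ask fuser whether any process still has each one open, and remove those that are unused. The function name cleanup_psm_shm, the directory parameter, the psm_shm.* file pattern, and the guard against a missing fuser are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch of the cleanup loop described above. The directory argument
# (default /dev/shm) is an addition so that the function can be tried on
# a scratch directory; the psm_shm.* pattern is an assumed file naming.
cleanup_psm_shm() {
    shm_dir=${1:-/dev/shm}
    # Refuse to delete anything if fuser is unavailable, because a
    # missing fuser would otherwise look like "file not in use".
    command -v fuser >/dev/null 2>&1 || return 1
    for file in "$shm_dir"/psm_shm.*; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$file" ] || continue
        # fuser exits non-zero when no process has the file open.
        if ! fuser "$file" >/dev/null 2>&1; then
            rm -f "$file"
        fi
    done
}
```

Run as root, cleanup_psm_shm with no argument would act on /dev/shm; trying it first on a scratch directory is the safer way to check the behavior.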

Page 207: ...Software Installation Guide. Using LEDs to Check the State of the Adapter: The LEDs function as link and data indicators once the InfiniPath software has been installed, the driver has been loaded, and t...

Page 208: ...ted and the physical link is up. Ready to talk to SM to bring the link fully up. If this state persists, the SM may be missing or the link may not be configured. Use ipath_control -i to verify the software...

Page 209: ...efer to the QLogic Fabric Software Installation Guide for more information. InfiniPath Interrupts Not Working: The InfiniPath driver cannot configure the InfiniPath link to a usable state unless interru...

Page 210: ...depending on your distribution. If no output is displayed, check that ACPI is enabled in your BIOS settings. To track down other initialization failures, see InfiniPath ib_qib Initialization Failure on pa...

Page 211: ...g 1 node processes. If this error appears, check to see if the InfiniPath driver is loaded by typing: lsmod | grep ib_qib If no output is displayed, the driver did not load for some reason. In this case, try...
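
The same check can be wrapped in a small function for use in scripts. The sketch below reads the module list directly from /proc/modules, the file that lsmod formats; the function name module_loaded and the optional file argument (there only so the check can be exercised on sample data) are assumptions, not part of the InfiniPath tools.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named kernel module is listed as
# loaded. Each line of /proc/modules starts with the module name
# followed by a space.
module_loaded() {
    module=$1
    modules_file=${2:-/proc/modules}
    grep -q "^${module} " "$modules_file"
}
```

A script could then run module_loaded ib_qib || echo "ib_qib is not loaded" before starting a job.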

Page 212: ...nected, the switch is down, SM is not running, or that a hardware error occurred. OpenFabrics and InfiniPath Issues: The following sections cover issues related to OpenFabrics, including Subnet Managers and...

Page 213: ...tes Connection Refused errors if it is loaded before IPoIB has been loaded and configured. To solve the problem, load and configure IPoIB first. Set IBPATH for OpenFabrics Scripts: The environment variabl...

Page 214: ...ffff service_id f60b04ff01000021 sys class infiniband_srp srp ipath0 1 add_target Outdated ipath_ether Configuration Setup Generates Error: Ethernet emulation (ipath_ether) has been removed in this rele...

Page 215: ...ebugging. See your switch vendor for more information. QLogic recommends using FastFabric to help diagnose this problem. If FastFabric is not installed in the fabric, there are two diagnostic tools, ibhost...

Page 216: ...onfig irqbalance off /etc/init.d/irqbalance stop Next, find the IRQ number and bind it to a CPU. The IRQ number can be found in one of two ways, depending on the system used. Both methods are described in...

Page 217: ...ook at the stats in /proc/interrupts while the adapter is active to observe which CPU is fielding ib_qib interrupts. Immediately change the processor affinity of an IRQ: To immediately change the process...
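
The exact command the guide gives for changing IRQ affinity is cut off in this excerpt. On Linux the usual mechanism is to write a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity, and the sketch below shows how such a mask is formed for a single CPU. Both function names are hypothetical, the IRQ number used in the usage note is invented, and the second function only prints the command because running it requires root.

```shell
#!/bin/sh
# Hypothetical helper: print the hexadecimal bitmask that selects one
# CPU (CPU 0 is bit 0), in the form written to smp_affinity.
cpu_affinity_mask() {
    printf '%x\n' $((1 << $1))
}

# Hypothetical helper: print, rather than run, the command that would
# bind the given IRQ to the given CPU.
show_irq_bind_command() {
    irq=$1
    cpu=$2
    echo "echo $(cpu_affinity_mask "$cpu") > /proc/irq/${irq}/smp_affinity"
}
```

For example, show_irq_bind_command 24 1 prints the command for binding an assumed IRQ 24 to CPU 1; the change can then be confirmed by watching /proc/interrupts.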

Page 218: ...ms are described in the following sections. Invalid Configuration Warning: Open MPI warns about an invalid configuration every time it is run, with the following warning: WARNING There are more than one ac...

Page 219: ...are. To determine if the logical connection between the IB host and the VIO hardware is correct, check the following: The correct VirtualNIC driver is running. The /etc/infiniband/qlgc_vnic.cfg file contai...

Page 220: ...ng that the qlgc_vnic.cfg file contains the correct information. Use the following scenarios to verify that the qlgc_vnic.cfg file contains a definition for the applicable virtual interface: Issue the c...

Page 221: ...09 max controllers 0x03 controller 1 GUID 00066a0130000001 vendor ID 00066a device ID 000030 IO class 2000 ID Chassis 0x00066A00010003F2 Slot 1 IOC 1 service entries 2 service 0 1000066a00000001 Infin...

Page 222: ...66a0130000001 dgid fe8000000000000000066a0258000001 pkey ffff ioc_guid 00066a0230000001 dgid fe8000000000000000066a0258000001 pkey ffff ioc_guid 00066a0330000001 dgid fe8000000000000000066a02580000...

Page 223: ...ed to an IB switch. For example: st139 ibv_devinfo hca_id mlx4_0 fw_ver 2 2 000 node_guid 0002 c903 0000 0f80 sys_image_guid 0002 c903 0000 0f83 vendor_id 0x02c9 vendor_part_id 25418 hw_ver 0xA0 board_i...

Page 224: ...e for each EIOC configuration file in the following list: ls etc sysconfig network scripts ifcfg eioc1 ifcfg eioc2 ifcfg eioc3 ifcfg eioc4 ifcfg eioc5 ifcfg eioc6 Interface does not show up in output o...

Page 225: ...ADCAST 172.26.63.255 NETMASK 255.255.240.0 NETWORK 172.26.48.0 STARTMODE hotplug TYPE Ethernet Verify the physical connection between the VIO hardware and the Ethernet network: If the interface is disp...
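
The three address values in this sample interface file must agree with one another: the broadcast address is the network address with every host bit (every bit that is zero in the netmask) set to one. The sketch below recomputes it, which can help when adapting the sample to another subnet. The function name broadcast_address is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compute the IPv4 broadcast address from a network
# address and a dotted netmask, one octet at a time.
broadcast_address() {
    old_ifs=$IFS
    IFS=.
    # Split both arguments on dots: $1..$4 become the network octets and
    # $5..$8 the netmask octets.
    set -- $1 $2
    IFS=$old_ifs
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
}
```

With the values from the sample, broadcast_address 172.26.48.0 255.255.240.0 prints 172.26.63.255, matching the broadcast entry shown.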

Page 226: ...thernet port. If a VIO hardware module can be seen from a host, the ib_qlgc_vnic_query s file displays information similar to: EVIC in Chassis 0x00066a000300012a Slot 19 Ioc 1 EVIC in Chassis 0x00066a000...

Page 227: ...UN Depth 16 Max LUN Scan 512 Max IO 131072 128 KB Max Sectors 256 Max SG Depth 33 Session Count 2 No Connect T O 60 Second s Register In Order ON Dev Reqst T O 2 Second s Description SRP Virtual HBA 1...

Page 228: ...D 0xfe8000000000000000066a0260000165 SRP IOC Profile Chassis 0x00066A0001000481 Slot 1 IOC 2 SRP Target IOClass 0xFF00 SRP Target SID 0x0000494353535250 SRP IPI Guid 0x00066a000100d052 SRP IPI Extnsn...

Page 229: ...on Rejected state according to /var/log/messages. If the session is part of a multi session adapter, ib_qlgc_srp_stats shows it in the Connection Rejected state. A host displays Connection Failed for Sess...

Page 230: ...HBA 1 Session Session 1 State Disconnected Source GID 0xfe8000000000000000066a000100d051 Destination GID 0xfe8000000000000000066a0260000165 SRP IOC Profile Chassis 0x00066A0001000481 Slot 1 IOC 1 SRP...

Page 231: ...d 0x00066a0238000165 SRP TPI Extnsn 0x0000000000000001 Source LID 0x000c Dest LID 0x0004 Completed Sends 0x00000000000001c8 Send Errors 0x0000000000000000 Completed Receives 0x00000000000001c8 Receive...

Page 232: ...initiatorExtension in the failing Session block of the qlgc_srp.cfg file or the adapter port GUID specified in the failing Session block of the qlgc_srp.cfg file. Additionally, make certain that the map...

Page 233: ...the cable between the storage device and the Fibre Channel switch was pulled, the VIO hardware log will display a Connection Lost to NPort Id message. The next time the host tries to do an input/output...

Page 234: ...VFx Port GUID is of the form 00066app38iiiiii, where pp gives the IOC number (1 or 2) and iiiiii gives the individual ID number of the VIO hardware, so 00066a0138iiiiii is the port guid of IOC 1 of VIO...
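
The GUID layout described above can be decoded mechanically: after the 00066a prefix, the next two hexadecimal digits are the IOC number, followed by 38 and the ID digits. The sketch below extracts the IOC number under the assumption that the GUID is 16 hexadecimal digits, which leaves six digits for the ID. The function name ioc_number_from_guid is hypothetical; the first GUID in the usage note has this shape and appears in the session statistics elsewhere in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print the IOC number encoded in a port GUID of the
# form 00066app38iiiiii (an optional 0x prefix is accepted). Returns 1 if
# the GUID does not have that shape.
ioc_number_from_guid() {
    guid=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    case $guid in
        00066a[0-9a-f][0-9a-f]38[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f])
            # Characters 7-8 hold pp; strip one leading zero for display.
            pp=$(printf '%s' "$guid" | cut -c7-8)
            echo "${pp#0}"
            ;;
        *)
            return 1
            ;;
    esac
}
```

For example, ioc_number_from_guid 0x00066a0238000165 prints 2, while a GUID that does not match the pattern, such as 0x0002c9030000110d, makes the function return a non-zero status.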

Page 235: ...rest will be Connected. The transition of a session from Connected to Active will not be attempted until that session needs to become Active due to the failure of the previously Active session. How doe...

Page 236: ...r_part_id 25218 hw_ver 0xA0 board_id SS_0000000005 phys_port_cnt 2 port 1 state PORT_ACTIVE 4 max_mtu 2048 4 active_mtu 2048 4 sm_lid 71 port_lid 60 port_lmc 0x00 st106 ibv_devinfo i 2 hca_id mthca0 f...

Page 237: ...19 Need to determine the SRP driver version. Solution: To determine the SRP driver version number, enter the command modinfo d qlgc srp, which returns information similar to the following: st159 modinfo d...

Page 239: ...ions for enabling and disabling WC using PAT and MTRR, and for verifying that write combining is working. PAT and Write Combining: The wc_pat parameter is set in /etc/modprobe.conf on Red Hat systems or e...

Page 240: ...ws the memory to be mapped with fewer MTRRs, so that there will be one or more unused MTRRs for the InfiniPath driver. Some BIOSes do not have the MTRR mapping option. It may have a different name dependin...

Page 241: ...he following command: ipath_pkt_test -B With write combining enabled, the QLE7140 and QLE7240 report in the range of 1150-1500 MBps. The QLE7280 reports in the range of 1950-3000 MBps. You can also use ipa...

Page 243: ...uster environment. Use the following items as a checklist for verifying homogeneity. A difference in any one of these items in your cluster may cause problems: Kernels; Distributions; Versions of the QLogi...

Page 244: ...d enables/disables services, including drivers. Can be useful for checking homogeneity. dmesg: Prints out bootup messages. Useful for checking for initialization problems. iba_opp_query: Retrieves path re...

Page 245: ...adapters or, using an IB loopback connector, tests within a single QLogic IB adapter. ipathstatsc: Displays driver statistics and hardware counters, including performance and error (including status) count...

Page 246: ...queries that can be done are much more limited than with iba_saquery. In particular, it can only find paths that start on the machine where the command is run. In other words, the source LID or source GI...

Page 247: ...pkey pkey Partition Key i sid sid Service ID h hca hca The HCA to use. Defaults to the first HCA. The HCA can be identified by name (mthca0, qib1, et cetera) or by number (1, 2, 3, et cetera). p port port The por...

Page 248: ...0x75 slid 0x31 hop 0x0 flow 0x0 tclass 0x0 num_path 0x0 pkey 0x0 qos_class 0x0 sl 0x0 mtu 0x0 rate 0x0 pkt_life 0x0 preference 0x0 resv2 0x0 resv3 0x0 Using HCA qib0 Result resv1 0x0000000000000107 d...

Page 249: ...will never have to specify which HCA to use. This is only relevant in the case where a single node is connected to multiple physical IB fabrics. Finally, the bottom half of the output shows the result o...

Page 250: ...id 263 using decimal numbers. Note that these queries are the same as the first two; only the base of the numbers has changed. Query by LID and PKEY: iba_opp_query slid 0x31 dlid 0x75 pkey 0x8002 Query by...
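
Since the decimal and hexadecimal forms of a query differ only in number base, converting between them is a matter of reformatting the value. The sketch below uses the shell's printf for this; the function name hex_to_decimal is hypothetical. The values 0x31 and 0x75 are the source and destination LIDs from the example queries, and 263 is the decimal form of 0x107, a value that also appears in the sample output.

```shell
#!/bin/sh
# Hypothetical helper: print a hexadecimal value (written with a 0x
# prefix) in decimal.
hex_to_decimal() {
    printf '%d\n' "$1"
}
```

For example, hex_to_decimal 0x31 prints 49 and hex_to_decimal 0x75 prints 117.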

Page 251: ...CIe Gen2 x8 V1 N A YA N A FW 2 9 1000 Image type ConnectX FW Version 2 9 1000 Device ID 26428 Description Node Port1 Port2 Sys image GUIDs 0002c903000ba8e0 0002c903000ba8e1 0002c903000ba8e2 0002c90300...

Page 252: ...en true mdio_en_port1 0 IB phy_type_port1 XFI phy_type_port2 XFI read_cable_params_port1_en true read_cable_params_port2_en true Polarity eth_tx_lane_polarity_port1 0x0 eth_tx_lane_polarity_port2 0x0...

Page 253: ..._main_qdr 0x0 port2_sd0_ob_preemp_main_qdr 0x0 port1_sd1_ob_preemp_main_qdr 0x0 port2_sd1_ob_preemp_main_qdr 0x0 port1_sd2_ob_preemp_main_qdr 0x0 port2_sd2_ob_preemp_main_qdr 0x0 port1_sd3_ob_preemp_m...

Page 254: ...mp_pre 0x8 auto_ddr_option_1 tx_preemp_msb 0x0 auto_ddr_option_1 tx_preemp_post 0x2 auto_ddr_option_1 tx_preemp_main 0x10 auto_ddr_option_1 tx_preemp 0x0 auto_ddr_option_2 tx_preemp_pre 0xa auto_ddr_o...

Page 255: ...option_7 tx_preemp_msb 0x1 auto_ddr_option_7 tx_preemp_post 0x3 auto_ddr_option_7 tx_preemp_main 0x17 auto_ddr_option_7 tx_preemp 0x0 auto_ddr_option_8 tx_preemp_pre 0xf auto_ddr_option_8 tx_preemp_ms...

Page 256: ...auto_ddr_option_13 tx_preemp_main 0x5 auto_ddr_option_13 tx_preemp 0x0 auto_ddr_option_14 tx_preemp_pre 0x0 auto_ddr_option_14 tx_preemp_msb 0x0 auto_ddr_option_14 tx_preemp_post 0x0 auto_ddr_option_1...

Page 257: ...auto_ddr_option_5 rx_equal_offs 0x0 auto_ddr_option_6 rx_equal_offs 0x0 auto_ddr_option_7 rx_equal_offs 0x0 auto_ddr_option_0 rx_muxeq 0x0 auto_ddr_option_1 rx_muxeq 0x0 auto_ddr_option_2 rx_muxeq 0x...

Page 258: ...sigdet_th 0x1 auto_ddr_option_4 rx_sigdet_th 0x1 auto_ddr_option_5 rx_sigdet_th 0x1 auto_ddr_option_6 rx_sigdet_th 0x1 auto_ddr_option_7 rx_sigdet_th 0x1 auto_ddr_option_0 rx_equalization 0x4 auto_ddr...

Page 259: ...on_13 rx_muxeq 0x0 auto_ddr_option_13 rx_muxmain 0x1f auto_ddr_option_13 rx_main 0xf auto_ddr_option_13 rx_extra_hs_gain 0x3 auto_ddr_option_13 rx_equalization 0x0 auto_ddr_option_14 rx_muxeq 0x0 auto...

Page 260: ...OK 0x00001234 0x0000280f 0x0015dc BOOT2 OK 0x00002810 0x000034ef 0x000ce0 Configuration OK 0x000034f0 0x00003533 0x000044 GUID OK 0x00003534 0x0000366b 0x000138 Image Info OK 0x0000366c 0x0000946f 0x...

Page 261: ...e operation Options H help this message v verbose additional output t target guid guid of target switch in hex format for example 0x00066a00e3001234 h hca HCA number default is first HCA p port port n...

Page 262: ...update using fileName parameter must be an emfw file fwVerify perform firmware validation validate firmware in primary secondary EEPROMs report which was booted ping test for switch presence reboot r...

Page 263: ...ides IB packet analysis. The snoop_enable variable must be set to 1 (enabled) in the modprobe conf ib_qib conf file to create snoop devices and capture devices. If snoop_enable is set to 0 (disabled), then no...

Page 264: ...be downloaded from http://www.wireshark.org. Intel recommends using version 1.6.2. ibhosts: This tool determines if all the hosts in your IB fabric are up and visible to the subnet manager and to each oth...

Page 265: ...The IB LIDs of the two nodes in this example are determined by using the ipath_control -i command on each node. The ibtracert tool produces output similar to the following when run as a root user from...

Page 266: ...0078 a5d2 sys_image_guid 0011 7500 0078 a5d2 vendor_id 0x1175 vendor_part_id 29474 hw_ver 0x2 board_id InfiniPath_QLE7340 phys_port_cnt 1 port 1 state PORT_ACTIVE 4 max_mtu 4096 5 active_mtu 4096 5 s...

Page 267: ...ident ib_qib ko ib_qib ko ident warning no id keywords in ib_qib ko ipath_checkout: The ipath_checkout tool is a bash script that verifies that the installation is correct and that all the nodes of the...

Page 268: ...y test on every pair of nodes and analyzes the results. The options available with ipath_checkout are shown in Table G-2. NOTE: The hostnames in the nodefile are Ethernet hostnames, not IPv4 addresses. To...

Page 269: ...1 linkcontrol status_str sys class infiniband qib0 device driver version These files are also documented in Table G-4 and Table G-5. Other than the -i option, this script must be run with root permission...

Page 270: ...OFED Release x x x Date yyyy mm dd hh mm 0 Version ChipABI 2 0 InfiniPath_QLE7342 InfiniPath1 6 1 SW Compat 2 0 Serial RIB0941C00005 LocalBus PCIe 5000MHz x8 0 1 Status 0xe1 Initted Present IB_link_u...

Page 271: ...ttings using the BIOS Setup utility. For specific instructions, follow the hardware documentation that came with your system. QLogic also provides a script, ipath_mtrr, which sets the MTRR registers, enabli...

Page 272: ...lated. It is installed from the infinipath RPM. It displays both driver statistics and hardware counters, including both performance and error (including status) counters. Running ipathstats -c 10, for exampl...

Page 273: ...raffic patterns, optionally including oneself and one's local shared memory (shm) peers. It can also be set up with multi-dimensional grid traffic patterns; this can be parameterized to run rings, open 2D g...

Page 274: ...ries. The option -qa queries all. To query a package that has not yet been installed, use the -qpl option. strings: Use the strings command to determine the content of and extract text from a binary file. The...

Page 275: ...problems and hardware errors. Verify hosts via an Ethernet ping: ipath_checkout run 1 hostsfile Verify ssh: ipath_checkout run 2 hostsfile Show uname a for all hosts: mpirun m hostsfile ppn 1 np numhosts...

Page 276: ...allhosts mpirun m hostsfile ppn 1 np numhosts nonmpi ipath_control -i Verify that the hosts see each other: ipath_checkout run 5 hostsfile Check MPI performance: ipath_checkout run 7 hostsfile Generate a...

Page 277: ...Description Initted: The driver has loaded and successfully initialized the IBA6110 or IBA7220 ASIC. Present: The IBA6110 or IBA7220 ASIC has been detected (but not initialized unless Initted is also pre...

Page 278: ...ration template files used by the InfiniPath and OpenFabrics software. Table G-6 Status Other Files File Name Contents lid: IB LID. The address on the IB fabric, similar conceptually to an IP address for...

Page 279: ...d by the modprobe command. Also used for creating aliases. The PAT write combining option is set here. For SLES systems. /etc/infiniband/openib.conf: The primary configuration file for InfiniPath OFED modules...

Page 281: ...romio Books for Learning MPI Programming: Gropp, William, Ewing Lusk, and Anthony Skjellum, Using MPI, Second Edition, 1999, MIT Press, ISBN 0-262-57134-X. Gropp, William, Ewing Lusk, and Anthony Skjellum, Using MP...

Page 282: ...orking. The Internet Frequently Asked Questions (FAQ) archives contain an extensive Request for Comments (RFC) section. Numerous documents on networking and configuration can be found at http://www.faqs.org/rf...

Page 284: ...Logic logo and the Powered by QLogic logo are registered trademarks of QLogic Corporation. InfiniBand is a registered trademark of the InfiniBand Trade Association. All other brand and product names are...
