D–Troubleshooting
Kernel and Initialization Issues
IB0054606-02 A
D-5
InfiniPath ib_qib Initialization Failure
In some cases, ib_qib is not properly initialized. The symptoms may appear as error messages from an MPI job or another program. Here is a sample command and the resulting error messages:
$ mpirun -np 2 -m ~/tmp/mbu13 osu_latency
<nodename>:ipath_userinit: assign_port command failed: Network is down
<nodename>:can’t open /dev/ipath, network down
This will be followed by messages of this type after 60 seconds:
MPIRUN<node_where_started>: 1 rank has not yet exited 60 seconds after rank 0 (node <nodename>) exited without reaching MPI_Finalize().
MPIRUN<node_where_started>:Waiting at most another 60 seconds for the remaining ranks to do a clean shutdown before terminating 1 node processes.
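If job output is captured to a file, the two symptom lines above can be searched for before starting the driver checks. This is a minimal sketch: the function name has_ipath_init_failure is hypothetical. The matched strings are taken from the sample messages above. The match uses "open /dev/ipath" rather than the full "can’t open" text to avoid depending on the apostrophe character.

```shell
# Hypothetical helper: reads MPI job output on stdin and exits with
# status 0 if either ib_qib initialization-failure symptom is present.
has_ipath_init_failure() {
    grep -q -e 'assign_port command failed' -e 'open /dev/ipath'
}
```

For example, run `mpirun -np 2 -m ~/tmp/mbu13 osu_latency 2>&1 | tee job.log`, then run `has_ipath_init_failure < job.log && echo "check the ib_qib driver"`.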
If this error appears, check to see if the InfiniPath driver is loaded by typing:
$ lsmod | grep ib_qib
If no output is displayed, the driver did not load for some reason. In this case, try
the following commands (as root):
# modprobe -v ib_qib
# lsmod | grep ib_qib
# dmesg | grep -i ib_qib | tail -25
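The check-and-load sequence above can be wrapped in a small function. This is a sketch only: the names module_listed and check_ib_qib are hypothetical, and it uses only the commands shown above plus standard awk, id, grep, and tail. One design choice differs from the plain grep. It compares the first column of lsmod output exactly, so a module whose name merely contains "ib_qib" is not counted.

```shell
# Hypothetical helper: reads lsmod-style text on stdin and exits with
# status 0 only if the first column of some line equals the module name.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Hypothetical wrapper: reports whether ib_qib is loaded. If it is not
# loaded and the caller is root, it tries modprobe, prints the last 25
# ib_qib kernel messages, and returns the final lsmod check result.
check_ib_qib() {
    if lsmod | module_listed ib_qib; then
        echo "ib_qib is loaded"
        return 0
    fi
    echo "ib_qib is not loaded"
    if [ "$(id -u)" -ne 0 ]; then
        echo "run as root to load the driver" >&2
        return 1
    fi
    modprobe -v ib_qib
    dmesg | grep -i ib_qib | tail -25
    lsmod | module_listed ib_qib
}
```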
The output indicates whether the driver has loaded. Printing the kernel messages with dmesg may help locate any problems with ib_qib.
If the driver loaded but MPI or other programs are not working, check whether problems were detected during driver and QLogic hardware initialization with the command:
$ dmesg | grep -i ib_qib
This command may generate more than one screen of output.
Also, check the link status with the command:
$ cat /sys/class/infiniband/ipath*/device/status_str
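When several adapters are present, the plain cat output does not show which status belongs to which device. The sketch below labels each status with its device directory name and reports when no ipath* device is found. The function name show_link_status and the SYSFS_IB override variable are hypothetical. The default path is the sysfs directory used in the command above.

```shell
# Hypothetical helper: prints "<device>: <status_str>" for each ipath*
# device. SYSFS_IB (hypothetical) overrides the sysfs base directory.
show_link_status() {
    base=${SYSFS_IB:-/sys/class/infiniband}
    found=0
    for f in "$base"/ipath*/device/status_str; do
        [ -r "$f" ] || continue          # unmatched glob or unreadable file
        dev=${f#"$base"/}                # strip the base directory
        dev=${dev%%/*}                   # keep only the device name
        printf '%s: %s\n' "$dev" "$(cat "$f")"
        found=1
    done
    [ "$found" -eq 1 ] || { echo "no ipath* devices under $base" >&2; return 1; }
}
```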
These commands are normally executed by the ipathbug-helper script, but running them separately may help locate the problem.
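Because some of this output can run to more than one screen, it can help to run the commands one at a time and save each result to a file for later review. The sketch below does that. It is not ipathbug-helper and does not reproduce what that script collects. The function names run_diag and collect_ib_qib_diagnostics and the default report path are hypothetical. Run it as root, because dmesg may be restricted to root on some systems.

```shell
# Hypothetical helper: runs one command string, appending a header, its
# combined stdout/stderr, and its exit status to the report file.
run_diag() {
    out=$1
    cmd=$2
    {
        echo "===== $cmd"
        sh -c "$cmd" 2>&1
        echo "===== exit status: $?"
    } >> "$out"
}

# Hypothetical wrapper: truncates the report file, then runs each
# diagnostic command from this section separately.
collect_ib_qib_diagnostics() {
    report=${1:-./ib_qib-diag.txt}
    : > "$report"
    run_diag "$report" 'lsmod | grep ib_qib'
    run_diag "$report" 'dmesg | grep -i ib_qib'
    run_diag "$report" 'cat /sys/class/infiniband/ipath*/device/status_str'
    echo "wrote $report"
}
```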
See also and .