C – Integration with a Batch Queuing System
IB0054606-02 A
This command displays a list of processes using InfiniPath. Additionally, to get all processes, including stats programs, ipath_sma, diags, and others, run the program in this way:
# /sbin/fuser -v /dev/ipath*
lsof can also take the same form:
# lsof /dev/ipath*
The following command terminates all processes using the QLogic interconnect:
# /sbin/fuser -k /dev/ipath
For more information, see the man pages for fuser(1) and lsof(8).
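By default, fuser -k sends SIGKILL, which gives a process no chance to clean up after itself. The sketch below is one possible gentler variant, not a procedure from this guide: it asks processes to exit with SIGTERM first and falls back to SIGKILL only if something still holds the device open. The function name terminate_users and the grace period are hypothetical, and the sketch assumes the psmisc version of fuser is on the PATH.

```shell
#!/bin/sh
# Sketch only: terminate processes using a device, SIGTERM first, SIGKILL last.
# ASSUMPTION: "terminate_users" and the 5-second grace period are illustrative
# choices, not part of the QLogic software.

terminate_users() {
    dev=${1:-/dev/ipath}
    grace=${2:-5}

    # Ask the processes to exit; -TERM replaces fuser's default SIGKILL.
    fuser -k -TERM "$dev" >/dev/null 2>&1

    sleep "$grace"

    # fuser exits zero when at least one process still accesses the file.
    if fuser "$dev" >/dev/null 2>&1; then
        fuser -k "$dev" >/dev/null 2>&1     # default signal: SIGKILL
    fi
}

# Example, run as root:
#   terminate_users /dev/ipath 10
```

Processes that exit on SIGTERM have a chance to release their resources, which may reduce the number of files left behind in /dev/shm.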
Clean-up PSM Shared Memory Files
If a PSM job terminates abnormally, for example with a segmentation fault, POSIX shared memory files may be left over in the /dev/shm directory. Each file is owned by the user who ran the job and has permission -rwx------, so it can be removed either by that user or by root.
PSM relies on the MPI implementation to clean up after abnormal job termination. In cases where this does not occur, there may be leftover shared memory files. To clean up the system, create, save, and run the following PSM SHM cleanup script as root on each node. Either log on to the node, or run the script remotely using pdsh/ssh.
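As an illustration of the approach, here is a minimal sketch of what such a cleanup script might look like. It is not the vendor's script. It assumes the stale files match psm_shm.* under /dev/shm (check the actual names on your nodes), and the function name cleanup_stale_shm is hypothetical. A file is removed only when fuser reports that no process has it open.

```shell
#!/bin/sh
# Sketch only: remove PSM shared memory files that no running process has open.
# ASSUMPTION: stale files match "psm_shm.*" under /dev/shm; verify the naming
# on your own nodes before use.

cleanup_stale_shm() {
    dir=${1:-/dev/shm}
    pattern=${2:-psm_shm.*}

    # Refuse to delete anything if we cannot tell whether a file is in use.
    command -v fuser >/dev/null 2>&1 || {
        echo "fuser not found; nothing removed" >&2
        return 1
    }

    for file in "$dir"/$pattern; do
        [ -f "$file" ] || continue      # unmatched glob or not a regular file
        # fuser exits non-zero when no process is accessing the file.
        if ! fuser "$file" >/dev/null 2>&1; then
            rm -f -- "$file"
        fi
    done
}

# To clean the default location, run as root:
#   cleanup_stale_shm
```

Saved as a script that ends with a call to cleanup_stale_shm, this could be run across nodes with something like pdsh -w 'node[001-004]' sh /root/psm_shm_cleanup.sh, where the host list and script path are placeholders.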
NOTE
Hard and explicit program termination, such as kill -9 on the mpirun process ID (PID), may leave Open MPI unable to guarantee that the /dev/shm shared memory file is properly removed. As stale files accumulate on each node, an error message like the following can appear at startup:
node023:6.Error creating shared memory object in shm_open(/dev/shm may have stale shm files that need to be removed):
If this occurs, refer to Clean-up PSM Shared Memory Files for information.