The salloc command allocates 16 nodes and runs one copy of shmemrun on the first allocated node, which then creates the SHMEM processes. shmemrun invokes mpirun, and mpirun determines the correct set of hosts and the required number of processes from the slurm allocation within which it is running. Because shmemrun is used in this approach, there is no need for the user to set up the environment.
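A launch of this kind takes roughly the following form; this is a sketch in which shmem-test-world stands in for the application binary, with salloc providing the allocation and shmemrun started on the first allocated node as described above:

salloc -N 16 shmemrun shmem-test-world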
No Integration
This approach allows a job to be launched inside a slurm allocation but with no integration. It can be used with any supported MPI implementation; however, it requires a wrapper script to generate the hosts file. slurm is used to allocate nodes for the job, and the job runs within that allocation but not under the control of the slurm daemon. One way to use this approach is:
salloc -N 16 shmemrun_wrapper shmem-test-world
where shmemrun_wrapper is a user-provided wrapper script that creates a hosts file based on the current slurm allocation and simply invokes mpirun with that hosts file and other appropriate options. Note that ssh/rsh, not slurm, will be used to start the processes.
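The wrapper itself is left to the user. The following is a minimal sketch of what such a script might look like; it assumes the standard slurm environment variables (SLURM_JOB_ID, SLURM_JOB_NODELIST, SLURM_NNODES) and generic mpirun options (-np, -machinefile), which may need adjusting for the MPI implementation in use.

#!/bin/sh
# Hypothetical shmemrun_wrapper: build a hosts file from the current
# slurm allocation, then pass the command line through to mpirun.
hostsfile=/tmp/hosts.$SLURM_JOB_ID

# Expand the compressed node list (for example, node[01-16]) into one
# host name per line.
scontrol show hostnames "$SLURM_JOB_NODELIST" > "$hostsfile"

# Start one process per allocated node; mpirun starts them over ssh/rsh,
# not through the slurm daemon.
mpirun -np "$SLURM_NNODES" -machinefile "$hostsfile" "$@"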
Sizing Global Shared Memory
SHMEM provides the shmalloc, shrealloc, and shfree calls to allocate and release memory from a symmetric heap. These functions are called collectively across the processing elements (PEs) so that the memory is managed symmetrically across them. The extent of the symmetric heap determines the amount of global shared memory per PE that is available to the application. This is an important resource, and this section discusses the mechanisms available for sizing it. Applications can access this memory in various ways, and these map onto quite different access mechanisms:
• Accessing global shared memory on the local PE: This is achieved by direct loads and stores to the memory.
• Accessing global shared memory on a PE on the same host: This is achieved by mapping the global shared memory using the local shared memory mechanisms of the operating system (for example, System V shared memory) and then accessing the memory by direct loads and stores. This means that each PE on a host needs to map the global shared memory of every other PE on that host. These accesses do not use the adapter or the interconnect.
• Accessing global shared memory on a PE on a different host: This is achieved by sending put, get, and atomic requests across the interconnect.
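To make the collective allocation concrete, the following is a minimal sketch of a SHMEM program that allocates from the symmetric heap and reads a value from another PE. It uses the classic SHMEM interfaces (start_pes, _my_pe, _num_pes, shmem_long_get) and is an illustration only, not an example taken from this guide; it would be launched with shmemrun as described earlier.

#include <stdio.h>
#include <shmem.h>

int main(void)
{
    start_pes(0);                   /* initialize the SHMEM library */
    int me   = _my_pe();
    int npes = _num_pes();

    /* shmalloc is collective: every PE must request the same size so the
       symmetric heap stays identical across all PEs. */
    long *buf = (long *) shmalloc(sizeof(long));
    *buf = (long) me;               /* direct store into this PE's global shared memory */
    shmem_barrier_all();

    /* Fetch the value held by the next PE. Depending on where that PE
       lives, this is a local access, a same-host shared-memory access,
       or a get across the interconnect. */
    long val;
    shmem_long_get(&val, buf, 1, (me + 1) % npes);
    printf("PE %d read %ld from PE %d\n", me, val, (me + 1) % npes);

    shmem_barrier_all();
    shfree(buf);                    /* collective release back to the symmetric heap */
    return 0;
}

How much memory such a program can obtain from shmalloc is bounded by the size of the symmetric heap, which is configured using the mechanisms described in the remainder of this section.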