
8 – Dispersive Routing
IB0054606-02 A
Internally, PSM uses dispersive routing differently for small and large
messages. A large message is any message of 64K or more. PSM splits a large
message into fragments of 128K by default (each called a window) and sprays
the windows across distinct paths between ports. All packets belonging to a
window use the same path, but the windows themselves can take different paths
through the fabric. PSM reassembles the windows that make up an MPI message
before delivering it to the application. This allows limited out-of-order
semantics through the fabric to be maintained with little overhead. Small
messages, on the other hand, always use a single path when communicating with
a remote node; however, different processes executing on a node can use
different paths for their communication between the nodes. For example,
consider two nodes, A and B, each with 8 processors, and a 16-process MPI
application that spans these nodes (8 processes per node). Assuming the fabric
is configured for an LMC of 3, PSM constructs 8 paths through the fabric as
described above. Then:
- Each MPI process is automatically bound to a given CPU core numbered
  0 through 7. PSM does this at startup to get improved cache hit rates and
  other benefits.
- Small messages sent from a process on core N use path N.
- For a large message, each process uses all 8 paths and sprays the
  message windows across them.
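
The example above can be sketched in a few lines of Python. This is an illustration only, not PSM code: the function names are hypothetical, and the round-robin assignment of windows to paths is an assumption, since the text states only that windows are sprayed across all paths. The 64K threshold, the 128K default window size, and the 8 paths for an LMC of 3 come from the description above.

```python
WINDOW_SIZE = 128 * 1024      # default window size, per the text
LARGE_THRESHOLD = 64 * 1024   # messages of 64K or more are "large", per the text


def num_paths(lmc: int) -> int:
    """Number of paths for a given LMC: 2**lmc (8 paths for an LMC of 3)."""
    return 1 << lmc


def split_into_windows(msg_len: int, window: int = WINDOW_SIZE) -> list[int]:
    """Return the byte length of each window for a message of msg_len bytes."""
    full, rest = divmod(msg_len, window)
    return [window] * full + ([rest] if rest else [])


def adaptive_paths(core: int, msg_len: int, lmc: int) -> list[int]:
    """Return the path index used by each piece of a message.

    Small message: a single entry, the path matching the sender's core.
    Large message: one entry per window.
    ASSUMPTION: windows are assigned round-robin starting at the core's
    own path; the real selection order inside PSM is not documented here.
    """
    paths = num_paths(lmc)
    if msg_len < LARGE_THRESHOLD:
        return [core % paths]
    n_windows = len(split_into_windows(msg_len))
    return [(core + i) % paths for i in range(n_windows)]
```

With an LMC of 3, `adaptive_paths(5, 1024, 3)` yields a single path index (5) for a 1K message from core 5, while a 1 MB message from the same core is split into 8 windows that, under the round-robin assumption, touch all 8 paths.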
The above describes the default path selection policy that is active in PSM
when running on fabrics configured with a non-zero LMC. There are 3 other path
selection policies that determine how to select the path (or path index from
the set of available paths) used by a process when communicating with a remote
node. The above path policy is called adaptive. The 3 remaining path policies
are static policies that assign a static path on job startup for both small
and large message transfers.
Static_Src: Only one path per process is used for all remote communications.
The path index is based on the number of the CPU the process is running on.
NOTE
Only path N will be used by this process for all communications to any
process on the remote node.
NOTE
Multiple paths are still used in the fabric if multiple processes (each on
a different CPU) are communicating.
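
As a rough sketch of the Static_Src behavior described above (not PSM code), the path index can be modeled as a function of the CPU number alone. The function names are hypothetical, and the modulo wrap for CPU numbers larger than the number of paths is an assumption; the text says only that the index is based on the CPU number.

```python
def static_src_path(cpu: int, lmc: int) -> int:
    """Path index for a process under a Static_Src-style policy.

    ASSUMPTION: CPU numbers beyond the number of paths wrap around
    with a modulo. Message size and destination play no role.
    """
    return cpu % (1 << lmc)


def paths_in_use(cpus: list[int], lmc: int) -> set[int]:
    """Distinct paths used by a set of processes, one process per CPU."""
    return {static_src_path(cpu, lmc) for cpu in cpus}
```

For the two-node example with an LMC of 3, `static_src_path(5, 3)` gives path 5 for every message sent by the process on CPU 5, and `paths_in_use(list(range(8)), 3)` contains all 8 path indexes, matching the note that multiple paths are still used when several processes communicate.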