
NP is the total number of processes in the MPI run. Assuming each node runs the same number of processes, NP is computed as:
NP = (NC * LP)
RP is defined as the number of remote processes relative to any given MPI process in an MPI run. The remote process count is important because each MPI process connects to every remote process in the run using an RDMA Queue Pair, and each RDMA Queue Pair consumes memory.
Assuming each node runs the same number of processes, RP is computed as:
RP = (NC * LP) - LP
For example:
NC    LP    NP     RP
16    1     16     15
32    2     64     62
64    3     192    189
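The two formulas above can be checked with a short Python sketch. It assumes, as the formulas imply, that NC is the number of nodes and LP is the number of processes per node; the function name process_counts is hypothetical and chosen only for illustration.

```python
def process_counts(nc: int, lp: int) -> tuple[int, int]:
    """Return (NP, RP) for nc nodes that each run lp processes.

    Assumes every node runs the same number of processes, as the text does.
    """
    if nc < 1 or lp < 1:
        raise ValueError("nc and lp must both be at least 1")
    np_total = nc * lp      # NP = (NC * LP)
    rp = np_total - lp      # RP = (NC * LP) - LP
    return np_total, rp


if __name__ == "__main__":
    # The (NC, LP) pairs are the rows of the example table.
    for nc, lp in [(16, 1), (32, 2), (64, 3)]:
        np_total, rp = process_counts(nc, lp)
        print(f"NC={nc} LP={lp} NP={np_total} RP={rp}")
```

Running the sketch reproduces the NP and RP columns of the example table.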
The main data structure used to transfer data between processes over RDMA is called a vbuf. Any given IO operation consumes some number of vbufs based on the size of the IO operation. Each vbuf can convey 8256 bytes of application data. The size of the vbuf structure, including space for the payload, is 8360 bytes. Thus, the value 8360 is used below to compute memory utilization.
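As a rough illustration of the vbuf arithmetic, the following Python sketch estimates how many vbufs an IO operation of a given size would need and how much memory they occupy. The text states only that the count depends on the IO size, so the ceiling division used here is an assumption, and the driver may treat large messages differently. The function names are hypothetical.

```python
VBUF_PAYLOAD_BYTES = 8256  # application data conveyed by one vbuf (from the text)
VBUF_TOTAL_BYTES = 8360    # size of the vbuf structure including payload space


def vbufs_for_io(io_bytes: int) -> int:
    """Estimate the number of vbufs needed for an IO of io_bytes bytes.

    Assumption: the payload is split into 8256-byte pieces, one per vbuf.
    """
    if io_bytes < 1:
        raise ValueError("io_bytes must be at least 1")
    # Ceiling division without floating point.
    return -(-io_bytes // VBUF_PAYLOAD_BYTES)


def vbuf_memory_bytes(io_bytes: int) -> int:
    """Estimate the memory occupied by the vbufs for one IO operation."""
    return vbufs_for_io(io_bytes) * VBUF_TOTAL_BYTES


if __name__ == "__main__":
    for size in (1, 8256, 8257, 100000):
        print(size, vbufs_for_io(size), vbuf_memory_bytes(size))
```

Under this assumption, an 8257-byte operation needs two vbufs because the payload exceeds a single vbuf's capacity by one byte.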
4.7.1 VIADEV Environment Variables
NOTE: Shell environment variables allow tuning of the memory consumed by the MPI RDMA driver. These variables must be set in the user account used to execute the MPI run (e.g., in the .bashrc file). Further, the values must be identical on every node in the cluster; otherwise, the run will fail.
The following table lists these variables, their default values, and a short description.
Environment Variable Name   Default value         Description
VIADEV_NUM_RDMA_BUFFERS     NP <= 4, 1024         The number of RDMA Write buffers
                            4 < NP <= 8, 512      to use per RDMA connection.
                            8 < NP <= 16, 256
                            16 < NP <= 128, 128
                            128 < NP <= 256, 64
                            256 < NP <= 512, 32
                            NP > 512, 24
VIADEV_RQ_DEPTH             NP <= 64, 240         The Receive Queue depth for each
                            NP > 64, 120          RDMA Queue Pair.
VIADEV_SQ_DEPTH             NP <= 64, 256         The Send Queue depth for each
                            NP > 64, 128          RDMA Queue Pair.
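The default values in the table can be expressed as simple lookups on NP. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that NP = 512 falls in the 256 < NP <= 512 row (32 buffers), with 24 buffers applying only above 512.

```python
# Upper bound on NP (inclusive) paired with the default VIADEV_NUM_RDMA_BUFFERS.
_RDMA_BUFFER_DEFAULTS = [
    (4, 1024),
    (8, 512),
    (16, 256),
    (128, 128),
    (256, 64),
    (512, 32),  # assumption: NP = 512 belongs to this row
]


def default_num_rdma_buffers(np_total: int) -> int:
    """Default VIADEV_NUM_RDMA_BUFFERS for a run with np_total processes."""
    if np_total < 1:
        raise ValueError("np_total must be at least 1")
    for upper_bound, buffers in _RDMA_BUFFER_DEFAULTS:
        if np_total <= upper_bound:
            return buffers
    return 24  # NP greater than 512


def default_rq_depth(np_total: int) -> int:
    """Default VIADEV_RQ_DEPTH (Receive Queue depth per Queue Pair)."""
    return 240 if np_total <= 64 else 120


def default_sq_depth(np_total: int) -> int:
    """Default VIADEV_SQ_DEPTH (Send Queue depth per Queue Pair)."""
    return 256 if np_total <= 64 else 128


if __name__ == "__main__":
    for np_total in (16, 64, 192):
        print(
            np_total,
            default_num_rdma_buffers(np_total),
            default_rq_depth(np_total),
            default_sq_depth(np_total),
        )
```

For instance, a run with NP = 192 would default to 64 RDMA Write buffers per connection, a Receive Queue depth of 120, and a Send Queue depth of 128.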