2.1.5 MP_TASK_AFFINITY
Setting MP_TASK_AFFINITY to SNI tells the Parallel Operating Environment (POE) to bind each
task to the MCM containing the HPS adapter it will use, so that the adapter, CPU, and memory
used by any task are all local to the same MCM. To prevent multiple tasks from sharing the same
CPU, do not set MP_TASK_AFFINITY to SNI if more than four tasks share any HPS adapter; in that
case, set MP_TASK_AFFINITY to MCM instead, which allows each MPI task to use CPUs and memory
from the same MCM, even if the adapter is on a remote MCM. If MP_TASK_AFFINITY is set to
either MCM or SNI, MEMORY_AFFINITY should be set to MCM.
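As a quick aid (not part of the original guide), the short C/MPI sketch below has each task
report the MP_TASK_AFFINITY and MEMORY_AFFINITY values it inherited, which can help confirm
that the intended settings actually reached the job. The program is illustrative only; the
variables must be exported before the job is launched with poe, since setting them inside the
program has no effect on task placement.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* Illustrative check: each task prints the affinity-related environment
     * settings that POE propagated to it. This only reports the values; the
     * variables must be set before the job is launched to have any effect. */
    int main(int argc, char *argv[])
    {
        int rank;
        const char *task_aff;
        const char *mem_aff;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        task_aff = getenv("MP_TASK_AFFINITY");
        mem_aff  = getenv("MEMORY_AFFINITY");

        printf("task %d: MP_TASK_AFFINITY=%s MEMORY_AFFINITY=%s\n",
               rank,
               task_aff ? task_aff : "(unset)",
               mem_aff  ? mem_aff  : "(unset)");

        MPI_Finalize();
        return 0;
    }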
2.1.6 MP_CSS_INTERRUPT
The MP_CSS_INTERRUPT variable allows you to control interrupts triggered by packet arrivals.
Setting this variable to no implies that the application should run in polling mode. This setting is
appropriate for applications that have mostly synchronous communication. Even applications that
make heavy use of MPI_ISEND/MPI_IRECV should be considered synchronous unless there is
significant computation between the ISEND/IRECV postings and the MPI_WAITALL. The
default value for MP_CSS_INTERRUPT is no.
For applications with an asynchronous communication pattern (one that uses non-blocking MPI
calls), it might be more appropriate to set this variable to yes. Setting MP_CSS_INTERRUPT to
yes can cause your application to be interrupted when new packets arrive, which could be
helpful if a receiving MPI task is likely to be in the middle of a long numerical computation at the
time when data from a remote blocking send arrives.
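The following C/MPI sketch illustrates the kind of asynchronous pattern described above; the
ring exchange, message length, and compute kernel are assumptions made for the example, not
anything prescribed by this guide. Because significant computation sits between the non-blocking
postings and the MPI_WAITALL, a receiving task running with MP_CSS_INTERRUPT set to yes can
have arriving packets serviced during the computation rather than only when the MPI_WAITALL is
reached.

    #include <stdio.h>
    #include <mpi.h>

    /* Illustrative asynchronous pattern: post non-blocking receive/send,
     * perform a long computation, then wait. The message size, ring
     * neighbors, and kernel are placeholders for a real application. */
    #define N 1000000

    static double sendbuf[N], recvbuf[N], work[N];

    static void compute_step(double *a, int n)
    {
        int i;
        for (i = 0; i < n; i++)   /* stand-in for a long numerical kernel */
            a[i] = a[i] * 1.0000001 + 1.0;
    }

    int main(int argc, char *argv[])
    {
        int rank, size, left, right;
        MPI_Request req[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        left  = (rank - 1 + size) % size;
        right = (rank + 1) % size;

        /* Post the non-blocking receive and send first ... */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

        /* ... then do significant computation before waiting. */
        compute_step(work, N);

        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

        MPI_Finalize();
        return 0;
    }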
2.2 MPI-IO
The most effective use of MPI-IO is when an application takes advantage of file views and
collective operations to read or write a file in which data for each task is dispersed across the file.
To simplify the discussion, we focus on read; write is similar.
An example is reading a matrix with application-wide scope from a single file, with each task
needing a different fragment of that matrix. To bring in the fragment needed for each task,
several disjoint chunks must be read. If every task were to do a POSIX read() of each chunk, the
GPFS file system would handle it correctly. However, because each read() is independent, there is
little chance to apply an effective strategy.
When the same set of reads is done with collective MPI-IO, every task specifies all the chunks it
needs to one MPI-IO call. Because the call is collective, the requirements of all the tasks are
known at one time. As a result, MPI can use a broad strategy for doing the I/O.
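A sketch of this collective approach in C appears below. The file name, matrix dimensions,
element type, and column-block decomposition are assumptions made for the example; the point is
that each task describes all of its disjoint chunks at once through a subarray file view and then
issues a single collective read, so MPI-IO sees every task's requirements together.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* Illustrative collective read: a NROWS x NCOLS matrix of doubles is
     * stored row-major in one shared file, and each task reads a block of
     * columns. That block is many disjoint chunks in the file, all of which
     * are described by the file view and read with one collective call. */
    #define NROWS 1024
    #define NCOLS 1024

    int main(int argc, char *argv[])
    {
        int rank, size;
        int gsizes[2], lsizes[2], starts[2];
        MPI_Datatype filetype;
        MPI_File fh;
        double *local;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        gsizes[0] = NROWS;  gsizes[1] = NCOLS;          /* whole matrix in the file  */
        lsizes[0] = NROWS;  lsizes[1] = NCOLS / size;   /* assumes NCOLS % size == 0 */
        starts[0] = 0;      starts[1] = rank * lsizes[1];

        MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                                 MPI_ORDER_C, MPI_DOUBLE, &filetype);
        MPI_Type_commit(&filetype);

        local = (double *)malloc((size_t)lsizes[0] * lsizes[1] * sizeof(double));

        MPI_File_open(MPI_COMM_WORLD, "matrix.dat",
                      MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

        /* The view exposes every chunk this task needs; the collective read
         * lets MPI-IO combine the requirements of all tasks. */
        MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
        MPI_File_read_all(fh, local, lsizes[0] * lsizes[1], MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
        free(local);
        MPI_Finalize();
        return 0;
    }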
When MPI-IO is used but each call to read or write a file is local or specifies only a single chunk
of data, there is much less chance for MPI-IO to do anything more than a simple POSIX read()
would do. Also, when the file is organized by task rather than globally, there is less that MPI-IO
can do to help. This is the case when each task's fragment of the matrix is stored contiguously in the
file rather than having the matrix organized as a whole.