pshpstuningguidewp040105.doc
2.0 Tunables and settings for switch software
To optimize the HPS, you can set shell variables for Parallel Environment MPI-based workloads
and for IP-based workloads. This section reviews the shell variables that are most often used for
performance tuning. For a complete list of tunables and their usage, see the documentation listed
in section 7 of this paper.
2.1 MPI tunables for Parallel Environment
The following sections list the most common MPI tunables for applications that use the HPS.
Along with each tunable is a description of the variable, what it is used for, and how to set it
appropriately.
2.1.1 MP_EAGER_LIMIT
The MP_EAGER_LIMIT variable tells the MPI transport protocol to use the "eager" mode for
messages less than or equal to the specified size. Under the "eager" mode, the sender sends the
message without knowing if the matching receive has actually been posted by the destination
task. For messages larger than MP_EAGER_LIMIT, a rendezvous must be used to confirm that
the matching receive has been posted.
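The size rule described above can be sketched as a small shell function. This is only an illustration of the comparison, not the transport's actual code; the function name choose_protocol and the sample sizes are hypothetical.

```shell
# choose_protocol SIZE LIMIT
# Hypothetical helper: prints the mode that the rule above would select
# for a message of SIZE bytes when MP_EAGER_LIMIT is LIMIT bytes.
choose_protocol() {
  size=$1
  limit=$2
  if [ "$size" -le "$limit" ]; then
    # Less than or equal to the limit: sent without waiting for the receiver.
    echo eager
  else
    # Larger than the limit: the matching receive must be confirmed first.
    echo rendezvous
  fi
}

choose_protocol 4096 65536     # prints "eager"
choose_protocol 131072 65536   # prints "rendezvous"
```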
The sending task does not have to wait for an okay from the receiver before sending the data, so
the effective start-up cost for a small message is lower in “eager” mode. As a result,
messages no larger than MP_EAGER_LIMIT are typically delivered faster, especially if the
corresponding receive has already been posted. If the receive has not been posted, the transport
incurs an extra copy cost on the target, because data is staged through the early-arrival buffers.
However, the overall time to send a small message might still be less in "eager" mode. Well-
designed MPI applications often try to post each MPI_RECV before the message is expected, but
because tasks of a parallel job are not in lock step, most applications have occasional early
arrivals.
The maximum message size for the “eager” protocol is currently 65536 bytes, although the
default value is lower. An application for which a significant fraction of the MPI messages are
less than 65536 bytes might see a performance benefit from setting MP_EAGER_LIMIT. If
MP_EAGER_LIMIT is increased above the default value, it might also be necessary to increase
MP_BUFFER_MEM, which determines the amount of memory available for early arrival
buffers. Higher “eager” limits or larger task counts either demand more buffer memory or reduce
the number of “eager” messages that can be outstanding at one time, and can therefore also affect
performance.
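As a minimal sketch, the two variables could be set together in the shell before the job is launched. The values below are illustrative assumptions, not recommendations: 65536 is the maximum “eager” size stated above, and the MP_BUFFER_MEM figure (64 MB, given in bytes) is only a placeholder. The helper names EAGER_MAX and requested_eager are hypothetical.

```shell
# Illustrative values only; the right settings depend on the application.
EAGER_MAX=65536          # maximum "eager" message size stated in this section
requested_eager=65536    # hypothetical choice for this job

# Clamp the request so that it never exceeds the stated maximum.
if [ "$requested_eager" -gt "$EAGER_MAX" ]; then
  requested_eager=$EAGER_MAX
fi

export MP_EAGER_LIMIT="$requested_eager"
export MP_BUFFER_MEM=67108864   # assumed 64 MB for early-arrival buffers

# Launch the parallel job from this same shell so that it inherits the settings.
```

Exporting both variables in the same place keeps the “eager” limit and the early-arrival buffer size visible side by side, which makes it easier to adjust them together.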
2.1.2 MP_POLLING_INTERVAL and MP_RETRANSMIT_INTERVAL
The MP_POLLING_INTERVAL and MP_RETRANSMIT_INTERVAL variables control how
often the protocol code checks whether previously sent data should be assumed lost and
retransmitted. With larger values, this check is done less often. There are
two different environment variables because the check can be done by an MPI/LAPI service