thread, and from within the MPI/LAPI polling code that is invoked when the application makes
blocking MPI calls.
MP_POLLING_INTERVAL specifies the number of microseconds an MPI/LAPI service thread
should wait (sleep) before it checks whether any data previously sent by the MPI task needs to be
retransmitted. MP_RETRANSMIT_INTERVAL specifies the number of passes through the
internal MPI/LAPI polling routine between checks for whether any data needs to be
resent. When the switch fabric, adapters, and nodes are operating properly, data that is sent
arrives intact, and the receiver sends the source task an acknowledgment for the data. If the
sending task does not receive such an acknowledgment within a reasonable amount of time
(determined by the variable MP_RETRANSMIT_INTERVAL), it assumes the data has been lost
and tries to resend it.
Sometimes when many MPI tasks share the switch adapters, switch fabric, or both, the time it
takes to send a message and receive an acknowledgment is longer than the library expects. In this
case, data might be retransmitted unnecessarily. Increasing the values of
MP_POLLING_INTERVAL and MP_RETRANSMIT_INTERVAL decreases the likelihood of
unnecessary retransmission but increases the time a job is delayed when a packet is actually
dropped.
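The trade-off described above can be illustrated with a toy model. This is a minimal sketch, not the MPI/LAPI implementation: the function name simulate_send and its parameters are hypothetical, and both the acknowledgment delay and the retransmit timeout are expressed in the same abstract time units rather than in microseconds or polling passes.

```python
def simulate_send(ack_delay, retransmit_timeout, packet_lost):
    """Toy model of one send under a retransmit timeout (hypothetical).

    Returns (spurious_retransmit, recovery_delay):
      spurious_retransmit -- True if data that arrived intact is resent anyway
      recovery_delay      -- time spent waiting before a real loss is noticed
    """
    if packet_lost:
        # A genuine loss is only detected once the timeout expires,
        # so a larger timeout means a longer delay for the job.
        return (False, retransmit_timeout)
    # The data arrived; the resend is unnecessary if the acknowledgment
    # takes longer to come back than the sender is willing to wait.
    return (ack_delay > retransmit_timeout, 0)


if __name__ == "__main__":
    # Assumed values: a congested fabric where acknowledgments take 500 units.
    for timeout in (100, 1000):
        print(timeout,
              simulate_send(500, timeout, packet_lost=False),
              simulate_send(500, timeout, packet_lost=True))
```

In this model, a short timeout causes an unnecessary resend when acknowledgments are slow, while a long timeout avoids it at the cost of a longer wait after a real drop.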
2.1.3 MP_REXMIT_BUF_SIZE and MP_REXMIT_BUF_CNT
You can improve application performance by allowing a task that is sending a message shorter
than the “eager” limit to return the send buffer to the application before the message has reached
its destination, rather than forcing the sending task to wait until the data has actually reached the
receiving task and the acknowledgment has been returned. To allow immediate return of the
send buffer to the application, LAPI attempts to make a copy of the data in case it must be
retransmitted later (unlikely but not impossible). LAPI copies the data into a retransmit buffer
(REXMIT_BUF) if one is available. The MP_REXMIT_BUF_SIZE and
MP_REXMIT_BUF_CNT environment variables control the size and number of the retransmit
buffers allocated by each task.
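The role of the retransmit buffers can be sketched as a small fixed-size pool. This is an illustrative model only, not LAPI code: the class RetransmitPool and its methods are hypothetical, and the rule that a message must fit in a single buffer is an assumption made for the sketch.

```python
class RetransmitPool:
    """Toy pool of fixed-size retransmit buffers (hypothetical model)."""

    def __init__(self, buf_size, buf_count):
        self.buf_size = buf_size      # plays the role of MP_REXMIT_BUF_SIZE
        self.free = buf_count         # plays the role of MP_REXMIT_BUF_CNT
        self.in_flight = {}           # message id -> saved copy of the data

    def try_send(self, msg_id, data):
        """Return True if the send buffer can be returned immediately."""
        # Assumption: the message must fit in one retransmit buffer.
        if len(data) <= self.buf_size and self.free > 0:
            self.in_flight[msg_id] = bytes(data)  # copy kept for a possible resend
            self.free -= 1
            return True
        # No buffer available: the sender waits for the acknowledgment.
        return False

    def on_ack(self, msg_id):
        """Release the buffer once the message has been acknowledged."""
        if self.in_flight.pop(msg_id, None) is not None:
            self.free += 1


if __name__ == "__main__":
    pool = RetransmitPool(buf_size=16, buf_count=1)
    print(pool.try_send(1, b"hello"))   # a buffer is free, so the copy is made
    print(pool.try_send(2, b"world"))   # the pool is exhausted
    pool.on_ack(1)
    print(pool.try_send(2, b"world"))   # the buffer was released by the ack
```

In this model, a larger buffer count lets more short sends return immediately, at the cost of more memory reserved per task.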
2.1.4 MEMORY_AFFINITY
The POWER4™ and POWER4+™ models of the pSeries 690 have more than one multi-chip
module (MCM). An MCM contains eight CPUs and frequently has two local memory cards. On
these systems, application performance can improve when each CPU and the memory it accesses
are on the same MCM.
Setting the AIX MEMORY_AFFINITY environment variable to MCM tells the operating system
to attempt to allocate the memory from within the MCM containing the processor that made the
request. If memory is available on the MCM containing the CPU, the request is usually granted.
If memory is not available on the local MCM, but is available on a remote MCM, the memory is
taken from the remote MCM. (Lack of local memory does not cause the job to fail.)
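The local-first policy with remote fallback can be sketched as follows. This is a simplified model under stated assumptions, not the AIX allocator: the function allocate, the page-count bookkeeping, and the order in which remote MCMs are tried are all hypothetical, and the sketch always grants a local request when local memory suffices, whereas the real system only usually does.

```python
def allocate(free_pages, local_mcm, pages):
    """Choose an MCM for a memory request, preferring the requester's own.

    free_pages -- dict mapping MCM id -> free page count (updated in place)
    local_mcm  -- id of the MCM containing the requesting CPU
    pages      -- number of pages requested
    Returns the id of the MCM that satisfied the request, or None if
    no MCM in the model has enough free pages.
    """
    # Try the local MCM first, then the remote ones (assumed order: by id).
    candidates = [local_mcm] + [m for m in sorted(free_pages) if m != local_mcm]
    for mcm in candidates:
        if free_pages.get(mcm, 0) >= pages:
            free_pages[mcm] -= pages
            return mcm
    return None


if __name__ == "__main__":
    free = {0: 4, 1: 8}
    print(allocate(free, local_mcm=0, pages=3))  # local MCM has room
    print(allocate(free, local_mcm=0, pages=3))  # local MCM is short, falls back
```

The second request succeeds on the remote MCM rather than failing, which mirrors the point that a lack of local memory does not cause the job to fail.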