
3 – InfiniBand® Cluster Setup and Administration
Performance Settings and Management Tips
IB0054606-02 A
High Risk Tuning for Intel Harpertown CPUs
The following tuning for the Harpertown generation of Intel Xeon CPUs carries a
higher risk, but provides a bandwidth benefit.
For nodes with Intel Harpertown (Xeon 54xx) CPUs, you can add pcie_caps=0x51 and
pcie_coalesce=1 to the modprobe.conf file. For example:
options ib_qib pcie_caps=0x51 pcie_coalesce=1
If syslog reports the following problem, follow the diagnostic steps described in
the paragraphs below:
[PCIe Poisoned TLP][Send DMA memory read]
Another potential issue is that, after starting openibd, messages such as the
following appear on the console:
Message from syslogd@st2019 at Nov 14 16:55:02 ...
kernel:Uhhuh. NMI received for unknown reason 3d on CPU 0
After this happens, you may also see the following message in the syslog:
Mth dd hh:mm:ss st2019 kernel: ib_qib 0000:0a:00.0: infinipath0: Fatal Hardware Error, no longer usable, SN AIB1013A43727
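The three messages quoted above can be searched for in saved syslog or console output. The following is a minimal Python sketch, not part of the OFED+ software: the function name find_failure_signs and the labels are hypothetical, and the search strings are taken from the example messages in this section. It assumes the log text has already been read into a list of lines.

```python
# Substrings copied from the example messages in this section.
# The dictionary keys are illustrative labels, not official error codes.
SIGNATURES = {
    "poisoned_tlp": "[PCIe Poisoned TLP]",
    "unknown_nmi": "NMI received for unknown reason",
    "fatal_hw_error": "Fatal Hardware Error, no longer usable",
}


def find_failure_signs(lines):
    """Return (label, line) pairs for every line containing a known signature."""
    hits = []
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                hits.append((label, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    sample = [
        "kernel:Uhhuh. NMI received for unknown reason 3d on CPU 0",
        "kernel: ib_qib 0000:0a:00.0: infinipath0: Fatal Hardware Error, "
        "no longer usable, SN AIB1013A43727",
    ]
    for label, line in find_failure_signs(sample):
        print(label, "->", line)
```

A match only indicates that one of the quoted messages is present; it does not confirm the cause.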
These problems typically occur on the first run of an MPI program over the PSM
transport, or immediately after the link becomes active. Once this happens, the
adapter is unusable until the system is rebooted. To resolve this issue, try the
following solutions in order:
1. Remove pcie_coalesce=1.
2. Restart openibd and try the MPI program again.
3. Remove both the pcie_caps=0x51 and pcie_coalesce=1 options from the ib_qib
   line in the modprobe.conf file and reboot the system.
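The option removal in the steps above can be sketched in Python as a pure text transformation. This is an illustrative helper, not a QLogic tool: the name remove_module_options is hypothetical, and it assumes each options line has the form "options <module> name=value ...". It only returns the edited text; writing the file, restarting openibd, and rebooting remain manual steps.

```python
def remove_module_options(conf_text, module, names):
    """Return conf_text with the named options dropped from `options <module>` lines."""
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == module:
            # Keep every token whose option name is not in the removal set.
            kept = [f for f in fields[2:] if f.split("=", 1)[0] not in names]
            if not kept:
                continue  # no options left for this module, so drop the line
            line = " ".join(["options", module] + kept)
        out.append(line)
    return "".join(l + "\n" for l in out)


if __name__ == "__main__":
    conf = "options ib_qib pcie_caps=0x51 pcie_coalesce=1\n"
    # First attempt: drop only pcie_coalesce.
    print(remove_module_options(conf, "ib_qib", {"pcie_coalesce"}), end="")
    # Fallback: drop both options.
    print(remove_module_options(conf, "ib_qib", {"pcie_caps", "pcie_coalesce"}), end="")
```

Lines for other modules and comment lines pass through unchanged, apart from normalized line endings.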
NOTE
Removing both options avoids the problem but can result in an unnecessary
performance decrease. If the system has already failed with the above
diagnostic, it must be rebooted. In the modprobe.conf file, all options for a
particular kernel module must be on the same line, not spread across repeated
options ib_qib lines.
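The single-line requirement in the note can be checked or repaired with a short script. The sketch below is an assumption-laden illustration, not a vendor utility: merge_module_options is a hypothetical name, the merged line is placed where the first options line for the module appeared, and if the same option name occurs more than once the last value is kept (that precedence is a choice made here, not documented behavior).

```python
def merge_module_options(conf_text, module):
    """Collapse repeated `options <module>` lines into one line."""
    merged = {}   # option name -> "name=value" token (dicts keep insertion order)
    out = []
    slot = None   # index in `out` reserved for the single merged line
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == module:
            for tok in fields[2:]:
                merged[tok.split("=", 1)[0]] = tok  # assumption: last value wins
            if slot is None:
                slot = len(out)
                out.append(None)  # placeholder, filled in after the loop
            continue
        out.append(line)
    if slot is not None:
        out[slot] = " ".join(["options", module] + list(merged.values()))
    return "".join(l + "\n" for l in out)


if __name__ == "__main__":
    split_conf = "options ib_qib pcie_caps=0x51\noptions ib_qib pcie_coalesce=1\n"
    print(merge_module_options(split_conf, "ib_qib"), end="")
```

Review the result before replacing the real file, since the script does not validate option names or values.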