GPFS checks
Performance problems:
Refer to the GPFS problem determination documentation and the GPFS
performance white papers.
GPFS file system failure:
Refer to the GPFS problem determination documentation.
SNMP monitoring
The service processor network, the Ethernet switches, and the Myrinet switch can
be monitored using SNMP. All devices should be configured to send their SNMP
traps to the management server. The management server should be configured to
use trapd so that SNMP traps are translated into human-readable form and
added to syslog.
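The following is a minimal sketch, not the trapd program itself, of the kind of translation a trap daemon performs: it listens on UDP port 162 (the standard SNMP trap port), decodes the SNMP version, community string, and PDU type from each datagram, and writes a one-line summary to syslog. It assumes Python 3.9 or later and a local syslog socket at /dev/log; the function names are hypothetical, and on a real management server the configured trap daemon should be used instead.

```python
import logging
import logging.handlers
import socket

# BER tag bytes for the notification PDUs defined in RFC 1157 and RFC 3416.
PDU_NAMES = {0xA4: "SNMPv1-Trap", 0xA6: "InformRequest", 0xA7: "SNMPv2-Trap"}
# The message "version" field: 0 = SNMPv1, 1 = SNMPv2c.
VERSIONS = {0: "v1", 1: "v2c"}


def read_tlv(buf: bytes, pos: int) -> tuple[int, bytes, int]:
    """Read one BER tag-length-value at buf[pos]; return (tag, value, next_pos)."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: the low 7 bits count the length bytes
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    value = buf[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated BER value")
    return tag, value, pos + length


def describe_trap(datagram: bytes, sender: str) -> str:
    """Turn a raw SNMP trap datagram into a one-line, human-readable summary."""
    tag, message, _ = read_tlv(datagram, 0)
    if tag != 0x30:  # every SNMP message is a BER SEQUENCE
        raise ValueError("not an SNMP message")
    _, version, pos = read_tlv(message, 0)
    _, community, pos = read_tlv(message, pos)
    pdu_tag, _, _ = read_tlv(message, pos)
    return "trap from {} version={} community={} pdu={}".format(
        sender,
        VERSIONS.get(int.from_bytes(version, "big"), "unknown"),
        community.decode("ascii", errors="replace"),
        PDU_NAMES.get(pdu_tag, hex(pdu_tag)),
    )


def main(port: int = 162) -> None:
    """Receive traps forever and forward a summary of each one to syslog."""
    log = logging.getLogger("snmp-traps")
    log.addHandler(logging.handlers.SysLogHandler(address="/dev/log"))
    log.setLevel(logging.INFO)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))  # ports below 1024 normally require root
    while True:
        data, (host, _) = sock.recvfrom(65535)
        try:
            log.info(describe_trap(data, host))
        except (ValueError, IndexError) as exc:
            log.warning("undecodable datagram from %s: %s", host, exc)
```

Calling main() as root starts the receiver. Only the message header is decoded here; a real trap daemon also decodes the variable bindings and maps OIDs to names using the device MIBs.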
Use the lsnode -Al command to determine the hostname of the Falcon card and
the name of the service processor associated with the node of interest. Then
connect to the Falcon card by its hostname, using telnet or a web browser, and
select the options to configure SNMP.
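Before opening a telnet session or a browser, it can help to confirm that the Falcon card hostname reported by lsnode -Al actually answers on the telnet (23) and HTTP (80) ports. The sketch below is a hypothetical helper, not part of the cluster software; it only tests TCP reachability and does not log in or change any SNMP settings.

```python
import socket

# Well-known ports for the two access methods mentioned in the text.
ACCESS_PORTS = {"telnet": 23, "http": 80}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, unresolvable, or timed out
        return False


def falcon_access(host: str) -> dict[str, bool]:
    """Report which access methods the Falcon card at `host` answers on."""
    return {name: port_open(host, port) for name, port in ACCESS_PORTS.items()}
```

For example, falcon_access("node001-falcon") (a hypothetical hostname) returns a dictionary such as {"telnet": True, "http": True} when both services are reachable.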
Setting up SNMP alerts from Myrinet
The Myrinet 2000 network in the Linux Cluster 1350 is installed with monitoring
cards. You can use the graphical monitoring program mute to monitor the whole
network for bad events, all of which are logged and reported by the monitoring
cards. You can also use an SNMP client or a web browser to access monitoring
card information, and you can configure the monitoring cards to notify you of
bad events by email.
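As an illustration of what an SNMP client sends to a monitoring card, the following minimal sketch builds an SNMPv1 GetRequest by hand using BER encoding and sends it to UDP port 161. It assumes that the monitoring card answers standard MIB-II queries such as sysDescr.0 (1.3.6.1.2.1.1.1.0) with the community string "public"; both are assumptions to verify against the card's own documentation, and the function names are hypothetical. Decoding the response is omitted.

```python
import socket


def encode_length(n: int) -> bytes:
    """BER length: short form below 128, otherwise long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Wrap a value in a BER tag and length."""
    return bytes([tag]) + encode_length(len(value)) + value


def encode_int(n: int) -> bytes:
    """BER INTEGER for a non-negative n (an extra byte keeps the sign bit clear)."""
    return tlv(0x02, n.to_bytes(n.bit_length() // 8 + 1, "big"))


def encode_oid(oid: str) -> bytes:
    """BER OBJECT IDENTIFIER: first two arcs packed as 40*a+b, then base-128 arcs."""
    arcs = [int(a) for a in oid.split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))  # continuation bit on leading bytes
            arc >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    """Build an SNMPv1 GetRequest message for a single OID."""
    varbind = tlv(0x30, encode_oid(oid) + b"\x05\x00")  # value is NULL in a GET
    pdu = tlv(0xA0, encode_int(request_id) + encode_int(0) + encode_int(0)
              + tlv(0x30, varbind))  # error-status, error-index, varbind list
    return tlv(0x30, encode_int(0) + tlv(0x04, community.encode("ascii")) + pdu)


def snmp_get_raw(host: str, community: str, oid: str, timeout: float = 2.0) -> bytes:
    """Send the request to UDP port 161 and return the raw response datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(get_request(community, oid), (host, 161))
        data, _ = sock.recvfrom(65535)
        return data
```

In practice a packaged SNMP client is the better choice; the point of the sketch is to show that a query is a single small UDP datagram addressed to the card.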
The following Myrinet software packages are required:
• GM software. This is the base software required to use the Myrinet 2000
  network. It is the message-passing system for Myrinet networks and includes
  a driver, a Myrinet-interface control program, a network mapping program,
  and the GM API, library, and header files (the current version is 1.4;
  version 1.5 is expected soon).
• m3-dist package. This provides the source for building the SNMP library for
  the GM layer.
• mute. A graphical (GUI) tool for monitoring the Myrinet network (its name is
  likely to change in the near future).
Build the software in the following order:
• GM, including the mt tools
• m3-dist (depends on GM)
• mute (depends on GM and m3-dist)
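The build order follows directly from the dependencies listed for each package. The sketch below derives it mechanically with the standard graphlib module (Python 3.9 or later); the dictionary is a hypothetical encoding of the list above, not a file shipped with the Myrinet software, and the approach only pays off if further packages are added.

```python
from graphlib import CycleError, TopologicalSorter

# Each package maps to the packages that must be built before it.
BUILD_DEPENDENCIES = {
    "GM": set(),  # includes the mt tools
    "m3-dist": {"GM"},
    "mute": {"GM", "m3-dist"},
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every package follows all of its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())
```

For the three packages above, build_order(BUILD_DEPENDENCIES) returns ["GM", "m3-dist", "mute"].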
The README-Linux and mt/README files that ship with the GM software, and the
README files that ship with the m3-dist and mute software, provide comprehensive
details on how to build each component.
Currently, m3-dist and mute compile against GM 1.5. With GM 1.4, the SNMP
library (m3-dist) does not build, and building mute is not straightforward
either. We therefore recommend building all of the above software against
GM 1.5. (Note that GM 1.5 is not yet generally available but is expected to be
released soon.)
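If the build is scripted, a simple guard can stop it early when the installed GM release is older than 1.5. The sketch below assumes the caller already has the GM version as a dotted string (for example, taken from the GM release notes); how to obtain that string is not covered here, and the function names are hypothetical.

```python
# Minimum GM release that m3-dist and mute are reported to build against.
MIN_GM_VERSION = (1, 5)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted release string such as '1.4' into a comparable tuple (1, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def gm_ok_for_snmp_tools(gm_version: str) -> bool:
    """True if m3-dist and mute can be expected to build against this GM release."""
    return parse_version(gm_version) >= MIN_GM_VERSION
```

Comparing integer tuples rather than strings matters here: as plain strings, "1.10" would sort before "1.5".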
All of the above Myrinet software can be obtained from
http://www.myri.com/scs/index.html (for GM, select the LANai9 software).
Chapter 10. Hardware/software problem determination
eServer Cluster 1350: Cluster 1350 Installation and Service. Copyright IBM Corp. 2003.