GPFS checks
Performance problems:
Refer to GPFS Problem Determination and GPFS
Performance Whitepapers.
GPFS file system failure:
Refer to GPFS Problem Determination.
SNMP monitoring
The service processor network, Ethernet switches, and Myrinet switch can be
monitored using SNMP. All devices should be configured to send their SNMP
traps to the management server. The management server should be configured to
use trapd so that SNMP traps can be translated to a human-readable form and
added to syslog.
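As an illustration of what "translated to a human-readable form and added to
syslog" means, the following is a minimal Python sketch, not the trapd
implementation. The function names format_trap and log_trap and the
TRAP_NAMES table are hypothetical; the three OIDs are standard SNMPv2-MIB
trap identifiers. It assumes the trap has already been received and decoded
into a source name, a trap OID, and a dictionary of variable bindings.

```python
import syslog  # Unix-only standard library module

# Hypothetical lookup table holding a few standard SNMPv2-MIB trap OIDs.
TRAP_NAMES = {
    "1.3.6.1.6.3.1.1.5.1": "coldStart",
    "1.3.6.1.6.3.1.1.5.3": "linkDown",
    "1.3.6.1.6.3.1.1.5.4": "linkUp",
}


def format_trap(source, trap_oid, varbinds):
    """Return a one-line, human-readable description of a decoded trap."""
    # Fall back to the raw OID when the trap is not in the table.
    name = TRAP_NAMES.get(trap_oid, trap_oid)
    message = f"SNMP trap from {source}: {name}"
    if varbinds:
        details = ", ".join(f"{key}={value}" for key, value in varbinds.items())
        message += f" ({details})"
    return message


def log_trap(source, trap_oid, varbinds):
    """Write the formatted trap to the local syslog at warning priority."""
    syslog.syslog(syslog.LOG_WARNING, format_trap(source, trap_oid, varbinds))
```

For example, format_trap("switch1", "1.3.6.1.6.3.1.1.5.3", {"ifIndex": 4})
produces the text "SNMP trap from switch1: linkDown (ifIndex=4)", which
log_trap would then hand to syslog.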
Use the lsnode -Al command to determine the hostname for the Falcon card and
the service processor name associated with the node of interest. Then use telnet or
a web browser to connect to the Falcon card by its hostname and select the options
to configure SNMP.
Setting up SNMP alerts from Myrinet
The Myrinet 2000 network in Linux Cluster 1350 is installed with monitoring
cards. You can use the graphical monitoring program mute to watch the whole
network for bad events, all of which are logged and reported by the monitoring
cards. You can also use an SNMP client or a web browser to access monitoring card
information, and the monitoring cards can notify you of bad events by email.
The following Myrinet software packages are required:
• GM software. This is the base software required to use the Myrinet 2000
  network. It is the message-passing system for Myrinet networks, and includes
  a driver, a Myrinet-interface control program, a network mapping program,
  and the GM API, library, and header files (current version is 1.4; version
  1.5 is expected soon).
• m3-dist package. Provides the source for building the SNMP library for the GM
  layer.
• mute. A GUI tool to monitor the Myrinet network (the name is likely to change
  in a future release).
Order in which the software should be built:
1. GM, including the mt tools
2. m3-dist (depends on GM)
3. mute (depends on GM and m3-dist)
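The build order above follows directly from the stated dependencies. As a
rough sketch (the DEPENDS_ON table and the build_order function are
illustrative names, not part of any Myrinet tooling), a topological sort of
the dependency list reproduces it in Python 3.9 or later:

```python
from graphlib import TopologicalSorter

# Each package maps to the set of packages that must be built before it,
# as stated in the list above.
DEPENDS_ON = {
    "GM": set(),
    "m3-dist": {"GM"},
    "mute": {"GM", "m3-dist"},
}


def build_order(depends_on):
    """Return the packages in an order that satisfies every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(build_order(DEPENDS_ON))  # prints ['GM', 'm3-dist', 'mute']
```

Because mute depends on both of the other packages and m3-dist depends on GM,
this is the only valid order for the three packages.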
The README-Linux and mt/README files that ship with the GM software, the
README that ships with the m3-dist software, and the README that ships with
the mute software provide comprehensive details on how to build the respective
parts.
Currently, m3-dist and mute compile against GM 1.5. With GM 1.4, the m3-dist
SNMP library does not build, and building mute is not straightforward either.
We therefore recommend building the above software against GM 1.5. (Note that
GM 1.5 is not generally available yet but is expected to be released soon.)
All of the above Myrinet software can be obtained from:
http://www.myri.com/scs/index.html (for GM, select the LANai9 software).
Chapter 10. Hardware/software problem determination