13.4.2 Troubleshoot the Lustre Validation Tests
This section presents possible causes and resolutions for errors that may occur during the Lustre validation tests
for ClusterStor L300 and L300N systems.
IMPORTANT: Carefully inspect any errors generated by CSM. In many cases the errors reveal the exact
problem, for example through backend script output or middleware messages.
The Lustre Validation Hangs for Greater than 30 Minutes While Applying Network Settings
This problem can have many possible causes, because many backend tools and scripts are involved. To
investigate:
● Run ps and determine the process ID (PID) for the beSystemNetConfig.sh script from the second column of the output:
[MGMT0]$ ps aux | grep beSystem
● Check the status of the beSystemNetConfig.sh script, substituting the PID found above:
[MGMT0]$ watch pstree -la PID
For example:
[MGMT0]$ watch pstree -la 65518
Every 2.0s: pstree -la 65518                                Wed Jan 22 14:38:45 2014

sudo /opt/xyratex/bin/beRunSanity --fsname sn11000 --mpoint /tmp/tmphNDsuO --mgsnid lsn11022n002@o2ib0 --request-id 256
  `-beRunSanity /opt/xyratex/bin/beRunSanity --fsname sn11000 --mpoint /tmp/tmphNDsuO --mgsnid lsn11022n002@o2ib0 --request-id 256
      `-python2.6 -m t0.backend.fs_tests.beRunSanity --fsname sn11000 --mpoint /tmp/tmphNDsuO --mgsnid lsn11000n002@o2ib0 --reque
          `-sh -c PATH=$PATH:/usr/sbin\040RUNAS_ID=301\040CLIENTONLY=1\040ACC_SM_ONLY="RUNTESTS\040SANITY"\040ONLY="1\0402\0403\0404\0405\0406\0407\0408\0409\04010\04011\04012\04013\04014\04015\04016
              `-sanity.sh /usr/lib64/lustre/tests/sanity.sh
                  |-sanity.sh /usr/lib64/lustre/tests/sanity.sh
                  | |-grep -v grep
                  | |-grep -q multiop
                  | `-ps auxww
                  `-tee -i /var/lib/xyratex/logs/lustre_validation/test_logs/1390422238/sanity.test_161.test_log.sn11000n000.log
Observe whether the output changes at all. If the script remains on the same step for more than 2-3 minutes,
consider the following possibilities:
○ If the script hangs on the OST mount, check the network by pinging all IP addresses assigned to the cluster from all hosts (for example, make certain that the OSS node can be pinged from the MGS IP and vice versa); a scripted version of this check is sketched after this list.
○ If it is a network issue, make certain all cabling and switches are installed correctly.
○ If the network looks fine, SSH to the node hosting the OST and run:
[admin@nxxx]$ dmesg
The dmesg command lists any kernel errors; a filtered variant is sketched after this list. If no errors are related to the OST mount, proceed to the next option.
○ Examine the /proc/mdstat file:
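The ping check in the first option above can be scripted from the management node. The following is a minimal sketch only; the NODES and IPS lists are hypothetical placeholders and must be replaced with the hostnames and IP addresses actually assigned to the cluster.
# Hypothetical host and address lists -- replace with this cluster's real values.
NODES="n000 n001 n002"
IPS="10.0.1.10 10.0.1.11 10.0.1.12"
# From every node, ping every cluster IP address once and report any failure.
for node in $NODES; do
    for ip in $IPS; do
        ssh "$node" "ping -c 1 -W 2 $ip" >/dev/null 2>&1 \
            || echo "FAIL: $node cannot reach $ip"
    done
done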
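Similarly, the dmesg output in the third option can be narrowed to messages that are more likely to be relevant to Lustre and the OST mount; the pattern below is only a suggested starting point, not an exhaustive match.
[admin@nxxx]$ dmesg | grep -i -E 'lustre|lnet|ost|error'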