the metrics displayed by the Mesos UI and the metrics that the curl calls return different results, Mesos may not
work correctly and all the Mesos frameworks will be affected. As such, the aforementioned Cray-developed scripts
and mrun will not be able to retrieve the needed resources. This behavior can be identified when:
● there is a disconnect between the curl calls and the Mesos UI. Specifically, there will be an indication of
  orphaned Mesos tasks if the curl call returns a higher number of CPUs used than that returned by the UI.
  Cray-developed scripts for flexing YARN sub-clusters use curl calls, and hence do not allow flexing up if
  there are not enough resources reported.
● there are orphaned Mesos tasks, as indicated in the Mesos Master and Mesos Slave logs at /var/log/mesos.
  Mesos Master will reject task status updates because it will not recognize the framework those tasks are
  being sent from.
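The first check above can be sketched as a small shell helper. This is only an illustration, not one of the Cray-developed scripts: the function name check_orphans is hypothetical, and both CPU counts are supplied by the operator (one read from the curl output, one read from the Mesos UI). The commented curl line assumes the standard Mesos master metrics endpoint on its default port 5050; the host name is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: compare the "CPUs used" value returned by a curl call
# with the value shown in the Mesos UI. Returns 1 when the curl value is
# higher, which the text above describes as an indication of orphaned tasks.
check_orphans() {
    curl_cpus="$1"   # CPUs used, as returned by the curl call
    ui_cpus="$2"     # CPUs used, as displayed by the Mesos UI
    # awk is used because CPU counts may be fractional.
    if awk -v c="$curl_cpus" -v u="$ui_cpus" 'BEGIN { exit !(c > u) }'; then
        echo "possible orphaned tasks: curl reports $curl_cpus CPUs used, UI shows $ui_cpus"
        return 1
    fi
    echo "no mismatch indicated: curl reports $curl_cpus CPUs used, UI shows $ui_cpus"
    return 0
}

# One way to obtain the curl-side value (assumed host; look for the
# "master/cpus_used" key in the JSON that is returned):
#   curl -s http://<mesos-master>:5050/metrics/snapshot
```

For example, `check_orphans 36 24` reports a possible orphaned-task condition, while `check_orphans 24 24` does not.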
If this behavior is encountered, follow the instructions listed in this procedure:
Procedure
1. Log on to the System Management Workstation (SMW) as root.
2. Clear the slave metadata on all the nodes with Mesos slave processes running.
   The following example can be used on a 3 sub-rack system:
   # pdsh -w nid000[00-47] -x nid000[00,16,30,31,32,46,47] \
   'rm -vf /var/log/mesos/agent/meta/slaves/latest'
3. Stop the cluster.
   # urika-stop
4. Start the cluster.
   # urika-start
After following the aforementioned steps, the system should be restored to its original state. For additional
information, contact Cray Support.
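The procedure above can be sketched as a dry-run wrapper that prints each command in order before anything is executed. This is an illustrative sketch only: the names run_step and recover_mesos and the RUN switch are hypothetical, and the node lists are copied from the 3 sub-rack example, so they would need to be adjusted for other system sizes.

```shell
#!/bin/sh
# Sketch of the recovery procedure. Commands are only printed unless RUN=1.
NODES='nid000[00-47]'                     # node range from the 3 sub-rack example
EXCLUDE='nid000[00,16,30,31,32,46,47]'    # nodes excluded in that example
META='/var/log/mesos/agent/meta/slaves/latest'

run_step() {
    # Print the command (quoting is not preserved in the printed form);
    # execute it only when RUN=1 is set (hypothetical switch).
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

recover_mesos() {
    # Step 2: clear the slave metadata; steps 3 and 4: restart the cluster.
    # Each step runs only if the previous one succeeded.
    run_step pdsh -w "$NODES" -x "$EXCLUDE" "rm -vf $META" &&
    run_step urika-stop &&
    run_step urika-start
}
```

Calling `recover_mesos` as root on the SMW prints the three commands for review; calling it as `RUN=1 recover_mesos` would execute them.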
8.8 Troubleshoot Common Analytic and System Management Issues
The following table lists some common error messages and their descriptions. Please note that this is not an
exhaustive list. Online documentation and logs should be referenced for additional debugging/troubleshooting.
For a list of Cray Graph Engine error messages and troubleshooting information, please refer to the Cray®
Graph Engine User Guide.
Table 38. System Management Error Messages
Error Message: ERROR: unauthorized command 'cat' requested by client
Description: This message is returned when a restricted user, logged in to a tenant VM, attempts to execute a
command that is not part of the set of white listed commands.
Notes/Resolution: Only white listed commands can be executed by restricted users who are logged into tenant
VMs. For more information, refer to the