
● With mrun, the exact need must be met. If the user asks for 8 nodes, all CPU and memory on 8 nodes must be free for Marathon to accept the offer on behalf of mrun.
● The Marathon API does not offer a way to ask whether the needs of a job can be fully satisfied before a request is submitted. Therefore, Mesos is queried directly for its resource availability.
● Users request resources from Mesos to give to YARN via Cray-developed scripts that start NodeManagers. The request is submitted to Marathon; this is called a flex up. Once users receive the requested resources, they can run their Hadoop jobs, Hive queries, or Oozie workflows. When these complete, users release the resources back to Mesos via the Cray flex scripts. Flex scripts require the exact number of nodes requested and cannot run with fewer resources. If the number of nodes requested in a flex up exceeds the number currently available in Mesos, an error message indicates that fewer resources are available than requested and that the user can submit a new flex up request.
● If the system is loaded and other frameworks (e.g., Spark) keep submitting smaller jobs, flex scripts may keep exiting because they never receive the required number of nodes. This can lead to starvation of Hadoop jobs.
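Since the Marathon API cannot answer the "can this be satisfied?" question up front, a user-side script can query the Mesos master's standard /metrics/snapshot HTTP endpoint and check feasibility before submitting a flex up. The sketch below is illustrative, not part of the Cray flex scripts: the master address and per-node resource figures are assumptions, and the feasibility logic is separated into a pure function so it can be exercised without a live cluster.

```python
import json
import urllib.request

def free_resources(metrics):
    """Compute free CPUs and free memory (MB) from a Mesos metrics snapshot."""
    return (
        metrics["master/cpus_total"] - metrics["master/cpus_used"],
        metrics["master/mem_total"] - metrics["master/mem_used"],
    )

def can_flex_up(metrics, nodes, cpus_per_node, mem_per_node):
    """True if the cluster reports enough free CPU and memory for the request.

    Note: this is only a necessary condition. Mesos reports aggregate
    totals, so free resources may be spread across partially used nodes
    rather than the fully free nodes the flex scripts require.
    """
    free_cpus, free_mem = free_resources(metrics)
    return free_cpus >= nodes * cpus_per_node and free_mem >= nodes * mem_per_node

def fetch_metrics(master="hostname-login1:5050"):
    """Fetch the metrics snapshot from the Mesos master (address illustrative)."""
    with urllib.request.urlopen(f"http://{master}/metrics/snapshot") as resp:
        return json.load(resp)
```

A wrapper script could call fetch_metrics() and submit the flex up to Marathon only when can_flex_up() returns True, avoiding a guaranteed-to-fail request.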
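Because a flex script that cannot get the exact node count simply exits, one way a user can mitigate starvation on a busy system is to resubmit with backoff instead of giving up after one attempt. This is a minimal sketch of that retry loop; submit_flex_up is a hypothetical wrapper around the Cray flex scripts, not an actual command.

```python
import time

def flex_up_with_retry(submit_flex_up, nodes, attempts=5, base_delay=30):
    """Retry a flex up request with linear backoff.

    submit_flex_up(nodes) is a hypothetical callable wrapping the Cray flex
    scripts; it should return True when the exact node count was granted.
    """
    for attempt in range(attempts):
        if submit_flex_up(nodes):
            return True
        time.sleep(base_delay * (attempt + 1))  # wait longer after each failure
    return False
```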
5.2 Use Apache Mesos on Urika-GX
Apache™ Mesos™ acts as the primary resource manager on the Urika-GX platform. It is a cluster manager that provides efficient resource isolation and sharing across distributed applications and/or frameworks. It lies between the application layer and the operating system and simplifies the process of managing applications in large-scale cluster environments, while optimizing resource utilization.
Architecture
Major components of a Mesos cluster include:
● Mesos agents/slaves - Agents/slaves are the worker instances of Mesos that denote resources of the cluster.
● Mesos masters - The master manages agent/slave daemons running on each cluster node and implements fine-grained sharing across frameworks using resource offers. Each resource offer is a list of free resources on multiple agents/slaves. The master decides how many resources to offer to each framework according to an organizational policy, such as fair sharing or priority.
By default, Urika-GX ships with three Mesos masters with a quorum size of two. At least two Mesos masters must be running at any given time to ensure that the Mesos cluster is functioning properly. Administrators can use the urika-state and urika-inventory commands to check the status of Mesos masters and slaves. Administrators can also check the status of Mesos by pointing their browser at http://hostname-login1:5050 and ensuring that it is up. In addition, executing the ps -ef | grep mesos command on the login nodes displays the running Mesos processes.
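The resource-offer model described under "Mesos masters" can be illustrated with a toy fair-sharing policy: each round, the master offers the next chunk of free resources to the framework that currently holds the least. This is a deliberate simplification of Mesos's real allocator (which uses Dominant Resource Fairness over multiple resource types); all names and numbers below are hypothetical.

```python
def pick_framework(allocations):
    """Fair sharing: choose the framework with the smallest current allocation."""
    return min(allocations, key=allocations.get)

def offer_cycle(free_cpus, allocations, chunk=8):
    """Offer free CPUs in fixed-size chunks, one fair-share pick per round.

    For simplicity every framework accepts every offer; in real Mesos a
    framework may decline, and declined resources are offered elsewhere.
    """
    while free_cpus >= chunk:
        framework = pick_framework(allocations)
        allocations[framework] += chunk  # framework accepts the offer
        free_cpus -= chunk
    return allocations
```

Running a cycle with one busy framework and one idle one shows the idle framework being favored until the allocations even out, which is the intent of the fair-sharing policy mentioned above.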
Components that Interact with Mesos
● Frameworks - Frameworks run tasks on agents/slaves. The Mesos master offers resources to frameworks that are registered with it, and each framework decides whether to accept or reject an offer. If a framework accepts an offer, Mesos allocates the resources, and the framework's scheduler then schedules its tasks on them. Each framework running on Mesos consists of two components:
Resource Management
S3016
126