
Under the Urika-GX multi-tenancy infrastructure, users can interact only with services specific to their tenant and
are unaware of the existence of other tenants. Each tenant can see its per-tenant data in HDFS and Lustre and is
restricted from accessing data belonging to other tenants. An additional NameNode is created for each tenant
when that tenant is created.
In order to maximize isolation and minimize overhead, the tenant NameNodes run inside Docker containers,
which are orchestrated by Kubernetes, whereas the DataNodes run on the physical hosts. The Urika-GX tenant
management scripts are responsible for managing the life-cycle of the NameNode container. Likewise, the
Urika-GX tenant proxy is responsible for automatically directing and restricting tenant user hdfs commands to
their respective NameNode.
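The actual Urika-GX tenant proxy implementation is not reproduced in this guide, but the following minimal
Python sketch illustrates the routing idea described above: resolve the calling user's tenant, point
HADOOP_CONF_DIR at that tenant's HDFS configuration directory, and then invoke the real hdfs binary. The
tenant lookup file and the function names are hypothetical.

    # Illustrative sketch only; this is not the actual Urika-GX tenant proxy.
    import getpass
    import os
    import subprocess
    import sys

    def tenant_of(user):
        # Hypothetical lookup: map a user name to a tenant name from a site-specific file.
        with open("/etc/urika/tenant_map") as f:      # hypothetical path
            for line in f:
                name, tenant = line.split()
                if name == user:
                    return tenant
        raise SystemExit("user %s does not belong to any tenant" % user)

    def run_hdfs(argv):
        tenant = tenant_of(getpass.getuser())
        env = dict(os.environ)
        # Direct the command to the tenant NameNode by using the tenant's HDFS configuration.
        env["HADOOP_CONF_DIR"] = "/global/tenants/%s/hdfs/conf" % tenant
        return subprocess.call(["hdfs"] + argv, env=env)

    if __name__ == "__main__":
        sys.exit(run_hdfs(sys.argv[1:]))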
In Urika-GX's multi-tenant setup, each tenant has a dedicated NameNode with its own tenant-specific HDFS
configuration. For a restricted user in secure mode, the Urika-GX tenant proxy is responsible for injecting the
correct configuration parameters. Tenants can interact only with their own designated NameNode, and only users
that belong to a specific tenant can communicate with that tenant's NameNode. The HDFS DataNodes serve
multiple NameNodes.
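As a hedged illustration of what injecting the correct configuration parameters can amount to, the sketch below
builds Hadoop generic -D options carrying the tenant NameNode address and the security mode. The property
names (fs.defaultFS, hadoop.security.authentication) are standard Hadoop keys, but the host name and the
exact set of injected parameters are placeholders, not actual Urika-GX settings.

    # Illustrative only: the real proxy derives these values from the tenant's
    # configuration directory; the NameNode host name below is a placeholder.
    def injected_options(tenant):
        props = {
            "fs.defaultFS": "hdfs://nn-%s.local:8020" % tenant,   # tenant NameNode (placeholder host)
            "hadoop.security.authentication": "kerberos",         # secure service mode
        }
        opts = []
        for key, value in props.items():
            opts += ["-D", "%s=%s" % (key, value)]
        return opts

    # e.g. ["hdfs", "dfs"] + injected_options("tenant_name") + ["-ls", "/"]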
Each tenant is assigned its own HDFS configuration directory. Individual tenant members are restricted from
overriding the Hadoop configuration directory and from specifying a particular NameNode on the CLI. As such,
certain arguments passed to HDFS commands on the CLI are ignored to ensure the security of tenant data. If
such an argument is passed, the system returns a warning indicating that it detected an argument that is not
allowed for restricted users and that the argument is being removed.
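The exact list of filtered arguments is defined by the Urika-GX tenant proxy and is not reproduced here; the
sketch below only illustrates the stripping behavior described above, using standard Hadoop options that could
redirect a command (--config, -fs, -conf, -D) as example candidates.

    # Illustrative sketch of stripping disallowed options for restricted users.
    import sys

    DISALLOWED = ("--config", "-fs", "-conf", "-D")   # examples; the actual list is set by the proxy

    def strip_disallowed(argv):
        cleaned, skip = [], False
        for arg in argv:
            if skip:                      # also drop the value that follows a disallowed option
                skip = False
                continue
            if arg in DISALLOWED:
                print("WARNING: argument %r is not allowed for restricted users "
                      "and is being removed" % arg, file=sys.stderr)
                skip = True
                continue
            cleaned.append(arg)
        return cleaned

    # strip_disallowed(["dfs", "-fs", "hdfs://other-nn:8020", "-ls", "/"]) -> ["dfs", "-ls", "/"]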
When the cluster is switched to the secure service mode, Kerberos and HDFS Service Level Authorization (SLA)
are used for authentication and authorization, respectively. Kerberos is used to authenticate with the NameNode,
while SLA restricts communication with a tenant NameNode to users that belong to that tenant. HDFS commands
issued from the VM do not require any additional arguments to specify the tenant NameNode IP addresses.
Moreover, Kubernetes is used to start and monitor component containers; other functionality, such as DNS and
persistent volumes, is also provided by Kubernetes.
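In stock Hadoop, Service Level Authorization is configured through ACLs such as security.client.protocol.acl in
hadoop-policy.xml, and a per-tenant ACL of that kind is what can limit NameNode connections to the tenant's
users. The following sketch is only a conceptual model of such a group-based check, not the Hadoop or Urika-GX
implementation; the tenant group name is a placeholder.

    # Conceptual model of an SLA-style check; Hadoop performs the real authorization
    # (ServiceAuthorizationManager) using hadoop-policy.xml, e.g. security.client.protocol.acl.
    import grp

    def members_of(group):
        """Return the members of a local/LDAP group (placeholder tenant group name)."""
        return set(grp.getgrnam(group).gr_mem)

    def may_connect(user, tenant_group="tenant_name"):
        # Only users in the tenant's group are authorized to talk to that tenant's NameNode.
        return user in members_of(tenant_group)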
NOTE: Tenant NameNode configuration is managed automatically by the Urika-GX tenant management
scripts. Manually altering the configuration of a tenant NameNode is not supported.
Persistent Storage
By default, data inside of a container is ephemeral. While much of the tenant NameNode can be stateless, and
thus unaffected by the ephemeral nature of containers, there are a few exceptions:
● NameNode data directory - Stores the namespace directory tree and is responsible for all of the file indexing.
The loss of this data would result in the loss of all tenant data stored by the NameNode.
● HDFS configuration files - Store the specific configuration for a tenant.
● Kerberos keytabs - Contain Kerberos credentials for use of HDFS secure mode.
File System Permissions
The directory layout for persistent volumes is shown below:
● /global/tenants/tenant_name/hdfs/conf mounted as /etc/hadoop/conf in the tenant container.
● /security/tenant_name/hdfs/service_keytabs/ mounted as /etc/security/keytabs in the tenant container.
● /mnt/hdd-2/hdfs/nn-tenant_name/ mounted as /mnt/hdd-2/hdfs/nn in the tenant container.
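If these are realized as Kubernetes host-path volumes (an assumption; the document states only that Kubernetes
provides persistent volumes), the sketch below, written against the public Kubernetes Python client, shows how
the three paths above could be declared for a NameNode container. It is illustrative only and is not the actual
Urika-GX pod specification.

    # Illustrative host-path volume declarations (not the actual Urika-GX pod spec).
    from kubernetes import client

    PATHS = [
        # (volume name, host path, mount path inside the tenant container)
        ("hdfs-conf",    "/global/tenants/tenant_name/hdfs/conf",       "/etc/hadoop/conf"),
        ("hdfs-keytabs", "/security/tenant_name/hdfs/service_keytabs/", "/etc/security/keytabs"),
        ("nn-data",      "/mnt/hdd-2/hdfs/nn-tenant_name/",             "/mnt/hdd-2/hdfs/nn"),
    ]

    volumes = [client.V1Volume(name=n, host_path=client.V1HostPathVolumeSource(path=hp))
               for n, hp, _ in PATHS]
    mounts = [client.V1VolumeMount(name=n, mount_path=mp) for n, _, mp in PATHS]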