
Getting Started with DGX Station A100
DU-10189-001 _v5.0.2
The GPU specification is longer because of the nature of UUIDs, but this is the most precise
way to pin specific GPUs to the application.
2.3.3. Using Multi-Instance GPUs
Multi-Instance GPU (MIG) is a technology that is available on NVIDIA A100 GPUs. If MIG is
enabled on the GPUs, and the GPUs have already been partitioned, then applications can be
limited to run on these devices.
This works for both Docker containers and bare metal using CUDA_VISIBLE_DEVICES, as
shown in the examples below. For instructions on how to configure and use MIG, refer to the
NVIDIA Multi-Instance GPU User Guide.
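As a minimal sketch of the two invocation styles described above, the same MIG UUID can be passed either to Docker's `--gpus` option or to `CUDA_VISIBLE_DEVICES`. The UUID below is taken from the example output later in this section; substitute one from your own system. The commands are echoed rather than executed here so the sketch runs without GPU hardware; on a real system, drop the leading `echo`.

```shell
# MIG instance UUID (from the nvidia-smi -L example later in this section).
MIG_DEV='MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0'

# Docker container: pass the MIG UUID through the --gpus device option.
echo docker run --gpus "\"device=${MIG_DEV}\"" --rm -it ubuntu nvidia-smi -L

# Bare metal: export the same UUID via CUDA_VISIBLE_DEVICES.
echo CUDA_VISIBLE_DEVICES="${MIG_DEV}" ./p2pBandwidthLatencyTest
```

Note that both styles identify exactly one MIG instance; a single CUDA process can only use one MIG compute instance at a time.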
Identify the MIG instances that will be used. Here is the output from a system that has GPU 0
partitioned into seven MIG instances:
lab@ro-dvt-058-80gb:~$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-269d95f8-328a-08a7-5985-ab09e6e2b751)
MIG 1g.10gb Device 0: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0)
MIG 1g.10gb Device 1: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/8/0)
MIG 1g.10gb Device 2: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/9/0)
MIG 1g.10gb Device 3: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/11/0)
MIG 1g.10gb Device 4: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/12/0)
MIG 1g.10gb Device 5: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/13/0)
MIG 1g.10gb Device 6: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/14/0)
GPU 1: Graphics Device (UUID: GPU-0f2dff15-7c85-4320-da52-d3d54755d182)
GPU 2: Graphics Device (UUID: GPU-dc598de6-dd4d-2f43-549f-f7b4847865a5)
GPU 3: DGX Display (UUID: GPU-91b9d8c8-e2b9-6264-99e0-b47351964c52)
GPU 4: Graphics Device (UUID: GPU-e32263f2-ae07-f1db-37dc-17d1169b09bf)
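The MIG UUIDs in this listing can be pulled out with standard text tools so they can be pasted into CUDA_VISIBLE_DEVICES or a `docker run --gpus` option. The following is a hypothetical helper, not part of the product tooling; it uses a two-line sample of the output above so it runs anywhere, but on a live system you would pipe `nvidia-smi -L` instead.

```shell
# Sample of the `nvidia-smi -L` output shown above (first two MIG lines).
sample='MIG 1g.10gb Device 0: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0)
MIG 1g.10gb Device 1: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/8/0)'

# Extract just the MIG UUIDs; on a live system use: nvidia-smi -L | grep -o ...
mig_uuids=$(printf '%s\n' "$sample" | grep -o 'MIG-GPU-[0-9a-f/-]*')
printf '%s\n' "$mig_uuids"
```

Each extracted line is a complete device identifier of the form `MIG-<GPU UUID>/<GPU instance ID>/<compute instance ID>`.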
In Docker, enter the MIG UUID from this output, in which GPU 0 and Device 0 have been
selected.
If you are running on DGX Station A100, restart the nv-docker-gpus and docker system
services any time MIG instances are created, destroyed, or modified by running the following:
lab@ro-dvt-058-80gb:~$ sudo systemctl restart nv-docker-gpus; sudo systemctl restart docker
nv-docker-gpus has to be restarted on DGX Station A100 because this service is used to
mask the available GPUs that can be used by Docker. When the GPU architecture changes, the
service needs to be refreshed.
lab@ro-dvt-058-80gb:~$ docker run --gpus '"device=MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0"' --rm -it ubuntu nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-269d95f8-328a-08a7-5985-ab09e6e2b751)
MIG 1g.10gb Device 0: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0)
On bare metal, specify the MIG instances:
Remember: This application measures communication across GPUs, so bandwidth and latency
results are not meaningful when only one GPU MIG instance is used. The purpose of this
example is to illustrate how to pin an application to a specific MIG instance.
lab@ro-dvt-058-80gb:/usr/local/cuda-11.2/samples/bin/x86_64/linux/release$ CUDA_VISIBLE_DEVICES=MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0 ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Graphics Device MIG 1g.10gb, pciBusID: 1, pciDeviceID: 0, pciDomainID:0