
Enable MIG Mode in DGX Station A100
DGX Station A100
DU-10189-001 _v5.0.2 | 28
$ sudo nvidia-smi -i 0 -mig 1
$ sudo nvidia-smi --gpu-reset
Resetting GPU 00000000:00:03.0 is not supported.
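If the GPU reset is not supported, the MIG mode change can stay pending. The following is a minimal sketch for comparing the current and pending MIG mode of one GPU. It assumes a driver whose nvidia-smi supports the mig.mode.current and mig.mode.pending query fields. The function names mig_mode_state and check_mig_mode are illustrative and are not part of any NVIDIA tool.

```shell
#!/bin/sh
# Sketch: report whether MIG mode on a GPU is in effect or still pending.
# Assumes nvidia-smi supports the mig.mode.current and mig.mode.pending
# query fields. Function names are illustrative, not NVIDIA tools.

# Interpret one CSV line such as "Disabled, Enabled" (current, pending).
mig_mode_state() {
    current=${1%%,*}        # text before the first comma
    pending=${1#*,}         # text after the first comma
    pending=${pending# }    # drop the space that follows the comma
    if [ "$current" = "$pending" ]; then
        echo "MIG mode: $current (no change pending)"
    else
        echo "MIG mode: $current, pending: $pending (reset the GPU or reboot)"
    fi
}

# Query GPU index $1 (default 0) and print its MIG mode state.
check_mig_mode() {
    line=$(nvidia-smi -i "${1:-0}" \
        --query-gpu=mig.mode.current,mig.mode.pending \
        --format=csv,noheader) || return 1
    mig_mode_state "$line"
}
```

After sourcing the file, running check_mig_mode 0 prints one line for GPU 0.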
Note: If you have agents on the system, such as monitoring agents that use the GPU, you might not be able to initiate a GPU reset. On DGX systems, for example, you might encounter the following message:
$ sudo nvidia-smi -i 0 -mig 1
Warning: MIG mode is in pending enable state for GPU 00000000:07:00.0:In use by another client
00000000:07:00.0 is currently being used by one or more other processes (e.g. CUDA application or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using the device and retry the command or reboot the system to make MIG mode effective.
All done.
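The warning means that some process still has the GPU device files open. A minimal sketch for finding those processes follows. It assumes lsof is installed and that you run it as root (for example, through sudo) so that processes owned by other users, such as monitoring agents, are visible. The function names unique_pids and show_gpu_holders are hypothetical.

```shell
#!/bin/sh
# Sketch: list processes that still hold NVIDIA device files open, which
# keeps the MIG mode change pending. Assumes lsof is installed; run as root
# so that processes owned by other users are visible.

# Collapse PIDs (one per line on stdin) into a sorted, duplicate-free list.
unique_pids() {
    sort -un
}

# Print the PID and command line of each process using /dev/nvidia*.
show_gpu_holders() {
    # -t makes lsof print PIDs only; unique_pids removes any duplicates.
    lsof -t /dev/nvidia* 2>/dev/null | unique_pids | while read -r pid; do
        ps -o pid=,args= -p "$pid"
    done
}
```

Review the list before stopping anything. On DGX systems, services such as nvsm and dcgm are stopped with systemctl in step 4 rather than killed directly.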
4. Stop the nvsm, dcgm, and gdm3 services, enable MIG mode on the desired GPU, and restore the monitoring services:
$ sudo systemctl stop nvsm
$ sudo systemctl stop dcgm
$ sudo systemctl stop gdm3
$ sudo nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:07:00.0
All done.
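The commands in step 4 stop the services but do not show how to restore them. The following is a minimal sketch that assumes the same unit names used above (nvsm, dcgm, gdm3). Adjust the list if your system uses a different unit name, for example for DCGM. STOPPED_UNITS and restore_stopped_services are hypothetical names.

```shell
#!/bin/sh
# Sketch: restart the services that were stopped before enabling MIG mode.
# Assumes the unit names used in step 4; edit STOPPED_UNITS if yours differ.
# gdm3 is the GNOME display manager rather than a monitoring agent.

STOPPED_UNITS="nvsm dcgm gdm3"

restore_stopped_services() {
    status=0
    for unit in $STOPPED_UNITS; do
        if ! sudo systemctl start "$unit"; then
            echo "could not start $unit" >&2
            status=1
        fi
    done
    return "$status"
}
```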
These examples use superuser privileges. If you grant non-root users read access to the mig/config capability, they can also manage MIG instances after the DGX Station A100 has been configured in MIG mode. Refer to
The default permissions on the MIG capability files, including mig/config, are as follows:
$ ls -l /proc/driver/nvidia/capabilities/*
/proc/driver/nvidia/capabilities/mig:
total 0
-r-------- 1 root root 0 May 24 16:10 config
-r--r--r-- 1 root root 0 May 24 16:10 monitor
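To see whether the current user already has read access to these files, a small check such as the following can help. It is a sketch: it only tests read permission with the shell's -r operator and does not grant or change any permissions. MIG_CAP_DIR and check_mig_access are illustrative names.

```shell
#!/bin/sh
# Sketch: report whether the current user can read the MIG capability files.
# Only inspects permissions; it does not grant or change access.

MIG_CAP_DIR=/proc/driver/nvidia/capabilities/mig

check_mig_access() {
    if [ ! -d "$MIG_CAP_DIR" ]; then
        echo "$MIG_CAP_DIR not found; is the NVIDIA driver loaded?" >&2
        return 1
    fi
    for cap in config monitor; do
        if [ -r "$MIG_CAP_DIR/$cap" ]; then
            echo "$cap: readable"
        else
            echo "$cap: not readable"
        fi
    done
}
```

With the default permissions listed above, a non-root user should see "config: not readable" and "monitor: readable".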
To ensure that the MIG instances are available in your containers, restart nv-docker-gpus and docker:
$ sudo systemctl restart nv-docker-gpus
$ sudo systemctl restart docker
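To confirm that both services came back after the restart, you can check each unit. This sketch assumes the unit names above. It checks the units one at a time because systemctl is-active with several units succeeds if any one of them is active. check_container_services is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm that nv-docker-gpus and docker are active after the restart.
# Each unit is checked separately because "systemctl is-active" with several
# units returns success when any one of them is active.

check_container_services() {
    status=0
    for unit in nv-docker-gpus docker; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: not active (see: systemctl status $unit)" >&2
            status=1
        fi
    done
    return "$status"
}
```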