DGX A100 System
DU-09821-001_v06
Chapter 12. Multi-Instance GPU
Multi-Instance GPU (MIG) is a new capability of the NVIDIA A100 GPU. MIG uses spatial
partitioning to carve the physical resources of a single A100 GPU into as many as seven
independent GPU instances. These instances run simultaneously, each with its own memory,
cache, and compute streaming multiprocessors. MIG enables the A100 GPU to deliver
guaranteed quality of service at up to 7X higher utilization compared to non-MIG enabled
GPUs.
MIG enables:
- GPU memory isolation among parallel GPU workloads
- Physical allocation of resources used by parallel GPU workloads
MIG instances are managed through the NVIDIA Management Library (NVML) APIs or the
NVML-based command-line utility, nvidia-smi. Enabling MIG requires a GPU reset, so
system services that manage GPUs should be stopped before MIG is enabled.
To enable MIG on all eight GPUs in the system, complete the following steps.
1. Stop the NVSM and DCGM services.
$ sudo systemctl stop nvsm dcgm
2. Enable MIG on all eight GPUs.
$ sudo nvidia-smi -mig 1
If other services are running that prevent you from resetting the GPUs, then reboot the
system and skip the next step.
3. Restart the DCGM and NVSM services.
$ sudo systemctl start dcgm nvsm
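After the steps above, it can be useful to confirm that MIG mode actually took effect on every GPU. The following is a minimal sketch, not part of the documented procedure. It assumes the nvidia-smi query fields mig.mode.current and mig.mode.pending are available in the installed driver, and mig_mode_report is a hypothetical helper name introduced here for illustration only.

```shell
#!/bin/sh
# Sketch only: mig_mode_report is a hypothetical helper, not an NVIDIA tool.
# It reads lines of the form "index, current, pending" on stdin, as produced by
#   nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv,noheader
# and returns 0 only if every GPU currently reports MIG mode "Enabled".
mig_mode_report() {
    status=0
    seen=0
    while IFS=, read -r index current pending || [ -n "$index" ]; do
        [ -z "$index" ] && continue
        # nvidia-smi pads CSV fields with a space; strip all whitespace.
        current=$(printf '%s' "$current" | tr -d '[:space:]')
        pending=$(printf '%s' "$pending" | tr -d '[:space:]')
        seen=$((seen + 1))
        if [ "$current" = "Enabled" ]; then
            echo "GPU $index: MIG enabled"
        elif [ "$pending" = "Enabled" ]; then
            echo "GPU $index: MIG pending (GPU reset or reboot still required)"
            status=1
        else
            echo "GPU $index: MIG not enabled (current=$current)"
            status=1
        fi
    done
    # Treat empty input as a failure rather than as "all enabled".
    [ "$seen" -gt 0 ] && [ "$status" -eq 0 ]
}

# Query the real GPUs only where nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending \
        --format=csv,noheader | mig_mode_report \
        || echo "MIG is not active on every GPU" >&2
fi
```

A GPU reported as pending has accepted the MIG setting but has not yet been reset, which matches the case in step 2 where other services block the reset and a reboot is needed.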
For next steps, refer to the NVIDIA Multi-Instance GPU User Guide, which provides more
detailed information about key MIG concepts and deployment considerations, and explains
how to create MIG instances and how to run Docker containers using MIG.