NVIDIA Multi-Instance GPU

Seven Independent Instances in a Single GPU

Multi-Instance GPU (MIG) expands the performance and value of each NVIDIA A100 Tensor Core GPU. MIG can partition the A100 GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. Administrators can now support every workload, from the smallest to the largest, offering a right-sized GPU with guaranteed quality of service (QoS) for every job, optimizing utilization and extending the reach of accelerated computing resources to every user.

Benefits Overview

Expand GPU Access to More Users

With MIG, a single A100 GPU can be carved into as many as seven GPU resources, giving researchers and developers more resources and flexibility than ever before.

Optimize GPU Utilization

MIG provides the flexibility to choose from many different instance sizes, which allows provisioning of a right-sized GPU instance for each workload, ultimately delivering optimal utilization and maximizing data center investment.

Run Simultaneous Mixed Workloads

MIG enables inference, training, and high-performance computing (HPC) workloads to run at the same time on a single GPU with deterministic latency and throughput.

How the Technology Works

Without MIG, different jobs running on the same GPU, such as different AI inference requests, compete for the same resources, like memory bandwidth. A job that consumes an outsized share of memory bandwidth starves the others, causing several jobs to miss their latency targets. With MIG, jobs run simultaneously on different instances, each with dedicated resources for compute, memory, and memory bandwidth, resulting in predictable performance with quality of service and maximum GPU utilization.

Achieve Ultimate Data Center Flexibility

An NVIDIA A100 GPU can be partitioned into different-sized MIG instances. For example, an administrator could create two instances with 20 gigabytes (GB) of memory each, three instances with 10 GB each, seven instances with 5 GB each, or a mix of sizes. This lets system administrators provide right-sized GPUs to users for different types of workloads.

MIG instances can also be dynamically reconfigured, enabling administrators to shift GPU resources in response to changing user and business demands. For example, seven MIG instances can be used during the day for low-throughput inference and reconfigured to one large MIG instance at night for deep learning training.
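As an illustrative sketch of this day/night reconfiguration, the partitioning can be driven with the `nvidia-smi mig` subcommands (shown here for GPU 0; exact profile names and required privileges depend on the driver version, and the commands require MIG-capable hardware, so treat this as an example sequence rather than a verbatim recipe):

```shell
# Enable MIG mode on GPU 0 (requires admin privileges; may need a GPU reset).
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles the driver supports (e.g. 1g.5gb, 2g.10gb, 3g.20gb).
nvidia-smi mig -lgip

# Daytime layout: seven 1g.5gb instances for low-throughput inference.
# The -C flag also creates a matching compute instance inside each GPU instance.
sudo nvidia-smi mig -i 0 -cgi 1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb -C

# Nighttime reconfiguration: tear down the instances (they must be idle) ...
sudo nvidia-smi mig -i 0 -dci
sudo nvidia-smi mig -i 0 -dgi

# ... and recreate the GPU as one large instance for deep learning training.
sudo nvidia-smi mig -i 0 -cgi 7g.40gb -C

# Confirm the new layout; MIG devices are listed with their own UUIDs.
nvidia-smi -L
```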

Deliver Exceptional Quality of Service

Each MIG instance has a dedicated set of hardware resources for compute, memory, and cache, delivering guaranteed quality of service (QoS) and fault isolation for the workload. That means that a failure in an application running on one instance doesn’t impact applications running on other instances. And different instances can run different types of workloads—interactive model development, deep learning training, AI inference, or HPC applications. Since the instances run in parallel, the workloads also run in parallel—but separate and isolated—on the same physical A100 GPU.

MIG is a great fit for workloads such as AI model development and low-latency inference. These workloads can take full advantage of A100’s features and fit into each instance’s allocated memory.

Built for IT and DevOps

MIG is built for ease of deployment by IT and DevOps teams.

Each MIG instance behaves like a standalone GPU to applications, so there is no change to the CUDA® platform. AI models and containerized HPC applications, such as those from NGC, can run directly on a MIG instance with the NVIDIA Container Runtime. MIG instances present as additional GPU resources in container orchestrators like Kubernetes, which can schedule containerized workloads to run within specific GPU instances. This feature will be available soon via the NVIDIA device plugin for Kubernetes.
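As a hedged sketch of how a container is pinned to one MIG instance (the device selector and image tag here are assumptions, not taken from this page), the NVIDIA Container Runtime accepts a MIG device in the `--gpus` option:

```shell
# Run a container on the first MIG instance of GPU 0.
# The "<gpu index>:<MIG index>" selector is supported by recent versions
# of the NVIDIA Container Toolkit; a MIG device UUID taken from
# `nvidia-smi -L` can be used in its place.
docker run --rm --gpus '"device=0:0"' nvidia/cuda:11.0-base nvidia-smi -L
```

In Kubernetes, the device plugin can similarly expose instances as extended resources (for example, `nvidia.com/mig-1g.5gb`), which a pod then requests in its resource limits like any other GPU resource.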

Organizations can take advantage of the management, monitoring, and operational benefits of hypervisor-based server virtualization, including live migration and multi-tenancy, on MIG GPU instances with NVIDIA Virtual Compute Server (vCS).

Deep dive into the NVIDIA Ampere Architecture.