NVIDIA Multi-Instance GPU

Seven Independent Instances in a Single GPU

Multi-Instance GPU (MIG) expands the performance and value of each NVIDIA A100 Tensor Core GPU. MIG can partition the A100 GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. Administrators can now support every workload, from the smallest to the largest, offering a right-sized GPU with guaranteed quality of service (QoS) for every job, optimizing utilization and extending the reach of accelerated computing resources to every user.

Benefits Overview

Expand GPU Access to More Users

With MIG, you can achieve up to 7X more GPU resources on a single A100 GPU. MIG gives researchers and developers more resources and flexibility than ever before.

Optimize GPU Utilization

MIG provides the flexibility to choose from many different instance sizes, allowing a right-sized GPU instance to be provisioned for each workload, ultimately delivering optimal utilization and maximizing the data center investment.

Run Simultaneous Mixed Workloads

MIG enables inference, training, and high-performance computing (HPC) workloads to run at the same time on a single GPU with deterministic latency and throughput.

How the Technology Works

Without MIG, different jobs running on the same GPU, such as different AI inference requests, compete for the same resources like memory bandwidth. A job consuming larger memory bandwidth starves others, resulting in several jobs missing their latency targets. With MIG, jobs run simultaneously on different instances, each with dedicated resources for compute, memory, and memory bandwidth, resulting in predictable performance with quality of service and maximum GPU utilization.


Dramatic Gains in Performance and Utilization with Multi-Instance GPUs

Achieve Ultimate Data Center Flexibility

An NVIDIA A100 GPU can be partitioned into different-sized MIG instances. For example, an administrator could create two instances with 20 gigabytes (GB) of memory each, three instances with 10 GB each, seven instances with 5 GB each, or a mix of sizes. This lets system administrators provide right-sized GPUs to users for different types of workloads.
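As an illustrative sketch, a layout like the three-instance example above can be created with the nvidia-smi MIG commands (assuming an A100 40GB at GPU index 0, a MIG-capable driver, and root privileges; these commands only run on MIG-capable hardware):

```shell
# Enable MIG mode on GPU 0 (may require a GPU reset to take effect).
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this GPU supports
# (on an A100 40GB: 1g.5gb, 2g.10gb, 3g.20gb, 4g.20gb, 7g.40gb, ...).
sudo nvidia-smi mig -lgip

# Create three 10 GB GPU instances; -C also creates a matching
# compute instance inside each one.
sudo nvidia-smi mig -cgi 2g.10gb,2g.10gb,2g.10gb -C

# Verify the resulting MIG devices and their UUIDs.
nvidia-smi -L
```

Each profile name encodes the slice count and memory (for example, 2g.10gb is two compute slices with 10 GB of memory), so the seven-instance layout in the text corresponds to seven 1g.5gb instances.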

MIG instances can also be dynamically reconfigured, enabling administrators to shift GPU resources in response to changing user and business demands. For example, seven MIG instances can be used during the day for low-throughput inference and reconfigured to one large MIG instance at night for deep learning training.
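The day-to-night reconfiguration described above can be sketched with the same tool (assuming all instances are idle, since an instance cannot be destroyed while a workload is running on it):

```shell
# Tear down the daytime layout: destroy compute instances first,
# then the GPU instances that contain them.
sudo nvidia-smi mig -dci
sudo nvidia-smi mig -dgi

# Recreate the whole GPU as one large instance for overnight training
# (7g.40gb is the full-GPU profile on an A100 40GB).
sudo nvidia-smi mig -cgi 7g.40gb -C
```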

Deliver Exceptional Quality of Service

Each MIG instance has a dedicated set of hardware resources for compute, memory, and cache, delivering guaranteed quality of service (QoS) and fault isolation for the workload. That means that a failure in an application running on one instance doesn’t impact applications running on other instances. And different instances can run different types of workloads—interactive model development, deep learning training, AI inference, or HPC applications. Since the instances run in parallel, the workloads also run in parallel—but separate and isolated—on the same physical A100 GPU.
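A sketch of how separate workloads are pinned to separate instances (assuming CUDA 11 / driver R450 or later, where MIG devices are addressed by UUID; `train.py` and `infer.py` are hypothetical application scripts):

```shell
# List MIG devices; each line reports a MIG-prefixed UUID, e.g.
#   MIG 2g.10gb Device 0: (UUID: MIG-xxxxxxxx-xxxx-...)
nvidia-smi -L

# Pin one workload per instance; each process sees only its own
# slice and runs isolated from the others.
CUDA_VISIBLE_DEVICES=<MIG-UUID-of-instance-0> python train.py &
CUDA_VISIBLE_DEVICES=<MIG-UUID-of-instance-1> python infer.py &
```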

MIG is a great fit for workloads such as AI model development and low-latency inference. These workloads can take full advantage of A100’s features and fit into each instance’s allocated memory.

Watch MIG in Action

NVIDIA A100 Tensor Core GPU

Running Multiple Workloads on a Single A100 GPU

This demo runs AI and high-performance computing (HPC) workloads simultaneously on the same A100 GPU.

Multi-Instance GPU on the NVIDIA A100 Tensor Core GPU

Boosting Performance and Utilization with Multi-Instance GPU

This demo shows inference performance on a single slice of MIG and then scales linearly across the entire A100.

Built for IT and DevOps

MIG enables fine-grained GPU provisioning by IT and DevOps teams. Each MIG instance behaves like a standalone GPU to applications, so there is no change to the CUDA® platform. MIG can be used in all the major enterprise computing environments.
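For example, with the NVIDIA Container Toolkit installed, a single MIG instance can be exposed to a container just like a whole GPU (a sketch assuming GPU index 0 with MIG instance index 0, and a CUDA base image as a stand-in for a real workload image):

```shell
# "device=0:0" selects MIG instance 0 on GPU 0; inside the container,
# nvidia-smi -L reports only that one MIG device.
docker run --rm --gpus '"device=0:0"' \
    nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi -L
```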

Deep dive into the NVIDIA Ampere Architecture.