Each MIG instance has a dedicated set of hardware resources for compute, memory, and cache, delivering guaranteed quality of service (QoS) and fault isolation for the workload. A failure in an application running on one instance doesn’t impact applications running on other instances. And different instances can run different types of workloads—interactive model development, deep learning training, AI inference, or HPC applications. Because the instances run in parallel, the workloads also run in parallel—separate and isolated—on the same physical A100 GPU.
MIG is a great fit for workloads such as AI model development and low-latency inference. These workloads can take full advantage of A100’s features and fit into each instance’s allocated memory.
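As a concrete illustration, MIG instances are typically managed with the `nvidia-smi` tool. The sketch below shows one plausible workflow for enabling MIG mode and partitioning an A100; it assumes root privileges and an MIG-capable GPU, and the specific profile IDs (such as 19 for a 1g.5gb slice) can vary by driver version, so treat the exact numbers as an example rather than a fixed recipe.

```shell
# Enable MIG mode on GPU 0 (may require a GPU reset or reboot to take effect).
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this GPU supports.
nvidia-smi mig -lgip

# Create two GPU instances (profile ID 19 corresponds to 1g.5gb on many
# A100 driver versions) along with their default compute instances (-C).
sudo nvidia-smi mig -i 0 -cgi 19,19 -C

# Each instance appears with its own MIG UUID.
nvidia-smi -L

# A workload is pinned to a single instance via CUDA_VISIBLE_DEVICES,
# e.g. (hypothetical UUID and script name):
# CUDA_VISIBLE_DEVICES=MIG-<uuid> python infer.py
```

Because each created instance has its own UUID and memory allocation, several such inference or development jobs can be launched side by side, each seeing only its own slice of the GPU.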