NVIDIA Virtual Compute Server

Power the Most Compute-Intensive Server Workloads with Virtual GPUs

Virtualize Compute for AI, Deep Learning, and Data Science

NVIDIA Virtual Compute Server (vCS) enables data centers to accelerate server virtualization with the latest NVIDIA data center GPUs, including the NVIDIA A100 Tensor Core GPU, so that the most compute-intensive workloads, such as artificial intelligence, deep learning, and data science, can run in a virtual machine (VM).

Features

GPU Sharing

Fractional GPU sharing is possible only with NVIDIA vGPU technology. It enables multiple VMs to share a single GPU, maximizing utilization for lighter workloads that still require GPU acceleration.

GPU Aggregation

With GPU aggregation, a VM can access more than one GPU, which compute-intensive workloads often require. vCS supports both multi-vGPU and peer-to-peer computing: with multi-vGPU, the GPUs aren't directly connected; with peer-to-peer, they are connected through NVLink for higher bandwidth.
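
As an illustrative sketch (not from the vCS documentation), a guest VM can confirm how many GPUs it has been assigned using NVML through the pynvml bindings; a count above one indicates a multi-GPU configuration:

```python
# Hypothetical sketch: enumerate the GPUs assigned to this VM with NVML
# via the pynvml bindings (pip install nvidia-ml-py). Assumes the NVIDIA
# guest driver is installed in the VM.
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    print(f"GPUs visible in this VM: {count}")
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        print(f"  GPU {i}: {pynvml.nvmlDeviceGetName(handle)}")
finally:
    pynvml.nvmlShutdown()
```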

Management and Monitoring

vCS supports app-, guest-, and host-level monitoring. In addition, proactive management features, exposed through the vGPU management SDK, enable live migration, suspend and resume, and thresholds that surface consumption trends affecting user experience.
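
The vGPU management SDK builds on NVML. As a minimal host-side sketch (assuming the pynvml bindings and an NVIDIA driver are present), per-GPU utilization and memory consumption can be polled like this:

```python
# Minimal monitoring sketch using NVML via pynvml. This covers only a
# small slice of what the vGPU management SDK exposes; it is an
# illustration, not the SDK itself.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

for _ in range(5):  # take five samples, one per second
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU util: {util.gpu}%  "
          f"memory: {mem.used / 2**20:.0f} / {mem.total / 2**20:.0f} MiB")
    time.sleep(1)

pynvml.nvmlShutdown()
```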

NGC

NVIDIA GPU Cloud (NGC) is a hub for GPU-optimized software that simplifies workflows for deep learning, machine learning, and HPC, and now supports virtualized environments with NVIDIA vCS.

Peer-to-Peer Computing

NVIDIA® NVLink is a high-speed, direct GPU-to-GPU interconnect that provides higher bandwidth, more links, and improved scalability for multi-GPU system configurations—now supported virtually with NVIDIA virtual GPU (vGPU) technology.
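
As a sketch (assuming the pynvml bindings and an NVLink-capable GPU), a VM configured for peer-to-peer computing can verify that its NVLink links are active:

```python
# Sketch: report which NVLink links are active on GPU 0. On a peer-to-peer
# vGPU configuration, at least one link should show as active.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
    try:
        state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
    except pynvml.NVMLError:
        break  # link not present or NVLink unsupported on this GPU
    status = "active" if state == pynvml.NVML_FEATURE_ENABLED else "inactive"
    print(f"NVLink link {link}: {status}")

pynvml.nvmlShutdown()
```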

ECC & Page Retirement

Error correction code (ECC) and page retirement provide higher reliability for compute applications that are sensitive to data corruption. They’re especially important in large-scale cluster-computing environments where GPUs process very large datasets and/or run applications for extended periods.
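
For illustration (assuming pynvml and an ECC-capable GPU; the calls raise NVMLError otherwise), ECC mode and the number of retired memory pages can be queried through NVML:

```python
# Sketch: check ECC mode and count retired memory pages on GPU 0.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

current, pending = pynvml.nvmlDeviceGetEccMode(handle)
print(f"ECC enabled: current={bool(current)}, pending={bool(pending)}")

for cause, label in [
    (pynvml.NVML_PAGE_RETIREMENT_CAUSE_MULTIPLE_SINGLE_BIT_ECC_ERRORS, "single-bit"),
    (pynvml.NVML_PAGE_RETIREMENT_CAUSE_DOUBLE_BIT_ECC_ERROR, "double-bit"),
]:
    pages = pynvml.nvmlDeviceGetRetiredPages(handle, cause)
    print(f"Pages retired due to {label} ECC errors: {len(pages)}")

pynvml.nvmlShutdown()
```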

Multi-Instance GPU (MIG)

Multi-Instance GPU (MIG) is a revolutionary technology that enables each NVIDIA A100 Tensor Core GPU to be partitioned into up to seven instances, each fully isolated and secured at the hardware level with its own high-bandwidth memory, cache, and compute cores. With vCS software, a VM can be run on each of these MIG instances, bringing the management, monitoring, and operational benefits of hypervisor-based server virtualization to each instance.
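
As a hedged sketch (assuming a recent pynvml and an A100 with MIG mode enabled), the MIG instances on a GPU can be enumerated through NVML:

```python
# Sketch: list MIG instances on GPU 0 (assumes pynvml >= 11 and a
# MIG-capable GPU such as the A100).
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

current, pending = pynvml.nvmlDeviceGetMigMode(gpu)
print(f"MIG mode: current={current}, pending={pending}")

if current == pynvml.NVML_DEVICE_MIG_ENABLE:
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except pynvml.NVMLError:
            break  # no more MIG devices configured
        print(f"MIG instance {i}: {pynvml.nvmlDeviceGetName(mig)}")

pynvml.nvmlShutdown()
```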

GPUDirect

GPUDirect® RDMA (remote direct memory access) enables network devices to directly access GPU memory, bypassing CPU host memory, decreasing GPU-to-GPU communication latency, and completely offloading the CPU.

GPU Recommendations

                               NVIDIA A100¹   NVIDIA V100S   NVIDIA A40¹   NVIDIA RTX 8000   NVIDIA RTX 6000   NVIDIA T4
Memory                         40 GB HBM2     32 GB HBM2     48 GB GDDR6   48 GB GDDR6       24 GB GDDR6       16 GB GDDR6
Peak FP32                      19.5 TFLOPS    16.4 TFLOPS    38.1 TFLOPS   14.9 TFLOPS       14.9 TFLOPS       8.1 TFLOPS
Peak FP64                      9.7 TFLOPS     8.2 TFLOPS     -             -                 -                 -
NVLink: Number of GPUs per VM  Up to 4        Up to 8        2             2                 2                 -
ECC and Page Retirement        Yes            Yes            Yes           Yes               Yes               Yes
Multi-GPU per VM               Up to 16       Up to 16       Up to 16      Up to 16          Up to 16          Up to 16

Learn More About NVIDIA Virtual GPU Software

View product release notes and supported third-party software products.