NVIDIA Hopper Architecture

The New Engine for the World’s AI Infrastructure Delivers an Order-of-Magnitude Performance Leap.

The Accelerated Computing Platform for
Next-Generation Workloads

Take the next massive leap in accelerated computing with the NVIDIA Hopper architecture. With the ability to securely scale diverse workloads in every data center—from small enterprise to exascale high-performance computing (HPC) and trillion-parameter AI—Hopper lets brilliant innovators fulfill their life's work at the fastest pace in human history.

Technology Breakthroughs

Built with over 80 billion transistors using a cutting-edge TSMC 4N process, Hopper features five groundbreaking innovations that fuel the NVIDIA H100 Tensor Core GPU and combine to deliver an incredible 30X speedup over the prior generation on AI inference of NVIDIA’s Megatron 530B chatbot, the world’s largest generative language model.

Transformer Engine

The NVIDIA Hopper architecture advances Tensor Core technology with the Transformer Engine, designed to accelerate the training of AI models. Hopper Tensor Cores can apply mixed FP8 and FP16 precisions to dramatically accelerate AI calculations for transformers. Hopper also triples the floating-point operations per second (FLOPS) for TF32, FP64, FP16, and INT8 precisions over the prior generation. Combined with the Transformer Engine and fourth-generation NVIDIA® NVLink®, Hopper Tensor Cores power an order-of-magnitude speedup on HPC and AI workloads.
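Hopper’s FP8 support uses the E4M3 (4 exponent bits, 3 mantissa bits) and E5M2 formats. As a rough illustration of what 3 mantissa bits of precision means, the Python sketch below rounds a float to a nearby E4M3-representable value; it is a simplified simulation (it ignores subnormals and exact NaN/saturation rules), not how the hardware implements the format.

```python
import math

def quantize_fp8_e4m3(x: float) -> float:
    """Simulate E4M3 rounding: keep 1 implicit + 3 explicit mantissa bits,
    saturating at the format's maximum magnitude of 448.
    Simplified sketch: subnormals and NaN handling are ignored."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(abs(x))        # abs(x) == m * 2**e, with 0.5 <= m < 1
    m = round(m * 16) / 16           # round mantissa to 4 significant bits
    y = math.copysign(math.ldexp(m, e), x)
    return math.copysign(448.0, x) if abs(y) > 448.0 else y

# 3.0 is exactly representable; 0.2 rounds to the nearest E4M3 value.
print(quantize_fp8_e4m3(3.0))     # 3.0
print(quantize_fp8_e4m3(0.2))     # 0.203125
print(quantize_fp8_e4m3(1000.0))  # 448.0 (saturated)
```

The coarse spacing between representable values is why the Transformer Engine pairs FP8 with FP16: throughput-critical matrix math runs in FP8, while precision-sensitive values stay in higher precision.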

NVLink Switch System

To move at the speed of business, exascale HPC and trillion-parameter AI models need high-speed, seamless communication between every GPU in a server cluster to accelerate at scale.

Fourth-generation NVLink is a scale-up interconnect. Combined with the new external NVLink Switch, the NVLink Switch System now enables scaling multi-GPU I/O across multiple servers at 900 gigabytes per second (GB/s) of bidirectional bandwidth per GPU, over 7X the bandwidth of PCIe Gen5. The NVLink Switch System supports clusters of up to 256 connected H100 GPUs and delivers 9X higher bandwidth than InfiniBand HDR on the NVIDIA Ampere architecture.

In addition, NVLink now supports in-network computing with SHARP (Scalable Hierarchical Aggregation and Reduction Protocol), previously available only on InfiniBand, and can deliver an incredible one exaFLOP of FP8 sparse AI compute while delivering 57.6 terabytes per second (TB/s) of all-to-all bandwidth.

NVIDIA Confidential Computing

While data is encrypted at rest in storage and in transit across the network, it’s unprotected while it’s being processed. NVIDIA Confidential Computing addresses this gap by protecting data and applications in use. The NVIDIA Hopper architecture introduces the world’s first accelerated computing platform with confidential computing capabilities.

With strong hardware-based security, users can run applications on-premises, in the cloud, or at the edge and be confident that unauthorized entities can’t view or modify the application code and data while it’s in use. This protects the confidentiality and integrity of data and applications while tapping the unprecedented acceleration of H100 GPUs for AI training, AI inference, and HPC workloads.

Second-Generation MIG

With Multi-Instance GPU (MIG), a GPU can be partitioned into several smaller, fully isolated instances with their own memory, cache, and compute cores. The Hopper architecture further enhances MIG by supporting multi-tenant, multi-user configurations in virtualized environments across up to seven GPU instances, securely isolating each instance with confidential computing at the hardware and hypervisor level. Dedicated video decoders for each MIG instance deliver secure, high-throughput intelligent video analytics (IVA) on shared infrastructure. And with Hopper’s concurrent MIG profiling, administrators can monitor right-sized GPU acceleration and optimize resource allocation for users.

Researchers with smaller workloads can elect to use MIG to securely isolate a portion of a GPU, rather than renting a full CSP instance, while being assured that their data is secure at rest, in transit, and in use.

DPX Instructions

Dynamic programming is an algorithmic technique for solving a complex recursive problem by breaking it down into simpler subproblems. Storing the results of subproblems so that they don’t have to be recomputed later reduces the time and computational complexity of problems that would otherwise scale exponentially. Dynamic programming is commonly used in a broad range of use cases. For example, Floyd-Warshall is a route-optimization algorithm that can be used to map the shortest routes for shipping and delivery fleets. The Smith-Waterman algorithm is used for DNA sequence alignment and protein-folding applications.
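To make the idea concrete, here is a minimal CPU-only sketch of Floyd-Warshall in plain Python. It illustrates the dynamic-programming structure that DPX instructions accelerate (reusing each subproblem’s answer instead of re-enumerating paths); the graph and variable names are illustrative, and this is not the GPU implementation.

```python
INF = float("inf")

def floyd_warshall(dist):
    """All-pairs shortest paths. dist[i][j] is the direct edge weight
    (INF if no direct edge), and dist[i][i] == 0."""
    n = len(dist)
    d = [row[:] for row in dist]      # don't mutate the caller's matrix
    for k in range(n):                # DP step: allow node k as an
        for i in range(n):            # intermediate hop and keep the
            for j in range(n):        # cheaper of the two alternatives
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Hypothetical fleet example: four depots connected by one-way roads.
roads = [[0,   3,   INF, 7],
         [8,   0,   2,   INF],
         [5,   INF, 0,   1],
         [2,   INF, INF, 0]]
shortest = floyd_warshall(roads)
print(shortest[0][3])  # 6: depot 0 -> 1 -> 2 -> 3 beats the direct road of 7
```

Because each of the n² subproblem results is stored and reused at every step, the whole computation costs O(n³) rather than the exponential cost of enumerating every possible route.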

Hopper’s DPX instructions accelerate dynamic programming algorithms by 40X compared to traditional dual-socket CPU-only servers and by 7X compared to NVIDIA Ampere architecture GPUs. This leads to dramatically faster times in disease diagnosis, route optimization, and even graph analytics.

Preliminary specifications; subject to change.

Take a Deep Dive into the NVIDIA Hopper Architecture