Cloud-Native Supercomputing

Secure, Multi-Tenant, Bare-Metal Performance for AI, Data Analytics, and HPC Applications.

Bare-Metal Performance with Multi-Tenant Isolation

Cloud-native supercomputing blends the power of high-performance computing (HPC) with the security and ease of use of cloud computing services. The NVIDIA Cloud-Native Supercomputing platform leverages the NVIDIA® BlueField® data processing unit (DPU) architecture with high-speed, low-latency NVIDIA Quantum InfiniBand networking to deliver bare-metal performance, user management and isolation, data protection, and on-demand HPC and AI services, simply and securely.

Innovation for the Next Decade and Beyond

The Cloud-Native Supercomputing Platform

Supercomputers must deliver maximum performance while also offering multi-tenant security, which is best achieved through cloud-native platforms. The key element that enables this architectural transition is the DPU.

As a fully integrated data-center-on-a-chip platform, the DPU can offload and manage data center infrastructure in place of the host processor, providing the security and orchestration the supercomputer needs.

Combined with NVIDIA Quantum InfiniBand switching, this architecture delivers optimal bare-metal performance while natively supporting multi-node tenant isolation.

Toward a Zero-Trust Architecture

Cloud-native supercomputing systems are designed to deliver maximum performance, security, and orchestration in a multi-tenant environment.

The BlueField DPU can host untrusted multi-node tenants while ensuring that supercomputing resources are handed to each new tenant clean, with no residue left by prior tenants. To achieve this, the BlueField DPU provides a clean boot image for each newly scheduled tenant, performs a complete cleanup and re-establishment of trust, virtualizes storage, and grants access to approved storage areas.
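The tenant handoff described above is essentially an ordered sequence that must never skip a step. The sketch below models it as a small state machine; it is our own illustration, not BlueField or DOCA code. The names `NodeState`, `Node`, `advance`, and `hand_off`, and the storage paths in the usage, are hypothetical, and the actual reimaging and trust attestation are represented only by state changes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto


class NodeState(Enum):
    """Hypothetical lifecycle states for a node handed between tenants."""
    DIRTY = auto()     # a previous tenant may have left residue behind
    CLEAN = auto()     # booted from a known-good image, prior state wiped
    TRUSTED = auto()   # trust re-established before tenant code runs
    ASSIGNED = auto()  # virtualized storage granted to the new tenant


# Each state may only advance to the next one; skipping a step is an error.
NEXT_STATE = {
    NodeState.DIRTY: NodeState.CLEAN,
    NodeState.CLEAN: NodeState.TRUSTED,
    NodeState.TRUSTED: NodeState.ASSIGNED,
    NodeState.ASSIGNED: NodeState.DIRTY,  # tenant released; must be cleaned again
}


@dataclass
class Node:
    name: str
    state: NodeState = NodeState.DIRTY
    tenant: str | None = None
    storage_grants: set[str] = field(default_factory=set)


def advance(node: Node, target: NodeState) -> None:
    """Move to `target`, refusing any transition that skips a step."""
    if NEXT_STATE[node.state] is not target:
        raise ValueError(f"{node.name}: cannot go {node.state.name} -> {target.name}")
    node.state = target


def hand_off(node: Node, new_tenant: str, approved_storage: set[str]) -> None:
    """Reassign `node` to `new_tenant` in the order the page describes."""
    if node.state is NodeState.ASSIGNED:
        # Release the previous tenant: revoke its storage before anything else.
        node.storage_grants.clear()
        node.tenant = None
        advance(node, NodeState.DIRTY)
    # Hypothetical placeholders: real wiping, clean boot, and attestation go here.
    advance(node, NodeState.CLEAN)
    advance(node, NodeState.TRUSTED)
    # Grant access only to the storage areas approved for this tenant.
    node.storage_grants = set(approved_storage)
    node.tenant = new_tenant
    advance(node, NodeState.ASSIGNED)
```

For example, calling `hand_off(node, "tenant-b", {"/proj/b"})` on a node previously assigned to another tenant first revokes the old grants, so the new tenant never inherits them. Encoding the order in a transition table means a bug that tries to assign a tenant before trust is re-established fails loudly instead of silently.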

Application Performance Acceleration

HPC and AI communication frameworks and libraries are latency and bandwidth sensitive, and they play a critical role in determining application performance.

Offloading these libraries from the host CPU or GPU to the BlueField DPU provides the highest degree of overlap between communication and computation, letting both progress in parallel. It also reduces the negative effects of operating system jitter and dramatically increases application performance. This is key to enabling the next generation of supercomputing architecture.
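Why overlap matters can be illustrated with a deliberately idealized timing model. This is our own illustration, not the methodology behind the results cited on this page: it assumes each iteration has one compute phase and one communication phase, and that offload lets them run fully in parallel with zero overhead and no jitter. `step_time` and `ideal_speedup` are hypothetical helper names.

```python
def step_time(compute_s: float, comm_s: float, overlapped: bool) -> float:
    """Idealized wall-clock time for one compute + communicate iteration."""
    if overlapped:
        # Communication progresses on the offload engine while the host
        # computes, so the slower of the two phases sets the pace.
        return max(compute_s, comm_s)
    # The host drives communication itself, so the phases run back to back.
    return compute_s + comm_s


def ideal_speedup(compute_s: float, comm_s: float) -> float:
    """Best-case speedup from perfect overlap; never exceeds 2.0."""
    return step_time(compute_s, comm_s, False) / step_time(compute_s, comm_s, True)


if __name__ == "__main__":
    for compute, comm in [(3.0, 1.0), (1.0, 1.0), (1.0, 3.0)]:
        print(f"compute={compute}s comm={comm}s -> {ideal_speedup(compute, comm):.2f}x")
```

Under these assumptions the gain is largest when compute and communication take equal time (a 2x ceiling) and shrinks as either phase dominates; real offload results also depend on jitter reduction and engine overheads, which this model ignores.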

Early research results from the Ohio State University demonstrate that cloud-native supercomputers can perform HPC jobs 1.3x faster than traditional ones.
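Reading "1.3x faster" as a ratio of wall-clock times (our interpretation; the benchmark setup is given in the footnote that follows), the speedup translates into time to solution as follows:

```latex
S = \frac{T_{\text{traditional}}}{T_{\text{cloud-native}}} = 1.3
\quad\Longrightarrow\quad
T_{\text{cloud-native}} = \frac{T_{\text{traditional}}}{1.3} \approx 0.77\,T_{\text{traditional}}
```

That is, the same job finishes in roughly 23% less time.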

DPU Provides 1.3x Higher Performance Acceleration for P3DFFT¹

¹The performance tests were conducted by Ohio State University on the HPC-AI Advisory Council’s Cluster Center with the following system configuration: 32 servers, each with dual-socket Intel Xeon E5-2697A v4 16-core CPUs @ 2.60GHz (32 cores per node), 256GB of DDR4 2400MHz RDIMM memory, and a 1TB 7.2K RPM SATA 2.5" hard drive. The servers were connected with NVIDIA BlueField-2 InfiniBand HDR100 DPUs and an NVIDIA Quantum QM7800 40-port HDR 200Gb/s InfiniBand switch.

NVIDIA Cloud-Native Supercomputer Delivers Bare-Metal Performance

Performance Isolation

The NVIDIA Quantum-2 InfiniBand platform provides innovative proactive monitoring and congestion management to deliver traffic isolation, nearly eliminating performance jitter and ensuring predictable performance, as if the application were running on a dedicated system.

Cloud-Native Supercomputing Platform

NVIDIA BlueField

The NVIDIA BlueField DPU combines the industry-leading NVIDIA ConnectX® network adapter, an array of Arm cores with a PCIe subsystem, and purpose-built HPC hardware acceleration engines to deliver full data center infrastructure-on-a-chip programmability.

InfiniBand

NVIDIA Quantum InfiniBand networking accelerates and offloads data transfers to ensure compute resources never “go hungry” due to lack of data or bandwidth. The NVIDIA Quantum InfiniBand network can be partitioned between different users or tenants, providing security and quality of service (QoS) guarantees.
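InfiniBand partitioning is built on partition keys (P_Keys), defined in the InfiniBand Architecture Specification as 16-bit values: the most significant bit marks full (1) versus limited (0) membership, and the low 15 bits identify the partition. The sketch below models only that membership rule, assuming one P_Key per port for simplicity. Real ports carry a table of P_Keys, and the check is enforced in hardware as configured by the subnet manager, not by application code. The helper names and tenant key values are illustrative.

```python
FULL_MEMBER_BIT = 0x8000  # most significant bit of the 16-bit P_Key
PARTITION_MASK = 0x7FFF   # low 15 bits identify the partition


def is_full_member(pkey: int) -> bool:
    """True if the P_Key carries full (not limited) membership."""
    return bool(pkey & FULL_MEMBER_BIT)


def can_communicate(pkey_a: int, pkey_b: int) -> bool:
    """Return True if two ports holding these P_Keys may exchange traffic.

    Both must be in the same partition (same low 15 bits), and at least one
    must be a full member: two limited members cannot talk to each other.
    """
    same_partition = (pkey_a & PARTITION_MASK) == (pkey_b & PARTITION_MASK)
    return same_partition and (is_full_member(pkey_a) or is_full_member(pkey_b))


if __name__ == "__main__":
    tenant_a, tenant_b = 0x8001, 0x8002  # illustrative: two tenants, two partitions
    print(can_communicate(tenant_a, tenant_b))  # different partitions
    print(can_communicate(0x8001, 0x0001))      # full member + limited member
    print(can_communicate(0x0001, 0x0001))      # two limited members
```

Limited membership is what lets a shared service (for example, a storage gateway holding a full-member key) reach many tenants' nodes while those tenants, holding only limited keys in the same partition, remain unable to reach one another.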

DOCA

The NVIDIA DOCA SDK enables infrastructure developers to rapidly create network, storage, security, management, and AI and HPC applications and services on top of the NVIDIA BlueField DPU, leveraging industry-standard APIs. With DOCA, developers can program the supercomputing infrastructure of tomorrow by creating high-performance, software-defined, and cloud-native DPU-accelerated services.

Magnum IO

The NVIDIA Magnum IO™ software development kit (SDK) enables developers to optimize input/output (IO) in their applications, reducing end-to-end workflow time.

Magnum IO covers all aspects of IO, including storage, networking, multi-GPU, and multi-node communications. It also includes tools to profile and tune applications and eliminate IO bottlenecks.

Key Features

Multi-tenant isolation, data protection, and security
Infrastructure service offloads
Dedicated hardware engines for accelerating communication frameworks
Enhanced quality of service (QoS)

Benefits

Delivers optimal bare-metal performance
Increases CPU availability, application scalability, and system efficiency
Higher compute and communication overlap
Reduced jitter / system noise
Reduced infrastructure costs

Learn more about cloud-native supercomputing in the technical overview.