Shaping the next generation of AI.
Overview
The NVIDIA Rubin platform is built for the age of agentic AI and reasoning, engineered to master multi-step problem-solving and massive long-context workflows at scale. By eliminating critical bottlenecks in communication and memory movement, the Rubin platform supercharges inference, delivering more tokens per watt and lowering cost per token versus the NVIDIA Blackwell generation.
The Rubin platform features a new Transformer Engine with hardware-accelerated adaptive compression to boost NVFP4 performance while preserving accuracy, enabling up to 50 petaFLOPS of NVFP4 inference. Fully compatible with NVIDIA Blackwell, the Transformer Engine ensures seamless upgrades, so previously optimized codes transition effortlessly to the Rubin platform.
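To ground this in software, here is a minimal sketch of how low-precision execution is driven through the Transformer Engine PyTorch API as it exists today, using its FP8 delayed-scaling recipe. The assumption that NVFP4 on Rubin is exposed through an analogous recipe, and the layer sizes used here, are illustrative rather than documented API.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Arbitrary layer sizes chosen for illustration.
layer = te.Linear(4096, 4096, bias=True).cuda()
inp = torch.randn(1024, 4096, device="cuda")

# FP8 delayed-scaling recipe available in current Transformer Engine releases;
# an NVFP4 recipe on Rubin is assumed to follow the same autocast pattern.
low_precision_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

with te.fp8_autocast(enabled=True, fp8_recipe=low_precision_recipe):
    out = layer(inp)

# Gradients flow through the same low-precision path.
out.float().sum().backward()
```

Because the autocast context wraps existing modules rather than requiring model rewrites, code already tuned for Blackwell's Transformer Engine recipes carries over without structural changes, which is the compatibility point made above.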
The third generation of NVIDIA Confidential Computing expands security to full-rack scale with NVIDIA Vera Rubin NVL72. This platform creates a unified trusted execution environment across all 36 NVIDIA Vera CPUs, 72 NVIDIA Rubin GPUs, and the NVIDIA NVLink™ fabric that seamlessly connects them, maintaining data security across CPU, GPU, and NVLink domains. With attestation services providing cryptographic proof of compliance, it combines massive scale with uncompromised security to protect the world’s largest proprietary models, training data, and inference workloads.
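For readers unfamiliar with attestation, the sketch below illustrates the general pattern behind “cryptographic proof of compliance”: the platform produces signed evidence of its measured state, and a verifier checks that evidence against an expected policy before trusting the environment. Every name here, and the HMAC-based signing, is a hypothetical simplification; NVIDIA’s production attestation services rely on hardware roots of trust and their own verifier tooling.

```python
import hashlib
import hmac

# Hypothetical shared key standing in for a hardware root of trust (illustration only).
DEVICE_KEY = b"example-device-key"

# Expected "golden" measurement of firmware/driver state, per the verifier's policy.
EXPECTED_MEASUREMENT = hashlib.sha256(b"rubin-gpu-firmware-v1").hexdigest()


def generate_evidence(firmware_blob: bytes, nonce: bytes) -> dict:
    """Device side: measure state and sign (measurement, nonce) as attestation evidence."""
    measurement = hashlib.sha256(firmware_blob).hexdigest()
    tag = hmac.new(DEVICE_KEY, measurement.encode() + nonce, hashlib.sha256).hexdigest()
    return {"measurement": measurement, "nonce": nonce, "tag": tag}


def verify_evidence(evidence: dict, nonce: bytes) -> bool:
    """Verifier side: check the signature and freshness, then compare against policy."""
    expected_tag = hmac.new(
        DEVICE_KEY, evidence["measurement"].encode() + nonce, hashlib.sha256
    ).hexdigest()
    return (
        hmac.compare_digest(evidence["tag"], expected_tag)
        and evidence["nonce"] == nonce
        and evidence["measurement"] == EXPECTED_MEASUREMENT
    )


nonce = b"fresh-nonce-123"
evidence = generate_evidence(b"rubin-gpu-firmware-v1", nonce)
print("attestation passed:", verify_evidence(evidence, nonce))  # True
```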
The sixth-generation NVLink delivers a major leap for NVIDIA's high-speed GPU interconnect fabric, unifying 72 NVIDIA Rubin GPUs into a single performance domain. Doubling NVIDIA Blackwell’s per-GPU NVLink bandwidth, Rubin delivers 3.6 terabytes per second (TB/s) per GPU and 260 TB/s of aggregate, low-latency connectivity across the rack. Combined with NVIDIA® Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™, which reduces network congestion by up to 50 percent for collective operations, this next-generation interconnect accelerates training and inference for the world’s largest models, at scale and without compromise.
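As a quick sanity check, the aggregate figure follows directly from the per-GPU number quoted above:

```python
# Worked arithmetic using only the figures quoted above.
gpus = 72                  # Rubin GPUs unified by NVLink in one rack
per_gpu_tb_s = 3.6         # NVLink bandwidth per GPU, TB/s
aggregate_tb_s = gpus * per_gpu_tb_s
print(f"{aggregate_tb_s:.1f} TB/s")  # 259.2 TB/s, quoted as 260 TB/s
```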
The NVIDIA Rubin platform delivers rack-scale resiliency with advanced reliability features. NVIDIA Rubin GPUs feature a dedicated second-generation RAS engine for proactive maintenance and real-time health checks without downtime, while NVIDIA Vera CPUs add enhanced serviceability with SOCAMM LPDDR5X and in-system tests for the CPU cores. The rack introduces modular, cable-free tray designs for 18x faster assembly and serviceability versus NVIDIA Blackwell; combined with intelligent resiliency and software-defined NVLink routing, these features ensure continuous operation and reduce maintenance overhead.
The NVIDIA Vera CPU is engineered for data movement and agentic reasoning across accelerated systems, with full confidential computing support. It pairs seamlessly with NVIDIA GPUs or operates independently for analytics, cloud, orchestration, storage, and high-performance computing (HPC) workloads. Vera combines 88 NVIDIA-designed cores, up to 1.2 TB/s of LPDDR5X memory bandwidth, and NVIDIA Scalable Coherency Fabric to deliver predictable, energy-efficient performance for data- and memory-intensive workloads with full Arm® compatibility. Integrated NVLink-C2C connectivity enables high-bandwidth, coherent CPU–GPU memory access to maximize system utilization and efficiency.
Read this technical deep dive to learn how NVIDIA Vera Rubin treats the data center, not the chip, as the unit of compute, establishing a new foundation for producing intelligence efficiently, securely, and predictably at scale.