Using NVIDIA® BlueField® data processing units (DPUs) to create a scalable, secure, and efficient AI cloud infrastructure to meet complex workload demands, cut costs, and outperform the capabilities of traditional hosting environments.
The AI revolution is reshaping the data center landscape and requires platforms that offer flexible and cost-effective compute and data capabilities. To meet these demands, CoreWeave set out to develop a cloud for accelerated computing workloads that delivers high performance alongside stringent tenant isolation and security in a multi-tenant environment.
Summary
To meet the demands of AI, data centers need platforms with flexible, high-performance, and cost-effective compute and data capabilities. CoreWeave set out to develop a cloud-scale, accelerated computing infrastructure that could provide high performance together with stringent tenant isolation and security in a multi-tenant environment.
The CoreWeave team knew that their infrastructure needed to support external networking for access and internal networking for compute. It had to manage network traffic across hundreds of thousands of NVIDIA GPUs, sustaining performance under heavy workloads. And it needed to offload and accelerate network and storage tasks to free up CPU resources, allowing processors to concentrate on compute-intensive workloads and improving storage access for more efficient AI computing. Another key objective for CoreWeave was an infrastructure that could meet the growing demands of AI applications, scaling to handle increasingly complex, compute-intensive, large-scale workloads well into the future. CoreWeave looked to harness the power of the NVIDIA BlueField networking platform and NVIDIA DOCA software framework to meet these demands.
Using components from the NVIDIA DOCA software framework, specifically the DOCA Host-Based Networking service (DOCA HBN), which is built on OVS-DOCA and DOCA Flow, CoreWeave accelerated its cloud networking services and APIs. DOCA HBN leverages the same core components as the NVIDIA Cumulus Linux network operating system, such as FRRouting and NVIDIA User Experience (NVUE), and packages them in a container running on BlueField DPUs. This setup enables CoreWeave to manage complex network functions in a scalable, distributed way, supporting tenant isolation, load balancing, and traffic steering in a multi-tenant environment. Each cloud tenant has access only to their own data and compute jobs, and these tasks are managed efficiently without compromising performance, allowing for a scalable and secure network. By offloading and accelerating these tasks, DOCA HBN also reduces the load on the CPU, freeing it for compute-intensive processes and improving overall system performance.
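The classification logic behind this kind of traffic steering can be pictured as a match/action flow table of the sort OVS maintains. The following is a pure-Python toy model only, not DOCA or OVS code; every field name, rule, and action below is invented for illustration:

```python
# Toy match/action flow table: the first matching rule's action wins.
# On BlueField, DOCA offloads lookups like this into the DPU data path;
# this sketch only illustrates the classification logic in software.

FLOWS = [
    ({"dst_port": 179}, "to_control_plane"),   # BGP traffic goes to the routing daemon
    ({"tenant": "a"},   "forward_vni_10100"),  # hypothetical tenant/VNI pairings
    ({"tenant": "b"},   "forward_vni_10200"),
]

def classify(packet: dict) -> str:
    """Return the action of the first rule whose fields all match the packet."""
    for match, action in FLOWS:
        if all(packet.get(k) == v for k, v in match.items()):
            return action
    return "drop"  # default-deny for unmatched traffic

print(classify({"dst_port": 179}))  # to_control_plane
print(classify({"tenant": "b"}))    # forward_vni_10200
print(classify({"tenant": "c"}))    # drop
```

A hardware pipeline applies the same first-match semantics, but at line rate and without consuming host CPU cycles.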
With DOCA HBN, CoreWeave moved network isolation to the BlueField DPU and employed Ethernet VPN-Virtual Extensible LAN (EVPN VXLAN) to create distinct virtual networks for each tenant, processing and routing traffic via VXLAN network identifiers to ensure complete separation. This is vital in multi-tenant environments where strict network isolation is critical for security and compliance. CoreWeave also adopted a decentralized architecture by implementing an internet gateway on BlueField, using DOCA-accelerated Open Virtual Switch (OVS-DOCA) for traffic steering and network address translation (NAT). BlueField also plays a critical role in managing network traffic by hosting gateways and announcing Border Gateway Protocol (BGP) routes for efficient network management.
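The isolation guarantee of EVPN VXLAN can be sketched as a toy model: each tenant's virtual network is keyed by a VXLAN Network Identifier (VNI), and traffic is delivered only when the encapsulating VNI matches the tenant owning the destination. This is a pure-Python illustration with invented tenant names, VNIs, and addresses; the real enforcement happens in the BlueField data path:

```python
# Toy model of VNI-based tenant isolation (illustrative only).
TENANT_VNI = {"tenant-a": 10100, "tenant-b": 10200}  # hypothetical mapping

ENDPOINT_TENANT = {  # hypothetical overlay endpoints
    "10.0.1.5": "tenant-a",
    "10.0.2.7": "tenant-b",
}

def deliver(packet_vni: int, dst_ip: str) -> bool:
    """Deliver only if the packet's VNI matches the tenant owning dst_ip."""
    tenant = ENDPOINT_TENANT.get(dst_ip)
    return tenant is not None and TENANT_VNI[tenant] == packet_vni

print(deliver(10100, "10.0.1.5"))  # same tenant -> True
print(deliver(10100, "10.0.2.7"))  # cross-tenant -> False
```

Because the VNI check happens on every packet, a tenant cannot reach another tenant's endpoints even on shared physical infrastructure.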
To build their next-generation AI storage service, CoreWeave AI Object Storage, CoreWeave partnered with VAST Data, leveraging their deep integration with NVIDIA technologies. VAST incorporates BlueField DPUs as data nodes (DNodes), eliminating the need for traditional x86 CPUs by offloading data services directly to the DPU through NVIDIA DOCA APIs.
This architecture redefines how control and policy are enforced across the data layer—closer to where data flows and far more efficiently. The work with NVIDIA DPUs began with BlueField-1 and has now evolved to BlueField-3, with ongoing efforts to deploy BlueField in compute-adjacent nodes (CNodes). This enables infrastructure-grade multi-tenancy, where each CoreWeave AI Object Storage tenant operates within an isolated, secure network domain. Real-time telemetry and fine-grained policy enforcement are executed directly at the DPU layer. The platform's support for multi-protocol access further abstracts complexity and enables seamless interoperability across diverse compute and data environments.
This level of flexibility empowers CoreWeave to scale AI Object Storage without disrupting performance or compromising tenant isolation. Offloading control services to BlueField has fundamentally reshaped infrastructure economics—optimizing for performance, security, and scalability. Performance benchmarking shows the impact of this architectural evolution: Compared to x86-based DNodes with DRAM and traditional NVIDIA ConnectX® NICs, BlueField-powered DNodes deliver a 1.6x increase in sequential throughput, from 40 gigabytes per second (GB/s) to 64 GB/s, while cutting power consumption by 58 percent, from 1200 W down to 500 W. That translates to a 3.84x improvement in performance per watt. This design serves as a blueprint for modern AI infrastructure—demonstrating how hardware-software codesign at the infrastructure layer can unlock massive gains in efficiency, scalability, and performance for cloud service providers like CoreWeave.
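The reported figures are internally consistent, which can be checked with simple arithmetic using only the throughput and power numbers quoted above:

```python
# Benchmark figures quoted in the text: x86-based vs. BlueField-powered DNodes.
x86_throughput = 40    # GB/s sequential throughput
dpu_throughput = 64    # GB/s
x86_power = 1200       # watts
dpu_power = 500        # watts

speedup = dpu_throughput / x86_throughput                     # 1.6x
power_saving = 1 - dpu_power / x86_power                      # ~58 percent
perf_per_watt_gain = (dpu_throughput / dpu_power) / (x86_throughput / x86_power)

print(f"{speedup:.2f}x throughput, {power_saving:.0%} less power, "
      f"{perf_per_watt_gain:.2f}x performance per watt")
# -> 1.60x throughput, 58% less power, 3.84x performance per watt
```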
The new system also doubled the wire bandwidth by employing four BlueField DPUs that together provide eight 100GbE ports. While some power savings were attributed to the adoption of larger, more efficient solid-state drives (SSDs), the overall efficiency gains primarily stemmed from the DPU integration. This test underscores the potential of BlueField DPUs to revolutionize data center architectures, offering a compelling solution for companies like CoreWeave seeking to enhance their AI infrastructure's performance and energy efficiency.
This visual comparison highlights the differences between running x86 CPUs and NVIDIA BlueField DPUs.
In addition to leveraging BlueField DPUs for offloading, accelerating, and isolating workloads, CoreWeave connects their storage and management infrastructure with NVIDIA Spectrum Ethernet switches running NVIDIA Cumulus Linux. Cumulus Linux was built to streamline network management through software-driven automation. By combining a pure layer-3 EVPN VXLAN implementation with APIs from the NVUE object model, CoreWeave is able to automate the operation and updating of the network with ease, even at massive scale.
At the same time that CoreWeave leverages BlueField DPUs and Spectrum Ethernet switches for their storage and management network, they use NVIDIA Quantum-2 InfiniBand networking for their GPU-to-GPU AI compute fabric. InfiniBand is the gold standard for AI networking, offering the highest effective bandwidth and ultra-low latency for AI training workloads at load and at scale. InfiniBand features—such as adaptive routing and telemetry-based congestion control—are essential for multi-tenant AI cloud environments, providing performance isolation and ensuring that all CoreWeave users receive the full networking bandwidth they need. CoreWeave also uses Quantum InfiniBand's NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ for in-network computing of AI collective operations, offloading communication operations from the GPU, simplifying network traffic patterns, and accelerating job completion times.
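The idea behind hierarchical in-network aggregation can be sketched with a toy reduction: instead of every GPU's contribution traveling to a single destination, partial sums are combined level by level, so each upstream link carries one aggregated message rather than one per GPU. This pure-Python sketch only models the aggregation pattern; it is not SHARP code, and the fan-in value is an arbitrary illustration:

```python
# Toy model of hierarchical (tree-based) reduction, in the spirit of
# in-network aggregation: each aggregation node sums up to `fan_in`
# children, level by level, until a single result remains.

def tree_reduce(values, fan_in=4):
    """Sum `values` level by level with `fan_in` children per aggregation node."""
    level = list(values)
    while len(level) > 1:
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]

grads = [1.0] * 16          # stand-in for 16 GPUs' gradient contributions
print(tree_reduce(grads))   # 16.0 -- same result as a flat sum
```

The result is identical to summing everything at one endpoint, but the traffic pattern changes: the root receives a handful of aggregated messages instead of one message per GPU, which is what lets in-network computing shrink collective-operation traffic.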
CoreWeave's implementation of NVIDIA BlueField DPUs using NVIDIA DOCA has transformed their AI cloud infrastructure, delivering significant technical, operational, and business value. The integration of BlueField DPUs has enabled CoreWeave to create a highly efficient, scalable, and secure platform that meets the demanding requirements of modern AI workloads and cloud operations. CoreWeave's innovative approach, combining VAST Data's storage solutions with NVIDIA BlueField DPUs, has positioned them as a leading AI cloud provider. Their commitment to adopting innovative technologies ensures they can meet future customer needs and handle increasingly complex AI workloads.
Strategically implementing advanced technologies like NVIDIA BlueField DPUs and the NVIDIA DOCA software framework can revolutionize cloud infrastructure for AI applications. CoreWeave's success in building a scalable, efficient, and secure AI cloud platform with BlueField DPUs demonstrates the significant benefits of optimizing infrastructure for AI and high-performance computing.
Start your journey toward a more efficient, powerful, and cost-effective data center. Contact us today to learn how NVIDIA BlueField DPUs can transform your infrastructure and give you a competitive edge in the AI-driven world.