Cloud Services

DeepL Deploys Real-Time, Multilingual Language AI Translation Powered by NVIDIA AI Infrastructure

Objective

DeepL, a global AI product and research company focused on building secure, intelligent solutions to complex business problems, has achieved unprecedented speed and accuracy in AI translation, delivering context-dependent translations in milliseconds with NVIDIA and EcoDataCenter. The company's AI infrastructure, named “Arion” and powered by an NVIDIA DGX SuperPOD with DGX GB200 systems, enables millions of daily users across dozens of languages to receive accurate translations with near-zero latency.

Customer

DeepL

Partner

EcoDataCenter

Use Case

Customized Inference

Key Takeaways

10x Faster Translation Processing

  • Arion can translate the entire Internet in approximately 18 days compared to 193 days with the previous system, turning hypothetical capabilities into practical possibilities.

Near Real-Time AI Inference at Scale

  • The infrastructure enables DeepL to handle millions of daily users with low latency while maintaining the accuracy DeepL is known for.

Translation Performance Improvements

  • Translating the Oxford English Dictionary, which previously took 39 seconds, now takes just 3.72 seconds, a roughly 10x speedup (see the quick check below).
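Both headline figures point to roughly the same throughput gain. A quick arithmetic check using only the numbers quoted in this story:

```python
# Speedup factors implied by the figures quoted above.
prev_oed_s, arion_oed_s = 39.0, 3.72   # Oxford English Dictionary, in seconds
prev_web_d, arion_web_d = 193, 18      # entire Internet, in days

print(f"OED speedup: {prev_oed_s / arion_oed_s:.1f}x")  # -> 10.5x
print(f"Web speedup: {prev_web_d / arion_web_d:.1f}x")  # -> 10.7x
```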

Scaling AI Translation to Meet Global Demand

DeepL built its reputation on delivering translations that are both extremely accurate and naturally fluent, capturing nuance and context in ways that feel native to each language. This requires sophisticated AI models that understand not just language in depth, but also the world around us, since translation is inherently context-dependent.

The challenge facing DeepL was scaling its AI capabilities to meet growing global demand while maintaining the quality and speed users expect. With millions of users visiting daily and relying on instant, accurate translations, the company needed infrastructure that could handle massive computational requirements for both training larger, more capable models and delivering real-time inference.

Traditional approaches to scaling AI infrastructure often create trade-offs between model size, training speed, and inference latency. DeepL needed a solution that would eliminate these constraints and enable the company to pursue ambitious new use cases such as voice-to-voice translation, which demands intelligent real-time prediction and translation as people speak.


Deploying Next-Generation Supercomputing Architecture

DeepL became the first organization in Europe to deploy an NVIDIA DGX SuperPOD with DGX GB200 systems, an AI infrastructure the company named “Arion”. Built on the NVIDIA Blackwell architecture, Arion connects 72 Blackwell GPUs to function as a single unit, a ninefold increase over the previous system's eight-GPU configuration.

The deployment went far beyond simply installing new hardware. DeepL optimized its entire approach to training and running large language models (LLMs) to extract maximum performance from the increased compute power. The company implemented NVIDIA TensorRT-LLM to make the inference process more efficient, reducing latency for more than 200,000 businesses worldwide without sacrificing quality or accuracy.
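DeepL's production serving stack is not public, but the general shape of this optimization can be sketched with TensorRT-LLM's high-level Python API. The checkpoint below is a small public stand-in and the prompt is purely illustrative; DeepL's actual models and deployment configuration are assumptions outside this story.

```python
# Minimal sketch of low-latency generation with TensorRT-LLM's LLM API.
# The model is a public stand-in; DeepL's production models are not public.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # compiles an optimized engine

prompts = ["Translate to German: The weather is lovely today."]
params = SamplingParams(max_tokens=64, temperature=0.0)  # deterministic decoding

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

Compiling the model into a TensorRT engine with fused kernels and in-flight batching is where most of the latency and throughput gains come from.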

DeepL's engineering team embraced floating-point optimization, moving from 16-bit to 8-bit floating-point training and inference using NVIDIA's Transformer Engine and TensorRT-LLM. This approach optimized inference down to the kernel level—reducing latency, boosting throughput, and cutting costs per query.
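DeepL's exact recipe is not published; the sketch below shows what the move from 16-bit to 8-bit training looks like with Transformer Engine's PyTorch API, with a single `te.Linear` layer standing in for a full transformer model and illustrative recipe values.

```python
# Minimal FP8 training sketch with NVIDIA Transformer Engine (PyTorch API).
# Requires a GPU with FP8 support (Hopper- or Blackwell-class hardware).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# HYBRID recipe: E4M3 for weights/activations in the forward pass,
# E5M2 for gradients; scaling factors track a history of abs-max values.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID,
                            amax_history_len=16,
                            amax_compute_algo="max")

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)          # GEMM executes in FP8

y.sum().backward()        # backward pass also runs its GEMMs in FP8
```

Keeping weights and activations in 8 bits halves memory traffic relative to 16-bit formats and doubles tensor-core throughput, which is where the kernel-level latency and cost gains come from.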

The collaboration extended to infrastructure planning with EcoDataCenter, which anticipated that next-generation machines would require liquid cooling. This foresight enabled DeepL to deploy the DGX SuperPOD using green power in one of the world's most advanced and sustainable data centers.

DeepL also developed sophisticated synthetic data generation techniques that boost data volume by a factor of 1,000. This approach extrapolates from expert linguistic insight to create high-quality training data at scale, putting human linguistic instincts and expertise at the heart of their model training.
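DeepL has not published this pipeline, so the sketch below is purely illustrative of the idea: a small set of expert-vetted seed templates and slot fillers is extrapolated combinatorially into many aligned training pairs. All names and examples here are invented.

```python
# Illustrative only: extrapolating expert-vetted seeds into many aligned pairs.
import itertools

# Expert linguistic insight, encoded as a vetted parallel template plus fillers.
seed_templates = [
    ("The {role} signed the {doc} on {day}.",
     "Der {role_de} unterzeichnete den {doc_de} am {day_de}."),
]
fillers = {
    "role": [("lawyer", "Anwalt"), ("director", "Direktor")],
    "doc":  [("contract", "Vertrag"), ("lease", "Mietvertrag")],
    "day":  [("Monday", "Montag"), ("Friday", "Freitag")],
}

def expand(src_tpl, tgt_tpl):
    """Cross-product of slot fillers -> aligned (source, target) pairs."""
    for (r, r_de), (d, d_de), (day, day_de) in itertools.product(*fillers.values()):
        yield (src_tpl.format(role=r, doc=d, day=day),
               tgt_tpl.format(role_de=r_de, doc_de=d_de, day_de=day_de))

pairs = [p for tpl in seed_templates for p in expand(*tpl)]
print(len(pairs))   # 8 aligned pairs from a single expert template
print(pairs[0])
```

In practice the expansion step would be far richer (LLM-driven paraphrasing, domain vocabularies, quality filtering), which is how a 1,000x volume multiplier becomes plausible.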

"DeepL is leveraging NVIDIA GB200 hardware to train Mixture-of-Experts (MoE) models, advancing its model architecture to improve efficiency during training and inference, setting new benchmarks for performance in AI."

Paul Busch
Team Lead Research - Foundation Model Training, DeepL


Achieving Breakthrough Speed and Enabling Innovation

The performance improvements are substantial across all metrics. Translating the Oxford English Dictionary, which took 39 seconds with DeepL’s previous “Mercury” cluster, now takes just 3.72 seconds with Arion. With further optimization using NVFP4 inference, DeepL envisions reducing this to approximately 2 seconds and translating the entire web in just over two weeks.

Beyond raw speed, the infrastructure unlocks emergent capabilities that enable DeepL to pursue ambitious innovations. The company has launched DeepL Voice, a revolutionary approach to voice-to-voice translation that intelligently predicts what people are saying and translates as they speak. DeepL has also introduced Clarify, an on-demand translation expert that reaches out with intelligent questions to clarify meaning when detecting ambiguity.
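DeepL has not disclosed how DeepL Voice works internally. One well-known pattern for this kind of simultaneous translation is "local agreement" re-translation: re-translate the growing transcript on every update, but emit only the prefix that has stayed stable across consecutive hypotheses. The sketch below assumes a hypothetical `translate()` callable standing in for the machine-translation model.

```python
# Sketch of "local agreement" streaming translation (a published pattern,
# not necessarily DeepL's). `translate` is a hypothetical MT callable.
def stable_prefix(a: list[str], b: list[str]) -> list[str]:
    """Longest common prefix of two tokenized hypotheses."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out

def stream_translate(partial_transcripts, translate):
    """Yield newly committed target tokens as ASR partials arrive."""
    prev, committed = [], 0
    for text in partial_transcripts:       # growing ASR partial results
        hyp = translate(text).split()      # full re-translation each step
        stable = stable_prefix(prev, hyp)
        if len(stable) > committed:        # emit only newly stable tokens
            yield " ".join(stable[committed:])
            committed = len(stable)
        prev = hyp
```

The trade-off is the classic one in simultaneous translation: committing earlier lowers perceived latency but risks retracting words, while waiting for agreement across hypotheses trades a little delay for stable output.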

The increased compute power, combined with NVIDIA's support for advanced floating-point formats, means DeepL's larger, more powerful models can still perform tasks extremely quickly. This eliminates the typical trade-off where larger AI models come with greater latency, enabling complex tasks at the speed users require.

“NVIDIA TensorRT-LLM is one of those technologies that we've been looking at very closely from the very beginning, and it allows us to create more efficiency in the inference process and in how we run our models. It allows us to reduce the latency for our customers without sacrificing any of the quality and accuracy that DeepL is known for.”

Jaroslaw Kutylowski
CEO and Founder, DeepL

Future of DeepL in AI Translation and Beyond

DeepL's collaboration with NVIDIA involves ongoing discussions about the training and inference capabilities that LLMs require, as well as joint work on optimization algorithms for the software that runs the supercomputer. These software optimizations unlock major gains in energy efficiency and create additional capacity for innovation.

The infrastructure enables DeepL to test a far wider range of ideas for real-world impact, expanding the possibilities for how people experience work and communicate across borders. By building larger models and training them on larger amounts of high-quality synthetic data, DeepL can discover previously unpredicted emergent capabilities—moments where models evolve from finding tasks extremely difficult to finding them relatively easy.

DeepL's success demonstrates how strategic infrastructure investments combined with deep technical collaboration can transform what's possible in Language AI. The company continues to pursue innovations that help businesses communicate better both internally and externally, creating dialogue across borders and enabling more productive, more natural experiences of working with AI.

Deploy AI inference at scale with optimized performance and ROI on NVIDIA’s full-stack platform, including for frontier Mixture-of-Experts models.
