NVIDIA L4 Tensor Core GPU

The breakthrough universal accelerator for efficient video, AI, and graphics.

Accelerate Video, AI, and Graphics Workloads

The NVIDIA L4 Tensor Core GPU, powered by the NVIDIA Ada Lovelace architecture, delivers universal, energy-efficient acceleration for video, AI, visual computing, graphics, virtualization, and more. Packaged in a low-profile form factor, L4 is a cost-effective, energy-efficient solution for high throughput and low latency in every server, from the edge to the data center to the cloud.

Up to 120X Higher AI Video Performance


Measured performance: 8x L4 vs. a two-socket (2S) Intel 8362 CPU server; end-to-end video pipeline (decode, CV-CUDA® preprocessing, SegFormer inference with NVIDIA® TensorRT™ 8.6, postprocessing, encode) vs. a CPU-only pipeline using OpenCV 4.7 and PyTorch inference.

Experience Real-Time AI Video Pipeline Performance

Transform video applications with the power of NVIDIA L4. Whether streaming live to millions of viewers, enabling users to build creative stories, or delivering immersive augmented and virtual reality (AR/VR) experiences, servers equipped with L4 can host up to 1,040 concurrent AV1 video streams at 720p30 for mobile users.¹

With fourth-generation Tensor Cores and 1.5X larger GPU memory, NVIDIA L4 GPUs paired with the CV-CUDA® library take video content understanding to a new level. L4 delivers 120X higher AI video performance than CPU-based solutions, letting enterprises gain real-time insights to personalize content, improve search relevance, detect objectionable content, and implement smart-space solutions.

1. Measured performance: 8x L4 AV1 low-latency P1 preset encode at 720p30.
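The footnoted 1,040-stream figure can be unpacked with some simple arithmetic. This is an illustrative sketch only: the per-GPU density and aggregate pixel throughput below are derived from the document's numbers, not separately measured specs.

```python
# Illustrative arithmetic unpacking the 1,040-stream claim for an 8x L4
# server (AV1 low-latency P1 preset, 720p30). Derived, not measured, values.
TOTAL_STREAMS = 1040   # concurrent AV1 720p30 streams on an 8x L4 server
GPUS_PER_SERVER = 8

streams_per_gpu = TOTAL_STREAMS // GPUS_PER_SERVER

# Aggregate pixel throughput at 720p30 across all streams.
pixels_per_frame = 1280 * 720
pixels_per_second = TOTAL_STREAMS * pixels_per_frame * 30

print(streams_per_gpu)          # 130 streams per L4
print(pixels_per_second / 1e9)  # ~28.8 gigapixels/s across the server
```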

Consume Less Energy and Space With L4

As AI and video become more pervasive, the demand for efficient, cost-effective computing is greater than ever. NVIDIA L4 Tensor Core GPUs deliver up to 120X better AI video performance, resulting in up to 99 percent better energy efficiency and lower total cost of ownership compared to traditional CPU-based infrastructure. This lets enterprises reduce rack space and significantly lower their carbon footprint, while being able to scale their data centers to many more users. The energy saved by switching from CPUs to NVIDIA L4s in a 2 megawatt (MW) data center can power nearly 2,000 homes for one year or match the carbon offset from 172,000 trees grown over 10 years.²


2. Results from the EPA calculator using 1.677 MW of savings.
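As a rough sanity check on footnote 2, the 1.677 MW saving can be converted to annual energy. The per-home figure below is an assumption for illustration only; the EPA calculator works through CO2-emission equivalencies, so this simple division will not reproduce the document's homes figure exactly.

```python
# Illustrative arithmetic for footnote 2. POWER_SAVED_MW comes from the
# source; ASSUMED_MWH_PER_HOME is an assumed approximate U.S. average annual
# household electricity use, not a figure from this document.
POWER_SAVED_MW = 1.677
HOURS_PER_YEAR = 8760

energy_saved_mwh = POWER_SAVED_MW * HOURS_PER_YEAR  # MWh saved per year

ASSUMED_MWH_PER_HOME = 10.6
homes_equivalent = energy_saved_mwh / ASSUMED_MWH_PER_HOME

print(round(energy_saved_mwh))  # 14691 MWh per year
print(round(homes_equivalent))
```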

Better Energy Efficiency


8x L4 vs. 2S Intel 8362 CPU server TCO comparison: end-to-end video pipeline with CV-CUDA pre- and postprocessing, decode, inference (SegFormer), encode, and TensorRT 8.6 vs. a CPU-only pipeline using OpenCV 4.7 and PyTorch inference.

Accelerate Generative AI Performance

2.5X More Generative AI Performance


Measured performance: L4 vs. T4 image generation, 512x512 Stable Diffusion v2.1, FP16, TensorRT 8.5.2.

Generative AI for images and text makes customers' lives more convenient and experiences more immersive across all industries. NVIDIA L4 supercharges compute-intensive generative AI inference, delivering up to 2.5X higher performance than the previous GPU generation. And with 50 percent more memory capacity, L4 enables larger image generation, up to 1024x768, which wasn't possible on the previous generation.
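The memory and image-size claims can be cross-checked with quick arithmetic. L4's 24 GB comes from the spec table in this document; T4's 16 GB is the prior generation's public spec, stated here as context rather than sourced from this page.

```python
# Illustrative check of the memory and image-size claims. T4_MEMORY_GB is
# the prior generation's public spec, assumed here for comparison.
L4_MEMORY_GB = 24
T4_MEMORY_GB = 16

memory_ratio = L4_MEMORY_GB / T4_MEMORY_GB  # 1.5x, i.e. 50 percent more

# Pixel counts suggest why the larger images need the extra headroom:
# activation memory in a diffusion model scales roughly with pixel count.
small = 512 * 512    # the benchmarked 512x512 resolution
large = 1024 * 768   # the larger resolution enabled on L4
pixel_ratio = large / small

print(memory_ratio)  # 1.5
print(pixel_ratio)   # 3.0
```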


Optimize Graphics Performance

Over 4X Higher Real-Time Rendering and Over 3X Higher Ray-Tracing Performance


Measured performance:
Real-time rendering: NVIDIA Omniverse™ performance for real-time rendering at 1080p and 4K with NVIDIA Deep Learning Super Sampling (DLSS) 3.
Ray tracing: Gaming performance geomean for AAA titles supporting ray tracing and DLSS 3.

With third-generation RT Cores and AI-powered NVIDIA Deep Learning Super Sampling 3 (DLSS 3), NVIDIA L4 delivers over 4X higher performance for AI-based avatars, NVIDIA Omniverse™ virtual worlds, cloud gaming, and virtual workstations. These capabilities enable creators to build real-time, cinematic-quality graphics and scenes for immersive visual experiences not possible with CPUs.

Accelerate Workloads Efficiently and Sustainably

NVIDIA L4 is an integral part of the NVIDIA data center platform. Built for video, AI, NVIDIA RTX™ virtual workstation (vWS), graphics, simulation, data science, and data analytics, the platform accelerates over 3,000 applications and is available everywhere at scale, from data center to edge to cloud, delivering both dramatic performance gains and energy-efficiency opportunities.

Optimized for mainstream deployments, L4 delivers a low-profile form factor operating in a 72W low-power envelope, making it an efficient, cost-effective solution for any server or cloud instance from NVIDIA’s partner ecosystem.
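The 72W envelope can be put in server-level terms with some back-of-the-envelope arithmetic. The TDP and the 8-GPU configuration come from this document; the PUE overhead factor is an assumed, typical data-center value, not a figure from this page.

```python
# Illustrative power-budget arithmetic from the 72 W TDP in the spec table.
# ASSUMED_PUE is an assumption (a typical data-center overhead factor),
# not a figure from this document.
TDP_WATTS = 72
GPUS_PER_SERVER = 8  # matches the 8x L4 configurations cited above

gpu_power_w = TDP_WATTS * GPUS_PER_SERVER  # total GPU power per server
ASSUMED_PUE = 1.5
facility_power_w = gpu_power_w * ASSUMED_PUE  # incl. cooling/overhead

print(gpu_power_w)       # 576
print(facility_power_w)  # 864.0
```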

Streamline Development and Deployment With Enterprise-Ready AI Software

Optimized to streamline AI development and deployment, the NVIDIA AI Enterprise software suite includes AI solution workflows, frameworks, pretrained models, and infrastructure optimization that are certified to run on common data center platforms and mainstream NVIDIA-Certified Systems™ with NVIDIA L4 GPUs. 

NVIDIA AI Enterprise is a software license add-on for NVIDIA L4 GPUs that makes AI accessible to nearly every organization, with high performance in training, inference, and data science. Together with NVIDIA L4, it simplifies building an AI-ready platform, accelerates AI development and deployment, and delivers the performance, security, and scalability needed to gather insights faster and achieve business value sooner.

See Who’s Using L4

Product Specifications

FP32 30.3 teraFLOPS
TF32 Tensor Core 120 teraFLOPS*
FP16 Tensor Core 242 teraFLOPS*
BFLOAT16 Tensor Core 242 teraFLOPS*
FP8 Tensor Core 485 teraFLOPS*
INT8 Tensor Core 485 TOPS*
GPU memory 24GB
GPU memory bandwidth 300GB/s
Media engines 2 NVENC | 4 NVDEC | 4 JPEG decoders
Max thermal design power (TDP) 72W
Form factor 1-slot low-profile, PCIe
Interconnect PCIe Gen4 x16 64GB/s
Server options Partner and NVIDIA-Certified Systems with 1–8 GPUs

* Shown with sparsity. Without sparsity, specifications are one-half the values shown.
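The footnote's sparsity rule makes the dense figures easy to derive. This sketch simply halves the Tensor Core values from the table above, per that rule.

```python
# Derives dense (no-sparsity) Tensor Core figures from the spec table,
# applying the footnote's rule that values shown with sparsity halve
# without it. FP32 is excluded: it is not marked with the asterisk.
sparse = {
    "TF32": 120,
    "FP16": 242,
    "BFLOAT16": 242,
    "FP8": 485,
    "INT8 (TOPS)": 485,
}

dense = {fmt: value / 2 for fmt, value in sparse.items()}

print(dense["TF32"])  # 60.0 teraFLOPS without sparsity
print(dense["FP8"])   # 242.5 teraFLOPS without sparsity
```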

Get started with L4 early access on Google Cloud.