NVIDIA Tesla T4 Tensor Core GPU

Powering the TensorRT Hyperscale Inference Platform.

Next-Level Inference Acceleration Has Arrived

We’re racing toward the future where every customer interaction, every product, and every service offering will be touched and improved by AI. Realizing that future requires a computing platform that can accelerate the full diversity of modern AI, enabling businesses to create new customer experiences, reimagine how they meet—and exceed—customer demands, and cost-effectively scale their AI-based products and services.

The NVIDIA® Tesla® T4 GPU is the world’s most advanced inference accelerator. Powered by NVIDIA Turing Tensor Cores, T4 brings revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI. Packaged in an energy-efficient 75-watt, small PCIe form factor, T4 is optimized for scale-out servers and is purpose-built to deliver state-of-the-art inference in real time.

Breakthrough Inference Performance

Tesla T4 introduces the revolutionary Turing Tensor Core technology with multi-precision computing for AI inference. Powering breakthrough performance from FP32 to FP16 to INT8, as well as INT4 precisions, T4 delivers up to 40X higher performance than CPUs.
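As one illustration of multi-precision inference in practice, a TensorRT engine can be built with the FP16 and INT8 builder flags enabled so the optimizer is free to choose Turing Tensor Core kernels. The sketch below is a minimal, hypothetical example: it assumes TensorRT 8 or later with its Python bindings installed and a trained network already exported to ONNX (model.onnx is a placeholder); real INT8 deployment also needs calibration data or an explicitly quantized model, which is omitted here.

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    def build_engine(onnx_path="model.onnx"):  # placeholder model path
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(
            1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        parser = trt.OnnxParser(network, TRT_LOGGER)

        # Parse the trained model into a TensorRT network definition.
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError(parser.get_error(0))

        config = builder.create_builder_config()
        # Allow reduced-precision Tensor Core kernels where accuracy permits.
        config.set_flag(trt.BuilderFlag.FP16)
        config.set_flag(trt.BuilderFlag.INT8)  # requires calibration data in practice

        return builder.build_serialized_network(network, config)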


State-of-the-Art Inference in Real Time

Responsiveness is key to user engagement for services such as conversational AI, recommender systems, and visual search. As models increase in accuracy and complexity, delivering the right answer right now requires exponentially larger compute capability. Tesla T4 delivers up to 40X better low-latency throughput, so more requests can be served in real time.

[Chart: T4 inference performance on ResNet-50, DeepSpeech2, and GNMT]

Video Transcoding Performance

As the volume of online videos continues to grow exponentially, demand for solutions to efficiently search and gain insights from video continues to grow as well. Tesla T4 delivers breakthrough performance for AI video applications, with dedicated hardware transcoding engines that bring twice the decoding performance of prior-generation GPUs. T4 can decode up to 38 full-HD video streams, making it easy to integrate scalable deep learning into video pipelines to deliver innovative, smart video services.
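As an illustration of how the decode engines fit into such a pipeline, the sketch below drives FFmpeg's GPU decode path from Python and streams raw frames toward an inference step. It is a hypothetical example: it assumes an FFmpeg build with NVDEC/CUDA support on the system path, and input.mp4 is a placeholder file name.

    import subprocess

    # Decode an H.264 file on the GPU's NVDEC engine; frames are then copied
    # back to system memory, scaled in software, and written as raw RGB to stdout.
    cmd = [
        "ffmpeg",
        "-hwaccel", "cuda",              # use the GPU hardware decoder
        "-i", "input.mp4",               # placeholder input file
        "-vf", "scale=224:224",          # resize to a typical CNN input size
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "pipe:1",                        # stream raw frames to stdout
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

    frame_bytes = 224 * 224 * 3
    while True:
        frame = proc.stdout.read(frame_bytes)
        if len(frame) < frame_bytes:
            break
        # hand `frame` to the inference engine here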

NVIDIA Tesla T4 Specifications

 

Performance
Turing Tensor Cores: 320
NVIDIA CUDA® cores: 2,560
Single-Precision Performance (FP32): 8.1 TFLOPS
Mixed-Precision Performance (FP16/FP32): 65 TFLOPS
INT8 Precision: 130 TOPS
INT4 Precision: 260 TOPS

Interconnect
Gen3 x16 PCIe

Memory
Capacity: 16 GB GDDR6
Bandwidth: 320+ GB/s

Power
75 watts
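On a deployed server, several of these figures can be cross-checked against what the driver reports. The snippet below is a minimal sketch using the pynvml bindings (an assumption; it queries only the first GPU and expects the NVIDIA driver to be present).

    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU; assumed to be the T4

    name = pynvml.nvmlDeviceGetName(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    power_limit_mw = pynvml.nvmlDeviceGetPowerManagementLimit(handle)

    print(name)                                            # e.g. Tesla T4
    print(f"memory: {mem.total / 1024**3:.1f} GiB")        # ~16 GB GDDR6
    print(f"power limit: {power_limit_mw / 1000:.0f} W")   # ~70-75 W board power

    pynvml.nvmlShutdown()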

 

NVIDIA AI Inference Platform

Explore the World's Most Advanced Inference Platform.