Powering New Levels of User Engagement

Boost Throughput and Deliver Responsive Experiences in Deep Learning Inference Workloads.

AI is constantly challenged to keep up with exploding volumes of data while still delivering fast responses. Meet the challenge with NVIDIA® Tesla®, the world’s fastest, most efficient data center platform for inference. Tesla supports all deep learning workloads and provides the optimal inference solution, combining the highest throughput, best efficiency, and greatest flexibility to power AI-driven experiences. TensorRT unlocks the performance of Tesla GPUs across a variety of applications, such as video streaming, speech recognition, and recommender systems, and provides a foundation for the NVIDIA DeepStream SDK.

INFERENCE SUCCESS STORIES

iFLYTEK

iFLYTEK’s Voice Cloud Platform uses NVIDIA Tesla P4 and P40 GPUs for training and inference to increase speech recognition accuracy.

VALOSSA

NVIDIA Inception Program startup Valossa is using NVIDIA GPUs to accelerate deep learning and divine viewer behavior from video data.

JD.COM

JD.com uses the NVIDIA AI inference platform to achieve a 40X increase in video detection efficiency.

NVIDIA DATA CENTER INFERENCE PLATFORMS

TESLA V100
For Universal Data Centers

The Tesla V100 delivers 125 teraflops of inference performance per GPU, so a single server with eight Tesla V100s can produce a petaflop of compute (8 × 125 teraflops = 1,000 teraflops).

TESLA P4
For Ultra-Efficient Scale-Out Servers

The Tesla P4 accelerates any scale-out server, offering an incredible 60X higher energy efficiency compared to CPUs.

TESLA P40
For Inference-Throughput Servers

The Tesla P40 delivers high inference throughput, INT8 precision, and 24 GB of onboard memory for an amazing user experience.

NVIDIA DATA CENTER COMPUTE SOFTWARE

NVIDIA TensorRT

NVIDIA TensorRT™ is a high-performance neural-network inference optimizer and runtime that can speed up applications such as recommenders, speech recognition, and machine translation by up to 100X compared to CPUs. TensorRT gives developers the tools to optimize neural network models, calibrate for lower precision with high accuracy, and deploy the models to production environments in enterprise and hyperscale data centers.
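For a concrete flavor of that workflow, here is a minimal sketch of engine building with the TensorRT Python API. Exact calls vary by TensorRT release; this assumes a recent version with ONNX import, and model.onnx is a hypothetical trained model:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Build a TensorRT engine from a trained network exported to ONNX.
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model.onnx", "rb") as f:        # hypothetical trained model
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)      # reduced precision, e.g. on Tesla V100

# Serialize the optimized engine for production deployment.
serialized = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(serialized)
```

The serialized engine can then be loaded by a TensorRT runtime in the serving process, so the optimization cost is paid once at build time rather than at every request.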

DeepStream SDK

NVIDIA DeepStream for Tesla is an SDK for building scalable, deep learning-based intelligent video analytics (IVA) applications for smart cities and hyperscale data centers. It brings together NVIDIA TensorRT for inference, the Video Codec SDK for transcoding, and pre-processing and data curation APIs to tap into the power of Tesla GPUs. On a Tesla P4 GPU, for example, you can simultaneously decode and analyze up to 30 HD video streams in real time.
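The programming model has evolved across DeepStream releases; as a rough sketch, recent GStreamer-based releases express an IVA pipeline as a graph that chains hardware decode (nvv4l2decoder), stream batching (nvstreammux), and TensorRT inference (nvinfer). The input file name and the detector_config.txt inference config below are hypothetical placeholders:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# One H.264 stream decoded on the GPU's hardware decoder, batched, then run
# through TensorRT inference via DeepStream's nvinfer element.
pipeline = Gst.parse_launch(
    "filesrc location=stream.mp4 ! qtdemux ! h264parse ! nvv4l2decoder "
    "! mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 "
    "! nvinfer config-file-path=detector_config.txt "  # hypothetical config
    "! fakesink"
)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
# Block until the stream ends or the pipeline reports an error.
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)
```

Scaling to more streams is mostly a matter of adding sources to the batching element, which is what lets a single GPU analyze many feeds concurrently.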

Kubernetes on NVIDIA GPUs

Kubernetes on NVIDIA GPUs enables enterprises to seamlessly scale training and inference deployments across multi-cloud GPU clusters, so GPU-accelerated deep learning and HPC applications can be deployed instantly.
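As a minimal sketch of what GPU scheduling looks like in practice, the snippet below uses the official kubernetes Python client to request a GPU through the nvidia.com/gpu extended resource exposed by NVIDIA's device plugin; the pod name and container image are illustrative placeholders:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a cluster

# A pod that requests one NVIDIA GPU through the device plugin's
# nvidia.com/gpu extended resource.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="trt-inference"),      # hypothetical name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="inference",
            image="nvcr.io/nvidia/tensorrt:24.05-py3",       # hypothetical image
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"}),
        )],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Because the GPU is requested as an ordinary resource limit, the same manifest schedules unchanged on any on-premises or cloud cluster whose nodes expose NVIDIA GPUs.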

FEATURES AND BENEFITS

50X Higher Throughput to Keep Up with Expanding Workloads

Volta-powered Tesla V100 GPUs give data centers a dramatic boost in throughput for deep learning workloads to extract intelligence from today’s tsunami of data. A server with a single Tesla V100 can replace up to 50 CPU-only servers for deep learning inference workloads, so you get dramatically higher throughput with lower acquisition cost.

Unprecedented Efficiency for Low-Power, Scale-Out Servers

The ultra-efficient Tesla P4 GPU accelerates density-optimized, scale-out servers with its small form factor and 50 W/75 W power footprint. It delivers an incredible 52X better energy efficiency than CPUs for deep learning inference workloads, so hyperscale customers can scale within their existing infrastructure and service the exponential growth in demand for AI-based applications.

A Dedicated Decode Engine for New AI-Based Video Services

The Tesla P4 GPU can analyze up to 39 HD video streams in real time, thanks to a dedicated hardware-accelerated decode engine that works in parallel with the NVIDIA CUDA® cores performing inference. By integrating deep learning into the video pipeline, customers can offer smart, innovative new functionality that facilitates video search and other video-related services.

Faster Deployment with NVIDIA TensorRT and DeepStream SDK

NVIDIA TensorRT is a high-performance neural-network inference engine for the production deployment of deep learning applications. With TensorRT, neural networks trained in 32-bit or 16-bit precision can be optimized for reduced-precision INT8 operations on the Tesla P4 or FP16 operations on the Tesla V100. The NVIDIA DeepStream SDK taps into the power of Tesla GPUs to simultaneously decode and analyze video streams.
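Calibrating for INT8 requires feeding TensorRT a set of representative inputs so it can choose per-tensor scale factors. Below is a minimal sketch of the Python entropy-calibrator interface; API details vary by TensorRT release, and the calibration batches are assumed to be supplied by the caller:

```python
import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds representative input batches so TensorRT can pick INT8 scales."""

    def __init__(self, batches):
        # batches: iterable of contiguous np.float32 arrays (caller-supplied)
        super().__init__()
        self.batches = iter(batches)
        self.device_input = None

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                    # no more calibration data
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        return None                        # always recalibrate in this sketch

    def write_calibration_cache(self, cache):
        pass

# Attach during engine building (config as in the earlier sketch):
# config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = EntropyCalibrator(my_batches)  # my_batches: hypothetical
```

A few hundred representative batches are typically enough for calibration, after which the INT8 engine runs without the calibrator in production.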

PERFORMANCE SPECS

                                     Tesla V100                   Tesla P4                  Tesla P40
                                     The Universal Data           For Ultra-Efficient,      For Inference-Throughput
                                     Center GPU                   Scale-Out Servers         Servers

Single-Precision Performance (FP32)  14 teraflops (PCIe)          5.5 teraflops             12 teraflops
                                     15.7 teraflops (SXM2)
Half-Precision Performance (FP16)    112 teraflops (PCIe)         -                         -
                                     125 teraflops (SXM2)
Integer Operations (INT8)            -                            22 TOPS*                  47 TOPS*
GPU Memory                           16 GB HBM2                   8 GB                      24 GB
Memory Bandwidth                     900 GB/s                     192 GB/s                  346 GB/s
System Interface/Form Factor         Dual-Slot, Full-Height       Low-Profile               Dual-Slot, Full-Height
                                     PCI Express or SXM2/NVLink   PCI Express               PCI Express
Power                                250 W (PCIe)                 50 W/75 W                 250 W
                                     300 W (SXM2)
Hardware-Accelerated Video Engine    -                            1x Decode Engine,         1x Decode Engine,
                                                                  2x Encode Engines         2x Encode Engines

*Tera-Operations per Second with Boost Clock Enabled

OPTIMIZE YOUR DEEP LEARNING INFERENCE SOLUTION TODAY.

The Tesla V100, P4, and P40 are available now for deep learning inference.