Tesla

POWER NEW LEVELS OF USER ENGAGEMENT
Boost throughput and deliver responsive experiences in deep learning inference workloads.

ACCELERATE DEEP LEARNING INFERENCE

In the new era of artificial intelligence (AI), deep learning is enabling superhuman accuracy in complex tasks to enhance our everyday experiences. Interactive speech, computer vision, and predictive analytics are a few of the areas where deep learning models trained on GPUs have demonstrated incredible results that were previously thought impossible.

AI-powered services are constantly challenged to keep up with exploding volumes of data while still delivering fast responses. A server with a single Tesla GPU can deliver 33x more inference throughput than a single-socket CPU-only server. This massive acceleration translates into huge cost savings for data centers, allowing them to scale to meet the ever-growing demand for AI-driven services.

Additionally, responsiveness is critical to user adoption for services like visual search, personalized recommendations, and automated customer service. As deep learning models increase in accuracy, size, and complexity, CPUs struggle to deliver an interactive user experience. Tesla GPUs powered by Pascal deliver up to 31x lower latency at much higher throughput than CPU-only servers, ensuring the responsiveness needed to serve up AI-based experiences.

NVIDIA® Tesla® P100 and P4 GPU accelerators give you the optimal solution—combining the highest throughput and lowest latency on deep learning inference workloads to power new AI-driven experiences.

 

NVIDIA TESLA INFERENCE ACCELERATORS

[Charts: Deep Learning Inference Performance and Deep Learning Inference Throughput]
NVIDIA Tesla P100

MAXIMUM DEEP LEARNING INFERENCE THROUGHPUT

The Tesla P100 is the universal data center GPU, delivering game-changing performance on HPC, deep learning, and remote graphics, with massive throughput for both deep learning training and inference. With 18.7 TeraFLOPS of inference performance per GPU, a single server with eight Tesla P100s can replace over 200 CPU servers (eight GPUs at up to 27 CPU-only servers apiece, the per-GPU figure cited below, comes to roughly 216).

 

ULTRA-EFFICIENT DEEP LEARNING IN SCALE-OUT SERVERS

The Tesla P4 accelerates any scale-out server, offering an incredible 60X higher energy efficiency compared to CPUs.

Tesla P4 Data Sheet (PDF – 164KB)
Tesla P4
 

DEEP LEARNING ACCELERATOR FEATURES AND BENEFITS

These GPUs power faster predictions that enable amazing user experiences for AI applications.

 
33X Higher Throughput to Keep Up with Expanding Data

The volume of data generated every day in the form of sensor logs, images, videos, and records is economically impractical to process on CPUs. Pascal-powered GPUs give data centers a dramatic boost in throughput for deep learning workloads to extract intelligence from this tsunami of data. A server with a single Tesla P100 can replace up to 27 CPU-only servers for deep learning inference workloads, delivering dramatically higher throughput at lower acquisition cost.

 
A Dedicated Decode Engine for New AI-based Video Services

The Tesla P4 GPU can analyze up to 39 HD video streams in real time, powered by a dedicated hardware-accelerated decode engine that works in parallel with the NVIDIA® CUDA® cores performing inference. By integrating deep learning into the video pipeline, customers can offer new levels of smart, innovative video services such as video search.
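To make the overlap concrete, here is a minimal CUDA sketch of the pattern — not the actual DeepStream pipeline. The real decode work happens on the fixed-function NVDEC engine, so an asynchronous host-to-device copy stands in for decoded frames arriving, and the inferFrame kernel is a placeholder for real inference. The frame size, frame count, and all names are illustrative assumptions.

```cpp
#include <cuda_runtime.h>

// Stand-in "inference" kernel: represents the CUDA-core work that runs
// while the hardware decode engine produces the next frame.
__global__ void inferFrame(const unsigned char* frame, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = frame[i] * (1.0f / 255.0f);  // placeholder math
}

int main() {
    const int kFrames = 64;                      // frames to push through the pipeline
    const int frameBytes = 1920 * 1080 * 3 / 2;  // one HD frame, NV12 layout (assumption)
    const int threads = 256;

    unsigned char *hFrame, *dFrame[2];
    float *dOut[2];
    cudaMallocHost(&hFrame, frameBytes);         // pinned memory enables async copies
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&dFrame[b], frameBytes);
        cudaMalloc(&dOut[b], frameBytes * sizeof(float));
    }

    cudaStream_t stream[2];
    cudaStreamCreate(&stream[0]);
    cudaStreamCreate(&stream[1]);

    // Double-buffering across two streams: while one buffer is being filled
    // (NVDEC in the real pipeline; an async host-to-device copy here), the
    // CUDA cores run inference on the other buffer.
    for (int f = 0; f < kFrames; ++f) {
        int buf = f & 1;
        cudaMemcpyAsync(dFrame[buf], hFrame, frameBytes,
                        cudaMemcpyHostToDevice, stream[buf]);
        inferFrame<<<(frameBytes + threads - 1) / threads, threads, 0, stream[buf]>>>(
            dFrame[buf], dOut[buf], frameBytes);
    }
    cudaDeviceSynchronize();
    return 0;
}
```

Double-buffering across two CUDA streams is the smallest version of this idea; the DeepStream SDK manages the same decode/inference overlap for dozens of concurrent streams.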

Unprecedented Efficiency for Low-Power Scale-out Servers

The ultra-efficient Tesla P4 GPU accelerates density-optimized scale-out servers with a small form factor and a 50 W/75 W power footprint. It delivers an incredible 60X better energy efficiency than CPUs for deep learning inference workloads, so hyperscale customers can scale within their existing infrastructure and service the exponential growth in demand for AI-based applications.

Faster Deployment With NVIDIA TensorRT™ and DeepStream SDK

NVIDIA TensorRT is a high-performance neural network inference engine for production deployment of deep learning applications. It includes libraries that streamline deep learning models for production deployment, taking trained neural nets—usually in 32-bit or 16-bit floating point—and optimizing them for reduced-precision INT8 operations on Tesla P4 or FP16 operations on Tesla P100. The NVIDIA DeepStream SDK taps into the power of Pascal GPUs to simultaneously decode and analyze video streams.
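As a rough sketch of that workflow, the following uses the TensorRT C++ builder API as it looked in that generation of the library (setFp16Mode, setInt8Mode, buildCudaEngine). The file names, batch size, and "prob" output blob are placeholder assumptions, and an INT8 build would additionally require an IInt8Calibrator that feeds sample batches so TensorRT can choose per-tensor scale factors.

```cpp
#include <NvInfer.h>
#include <NvCaffeParser.h>
#include <cstdio>

using namespace nvinfer1;

// Minimal logger the TensorRT builder requires.
class Logger : public ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity != Severity::kINFO) printf("[TRT] %s\n", msg);
    }
} gLogger;

int main() {
    // Parse a trained FP32 model. File names and the "prob" output
    // blob are placeholders for your own network.
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    auto* parser = nvcaffeparser1::createCaffeParser();
    auto* blobs = parser->parse("deploy.prototxt", "model.caffemodel",
                                *network, DataType::kFLOAT);
    network->markOutput(*blobs->find("prob"));

    builder->setMaxBatchSize(16);            // largest batch the engine must serve
    builder->setMaxWorkspaceSize(1 << 28);   // 256 MB of scratch space

    // Reduced precision: FP16 suits Tesla P100, INT8 suits Tesla P4/P40.
    // INT8 also needs a calibrator (omitted here).
    builder->setFp16Mode(true);              // or: builder->setInt8Mode(true);

    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // Serialize the optimized engine for deployment; at runtime it is
    // deserialized and run through an IExecutionContext.
    IHostMemory* plan = engine->serialize();
    printf("engine plan: %zu bytes\n", plan->size());

    plan->destroy();
    engine->destroy();
    parser->destroy();
    network->destroy();
    builder->destroy();
    return 0;
}
```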

 

PERFORMANCE SPECIFICATIONS FOR NVIDIA TESLA P4, P40, AND P100 ACCELERATORS

 
|                                      | Tesla P4 for Ultra-Efficient Scale-Out Servers | Tesla P40 for Maximum-Inference-Throughput Servers | Tesla P100: The Universal Datacenter GPU |
|--------------------------------------|------------------------------------------------|----------------------------------------------------|------------------------------------------|
| Single-Precision Performance (FP32)  | 5.5 TeraFLOPS                                  | 12 TeraFLOPS                                       | 10.6 TeraFLOPS                           |
| Half-Precision Performance (FP16)    | --                                             | --                                                 | 21 TeraFLOPS                             |
| Integer Operations (INT8)            | 22 TOPS*                                       | 47 TOPS*                                           | --                                       |
| GPU Memory                           | 8 GB                                           | 24 GB                                              | 16 GB                                    |
| Memory Bandwidth                     | 192 GB/s                                       | 346 GB/s                                           | 732 GB/s                                 |
| System Interface                     | Low-Profile PCI Express Form Factor            | Dual-Slot, Full-Height PCI Express Form Factor     | Dual-Slot, Full-Height PCI Express Form Factor, or SXM2 Form Factor with NVLink |
| Power                                | 50 W/75 W                                      | 250 W                                              | 250 W (PCIe) / 300 W (SXM2)              |
| Hardware-Accelerated Video Engine    | 1x Decode Engine, 2x Encode Engines            | 1x Decode Engine, 2x Encode Engines                | --                                       |

*Tera-Operations per Second with Boost Clock Enabled
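
One way to read the INT8 column: the TOPS figures are roughly four times the FP32 rates because Pascal GPUs of compute capability 6.1 expose DP4A, a single instruction that dot-products four packed 8-bit values with a 32-bit accumulate. A minimal CUDA illustration (compile with nvcc -arch=sm_61; the example values are arbitrary):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// __dp4a (compute capability 6.1+) treats each int as four packed 8-bit
// lanes, multiplies them pairwise, and adds the four products plus the
// accumulator in a single instruction.
__global__ void dotInt8(const int* a, const int* b, int* acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) acc[i] = __dp4a(a[i], b[i], acc[i]);
}

int main() {
    const int n = 256;
    int *a, *b, *acc;
    cudaMallocManaged(&a, n * sizeof(int));
    cudaMallocManaged(&b, n * sizeof(int));
    cudaMallocManaged(&acc, n * sizeof(int));
    for (int i = 0; i < n; ++i) {
        a[i] = 0x01010101;   // four int8 lanes, each holding 1
        b[i] = 0x02020202;   // four int8 lanes, each holding 2
        acc[i] = 0;
    }
    dotInt8<<<1, n>>>(a, b, acc, n);
    cudaDeviceSynchronize();
    printf("acc[0] = %d\n", acc[0]);  // 1*2 + 1*2 + 1*2 + 1*2 = 8
    return 0;
}
```

Running it prints acc[0] = 8: four 8-bit multiply-accumulates in one instruction, which is the path TensorRT's INT8 kernels lean on for Pascal.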

NVIDIA TESLA PRODUCT LITERATURE

Tesla P40 Data Sheet (PDF – 166KB)
Tesla P4 Data Sheet (PDF – 164KB)

Get the NVIDIA Tesla P100 and P4 Today

The Tesla P100 and P4 are available now for deep learning inference.

WHERE TO BUY