Deep Learning Inference Platform
Inference Software and Accelerators for Cloud, Data Center, Edge, and Autonomous Machines
Unleash The Full Potential of NVIDIA GPUs with NVIDIA TensorRT
NVIDIA® TensorRT™ is a high-performance inference platform that is key to unlocking the power of NVIDIA Tensor Core GPUs. It delivers up to 40X higher throughput while minimizing latency compared to CPU-only platforms. Using TensorRT, you can start from any framework and rapidly optimize, validate, and deploy trained neural networks in production. TensorRT is also available on the NVIDIA NGC catalog.
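As a rough illustration of that optimize-and-deploy workflow, the sketch below builds a serialized TensorRT engine from an ONNX export of a trained network using TensorRT's Python API. It is a minimal sketch, not the canonical recipe: the file names "model.onnx" and "model.plan", the FP16 flag, and the 1 GiB workspace limit are assumptions made for the example, and the calls assume the TensorRT 8.x Python bindings.

```python
# Minimal sketch: build a serialized TensorRT engine ("plan") from an ONNX model.
# Assumes TensorRT 8.x Python bindings and an ONNX export named "model.onnx".
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# Parse the trained network that was exported from a framework via ONNX.
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse ONNX model")

# Configure the build: enable FP16 and cap the optimization workspace at 1 GiB.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

# Serialize the optimized engine for deployment.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine_bytes)
```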
Simplify Deployment with the NVIDIA Triton Inference Server
The NVIDIA Triton Inference Server, formerly known as TensorRT Inference Server, is open-source software that simplifies the deployment of deep learning models in production. The Triton Inference Server lets teams deploy trained AI models from any framework (TensorFlow, PyTorch, TensorRT Plan, Caffe, MXNet, or custom) from local storage, Google Cloud Platform, or AWS S3 on any GPU- or CPU-based infrastructure. It runs multiple models concurrently on a single GPU to maximize utilization and integrates with Kubernetes for orchestration, metrics, and auto-scaling.
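To give a sense of what serving a model through Triton looks like from the client side, the sketch below sends one inference request over HTTP with the tritonclient Python package. The server address, the model name "resnet50", and the tensor names "input" and "output" are illustrative assumptions; they depend entirely on how your model repository is configured.

```python
# Minimal sketch of a Triton HTTP client request.
# Assumes a Triton server on localhost:8000 serving a hypothetical model
# "resnet50" with an FP32 input tensor "input" and an output tensor "output".
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Prepare a batch of one 224x224 RGB image as the request payload.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Ask for the named output tensor and run inference on the server.
infer_output = httpclient.InferRequestedOutput("output")
response = client.infer("resnet50", inputs=[infer_input], outputs=[infer_output])
print(response.as_numpy("output").shape)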
Power Unified, Scalable Deep Learning Inference
With one unified architecture, neural networks on every deep learning framework can be trained, optimized with NVIDIA TensorRT, and then deployed for real-time inferencing at the edge. With NVIDIA DGX™ Systems, NVIDIA Tensor Core GPUs, NVIDIA Jetson™, and NVIDIA DRIVE™, NVIDIA offers an end-to-end, fully scalable deep learning platform, as its results in the MLPerf benchmark suite show.
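To make the "optimize, then deploy for real-time inferencing" step concrete, the sketch below loads a serialized TensorRT engine and runs a single synchronous inference pass, as one might on an edge device. It assumes the TensorRT 8.x binding-index API, PyCUDA for device memory, and an engine file named "model.plan" with exactly one input and one output; these are assumptions for the example, not details from the text above.

```python
# Minimal sketch: run a serialized TensorRT engine (e.g., on a Jetson device).
# Assumes TensorRT 8.x, PyCUDA, and a single-input / single-output "model.plan".
import numpy as np
import pycuda.autoinit  # noqa: F401 (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.plan", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Host buffers shaped to match the engine's input (binding 0) and output (binding 1).
in_shape = tuple(engine.get_binding_shape(0))
out_shape = tuple(engine.get_binding_shape(1))
h_input = np.random.rand(*in_shape).astype(trt.nptype(engine.get_binding_dtype(0)))
h_output = np.empty(out_shape, dtype=trt.nptype(engine.get_binding_dtype(1)))

# Device buffers and a single synchronous inference pass.
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
cuda.memcpy_htod(d_input, h_input)
context.execute_v2([int(d_input), int(d_output)])
cuda.memcpy_dtoh(h_output, d_output)
print(h_output.shape)
```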
See Cost Savings on a Massive Scale
To keep servers at maximum productivity, data center managers must make tradeoffs between performance and efficiency. A single NVIDIA T4 server can replace multiple commodity CPU servers for deep learning inference applications and services, reducing energy requirements and delivering both acquisition and operational cost savings.