NVIDIA Triton™, part of the NVIDIA® AI platform, offers a new capability called Triton Management Service (TMS) that automates the deployment of multiple Triton Inference Server instances in Kubernetes with resource-efficient model orchestration on GPUs and CPUs. TMS manages the deployment of Triton Inference Server instances serving one or more AI models, allocates models to individual GPUs and CPUs, and efficiently colocates models by framework. Available exclusively with NVIDIA AI Enterprise, an enterprise-grade AI software platform, TMS enables large-scale inference deployment with high performance and hardware utilization.