Baseten leverages NVIDIA GPUs and NVIDIA® TensorRT™-LLM to provide machine learning infrastructure that’s high-performance, scalable, and cost-effective.
Baseten
Generative AI / LLMs
NVIDIA TensorRT-LLM
NVIDIA A100 Tensor Core GPU
NVIDIA A10 Tensor Core GPU
Baseten’s mission is simple: provide machine learning (ML) infrastructure that just works.
With Baseten, organizations have what they need to deploy and serve ML models performantly, scalably, and cost-effectively for real-time applications. Customers can come to Baseten with their own models or choose from a variety of pretrained models and deploy them in production, served on Baseten’s open-source Truss framework and managed on an easy-to-use dashboard.
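To make the Truss workflow concrete, here is a hedged sketch of the `model.py` shape Truss expects, based on its open-source documentation: a `Model` class with a `load` hook that runs once per replica and a `predict` hook that runs per request. The "model" below is a trivial stand-in so the file runs anywhere; a real deployment would load actual weights in `load`.

```python
# Illustrative Truss-style model.py. The Model/load/predict structure follows
# Truss's documented convention; the lambda is a placeholder for real weights.
class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Runs once when a replica starts (i.e., at cold start).
        # Real deployments load model weights here.
        self._model = lambda text: text.upper()

    def predict(self, model_input):
        # Runs on every request once the replica is live.
        return {"output": self._model(model_input["text"])}
```

Packaged this way, the same class works locally and in production, which is what lets Baseten manage deployment details behind its dashboard.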
Leveraging NVIDIA GPU-accelerated instances on AWS, such as Amazon EC2 P4d instances powered by NVIDIA A100 Tensor Core GPUs, and optimized NVIDIA software, such as NVIDIA TensorRT-LLM, Baseten can deliver on their mission from the cloud.
Image courtesy of Baseten
Baseten tackles several model-deployment challenges faced by their customers, specifically around scalability, cost efficiency, and expertise.
Scalability: Handling AI infrastructure that serves varying levels of demand, from sporadic individual requests to thousands of high-traffic requests, is a big challenge. The underlying infrastructure must be both dynamic and responsive, adapting to real-time demands without causing delays or needing manual oversight.
Cost Efficiency: Maximizing the utilization of the underlying NVIDIA GPUs is critical. AI inference infrastructure needs to deliver high performance without incurring unnecessary expenses during low- and high-traffic scenarios.
Expertise: The deployment of ML models requires specialized skills and a deep understanding of the underlying infrastructure. This expertise can be scarce and costly to acquire, presenting a challenge for organizations to maintain cutting-edge inference capabilities without a significant investment in skilled personnel.
Baseten offers optimized inference infrastructure powered by NVIDIA’s hardware and software to help solve the challenges of deployment scalability, cost efficiency, and expertise.
With automatic scaling capabilities, Baseten allows customers deploying their models to dynamically adjust the number of replicas based on consumer traffic and service-level agreements, ensuring that capacity meets demand without manual intervention. This helps optimize for cost, as Baseten's infrastructure scales up or down with the volume of requests reaching the model. Not only does an idle deployment cost customers nothing, but once a request does come in, Baseten's infrastructure, running on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, takes only 5–10 seconds to get the model up and running. That is a dramatic improvement on cold starts, which previously took up to five minutes: a speedup of 30–60X. Customers can also choose from a variety of NVIDIA GPUs available on Baseten to accelerate their model inference, including NVIDIA A100, A10G, T4, and V100 Tensor Core GPUs.
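The scaling behavior described above can be sketched in a few lines. This is not Baseten's actual autoscaler; it is an illustrative decision function showing the two properties the text calls out: scale-to-zero when there is no traffic, and replica counts that track demand up to a cap. The function name and parameters are assumptions for illustration.

```python
import math

def target_replicas(requests_per_sec: float,
                    capacity_per_replica: float,
                    max_replicas: int) -> int:
    """Pick a replica count for the current load (illustrative only)."""
    if requests_per_sec <= 0:
        return 0  # scale to zero: an idle deployment costs nothing
    # Enough replicas to absorb the load, never exceeding the configured cap.
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    return min(needed, max_replicas)
```

In a real system this decision would be smoothed over a time window and constrained by service-level agreements, but the core idea, capacity tracking demand with a floor of zero, is the same.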
On top of NVIDIA hardware, Baseten leverages optimized NVIDIA software. Using the TensorRT-LLM feature of tensor parallelism served on AWS, Baseten boosted inference performance for a customer's LLM deployment by 2X through their open-source Truss framework. Truss is Baseten’s open-source packaging and deployment library, which lets users deploy models in production with ease.
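Tensor parallelism, the TensorRT-LLM feature mentioned above, splits a layer's weight matrix across GPUs so each device computes a slice of the output in parallel. The NumPy sketch below shows the concept only, with plain arrays standing in for GPU shards; it is not TensorRT-LLM's API.

```python
import numpy as np

def tensor_parallel_matmul(x: np.ndarray, w: np.ndarray,
                           num_shards: int) -> np.ndarray:
    """Column-parallel matmul: conceptually one weight shard per GPU."""
    shards = np.split(w, num_shards, axis=1)    # split weights column-wise
    partials = [x @ s for s in shards]          # each "device" works independently
    return np.concatenate(partials, axis=1)     # gather the output slices
```

Because each shard's multiply is independent, the work runs concurrently across GPUs, which is where the inference speedup for large LLM layers comes from.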
TensorRT-LLM is included as a part of NVIDIA AI Enterprise, which provides a production-grade, secure, end-to-end software platform for enterprises building and deploying accelerated AI software.
NVIDIA’s full-stack AI inference approach plays a crucial role in meeting the stringent demands of Baseten’s customers’ real-time applications. With NVIDIA A100 GPUs and TensorRT-LLM optimizations, the underlying infrastructure unlocks both performance gains and cost savings for developers.
Explore more about Baseten by watching a quick demo of their product.
Baseten is a member of NVIDIA Inception, a free program that nurtures startups revolutionizing industries with technological advancements. As a benefit of Inception, Baseten gained early access to TensorRT-LLM, presenting a significant opportunity to develop and deliver high-performance solutions.
Join NVIDIA Inception’s global network of over 15,000 technology startups.