Video Analytics AI Agents

Video analytics AI agents are AI-powered assistants that can see, reason, and act on live or recorded video streams. They use vision language models and large language models to help search, summarize, and understand video through natural language.

Explore Vision AI

Overview
Technical Implementation
FAQs
Get Started
Resources
Deploy AI Agents
Customer Stories

Overview
Technical Implementation
FAQs
Get Started
Resources
Deploy AI Agents
Customer Stories

Explore Vision AI

Workloads

Computer Vision / Video Analytics

Industries

Manufacturing
Smart Cities/Spaces
Retail/ Consumer Packaged Goods
Media and Entertainment
Healthcare and Life Sciences

Business Goal

Return on Investment
Innovation

Products

Overview

How Do AI Agents Improve on Traditional Video Analytics?

Traditional video analytics applications and their development workflows are typically built on fixed-function, limited models that are designed to see and identify only a select set of predefined objects. With generative AI and foundation models, you can now build applications with fewer models that have incredibly complex and broad perception and rich contextual understanding. This new generation of vision language models (VLMs), such as NVIDIA Cosmos™, is giving rise to smart, powerful video analytics AI agents.

What Is a Video Analytics AI Agent?

A video analytics AI agent can see, reason, and act by combining vision and language modalities to understand a broad range of natural language questions or prompts applied to a recorded or live video stream. This deeper understanding of video content enables more accurate and meaningful interpretations, improving the functionality of video analytics applications and analysis of real-world scenarios. These agents promise to unlock entirely new insights and possibilities for automation.

Where Are Video Analytics AI Agents Deployed?

Highly perceptive, accurate, and interactive video analytics AI agents will be deployed throughout factories, warehouses, retail stores, airports, traffic intersections, and more. This will have a tremendous impact on operations teams looking to make safer spaces and better decisions using richer insights generated from natural interactions. Managers and operations teams will also communicate with these agents in natural language, all powered by generative AI and VLMs with NVIDIA NIM™ microservices at their core.

Build Video Analytics AI Agents

Explore the reference workflow, powered by multiple visual language models, and easily build your video analytics agent.

Explore the AI Blueprint

Quick Links

NVIDIA Factory Operations Blueprint Gives Factories a New AI Brain

3 Ways to Bring Agentic AI to Computer Vision Applications

NVIDIA, T-Mobile and Partners Integrate Physical AI Applications on AI-RAN- Ready Infrastructure

Watch: Building Smart Cities with Digital Twins and Agentic AI

Read: Kaohsiung City Uses Vision AI to Optimize City Operations

Technical Implementation

Develop With NVIDIA Cosmos

The brain inside every video analytics AI agent is a VLM that can see and reason. Two common VLMs are NVIDIA Cosmos 3 and Cosmos Embed. Both can be used to augment current computer vision applications with rich metadata and content summaries.

NVIDIA NIM is a set of accelerated inference microservices that are optimized for NVIDIA GPUs and include industry standard APIs, domain-specific code, optimized inference engines, and enterprise runtime. It delivers a combination of VLMs, large language models (LLMs), and retrieval-augmented generation (RAG) for building your video analytics AI agent that can process live or archived images or videos to extract actionable insights using natural language. We’ve created a reference workflow of a video analytics AI agent that you can try out to accelerate your development process.

Quick Links

Download NVIDIA Cosmos NIM

Learn More About Cosmos Cookbook

Deep dive into Cosmos 3

Read: Industry Pioneers Build Smarter Agents With NVIDIA Nemotron and Cosmos Reasoning Models

Try the Video Analytics AI Agent Reference Workflow

Read: Build Multimodal Video Analytics AI Agents Powered by NVIDIA NIM

Build AI Agents With NVIDIA Metropolis VSS Blueprint and Skills

The NVIDIA Metropolis Blueprint for video search and summarization (VSS) makes it easy to build and customize video analytics AI agents using generative AI, VLMs, LLMs, RAG, and NVIDIA NIM. The video analytics AI agents are given tasks through natural language and can analyze, interpret, and process vast amounts of video data to provide critical insights that help a range of industries optimize processes, improve safety, and cut costs.

VSS delivers modularized components that enable high flexibility, accelerated microservices that support real-time video intelligence, agentic fusion search across diverse embeddings, and comprehensive report generation capabilities. It also provides agent skills and tools that let developers build video analytics AI agents with simple natural language prompts and coding agents.

VSS also enables seamless integration of generative AI into existing computer vision pipelines—enhancing inspection, search, and analytics with multimodal understanding and zero-shot reasoning. VSS is easily deployed from the edge to the cloud on platforms including NVIDIA RTX™ 4500, NVIDIA RTX PRO™ 6000, NVIDIA DGX Spark™, and NVIDIA® Jetson Thor™.

Quick Links

Try VSS Skils

Try the Blueprint on Cloud With Launchable

Watch the Tutorial: How to Build a Video Search AI Agent with NVIDIA VSS Skills and NemoClaw

Read the Blog: Turn Hours of Video into Searchable Insights with NVIDIA Metropolis VSS Blueprint

Watch the Recording: Build Video Analytics AI Agents With Skills

Read the Blog: How to Integrate Computer Vision Pipelines With Generative AI and Reasoning

Watch the Video: Get Context-Rich Insights on Alerts with VLMs

Improve Accuracy With Model Fine-Tuning Synthetic Data Generation Agent Skills

Traditional approaches to customizing models for video analytics AI agents were linear and slow—collect video, label frames, train, evaluate, repeat—with a human in the loop at every step and months to reach acceptable accuracy. Modern approaches break this cycle by enabling coding agents to iteratively improve VLM and vision foundation model performance based on target goals.

Fine-tune vision language models with NVIDIA TAO agent skills.

NVIDIA TAO is a suite of agent skills and tools for fine-tuning vision AI models with natural language prompts. Coding agents use these tools and skills to autonomously hit model accuracy targets by iteratively evaluating model accuracy, determining the precise training data needed, and then mining existing data or synthetically generating needed data.

Solve training data challenge with agent skills for synthetic data generation.

When training data is limited, developers can quickly generate synthetic defect data for visual inspection or augment videos for different scenarios such as weather, lighting, and more.

Quick Links

Get Started with NVIDIA TAO Skills

Try Agent Skill for Defect Image Generation

Try Agent Skill for Video Augmentation

Create Edge Agents With Jetson Platform Services

You can build video analytics AI agents powered by the NVIDIA Jetson™ edge AI platform using the newest feature of NVIDIA JetPack™—Jetson Platform Services. The generative AI application is completely running on an NVIDIA Jetson Orin™ device that’s capable of detecting events to generate alerts and facilitate interactive Q&A sessions.

Quick Links

Download the Reference Workflow for Jetson

Tech Blog : Develop Generative AI-Powered Video Analytics AI Agents for the Edge

FAQs

Yes, you can now build video analytics AI agents faster from simple natural language prompts using VSS skills with coding agents like Codex and Claude. Explore a suite of VSS skills in github.

A NIM is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across the cloud, data center, and workstations. It supports a wide range of AI models, including open source community and NVIDIA AI foundation models to ensure seamless, scalable AI inferencing — on premises or in the cloud — using industry-standard APIs. All NIM microservices and associated preview APIs can be found at build.nvidia.com.

Visit build.nvidia.com to start exploring the NVIDIA Metropolis VSS Blueprint and available NIM microservices such as Cosmos 3 Reasoner NIM.

All users can get started for free with the preview APIs on build.nvidia.com. Each new account can receive up to 5,000 credits to try out the APIs. To continue development after credits run out, you can deploy the downloadable NIM microservices locally to your hardware or to a cloud instance. Developers can also access NIM via the NVIDIA Developer Program. See details in this FAQ.

NVIDIA NIMs is free for developers to try out. To go to production, downloadable NIM microservices require an NVIDIA AI Enterprise License. To learn more, visit this page.

The NIM developer forum is the best place to ask questions and engage with our developer community. You can access the forums here.

Get Started

Build Video Analytics AI Agents

Explore the reference workflow, powered by multiple visual language models, to easily build your video analytics AI agent.

Try Vision Language Models

Explore VSS Blueprint

Developer Guides: Build a Video Analytics AI Agent

Tech Blog
GTC On-Demand Videos

Turn Hours of Video Into Searchable Insights With AI Agents

Learn how to deploy video agents using VSS skills for real-time intelligence alerts and agentic search.

Read the Blog

Build Advanced Video Analytics AI Agents

Learn how to seamlessly build a video analytics AI agent using NVIDIA AI Blueprint for video search and summarization (VSS).

Read the Blog Part 1 Read the Blog Part 2

Augment Computer Vision Pipelines With Generative AI

Explore the new features of the latest VSS 2.4, including event verification, integrating with Cosmos Reason, and expanded hardware support.

Read the Blog

Build an Agentic Video Workflow

Learn how to build a workflow with audio input, speech output for video search, and summarization.

Read the Blog

Build Real-Time Multimodal XR Apps

Learn how to use NVIDIA AI Blueprint for video search and summarization to support audio in an XR environment.

Read the Blog

View All VLM Tech Blogs

See All GTC On-Demand Videos

Deploy AI Agents From Edge to Cloud

Tap into the power of the VSS blueprint to deploy AI agents seamlessly from edge to cloud, with scalable performance across a diverse range of GPUs.

NVIDIA RTX PRO 6000 Blackwell Series GPUs

NVIDIA RTX PRO 6000 Blackwell Series GPUs accelerate physical AI by running every robot development workload across training, synthetic data generation, robot learning, and simulation.

Explore RTX PRO 6000

NVIDIA Jetson Thor

Accelerate the future of physical AI and robotics with NVIDIA Jetson Thor™ series modules that deliver up to 2070 FP4 TFLOPS of AI compute and 128 GB of memory—all in a compact form factor.

Learn About Jetson Thor

NVIDIA DGX Spark

NVIDIA DGX Spark brings the power of NVIDIA Grace Blackwell to developer desktops. The NVIDIA GB10 Superchip, combined with 128 GB of unified system memory, lets AI researchers, data scientists, and students work with AI models locally with up to 200 billion parameters.

Learn About DGX Spark

Video Analytics AI Agents

How Do AI Agents Improve on Traditional Video Analytics?

What Is a Video Analytics AI Agent?

Where Are Video Analytics AI Agents Deployed?

Build Video Analytics AI Agents

Develop With NVIDIA Cosmos

Build AI Agents With NVIDIA Metropolis VSS Blueprint and Skills

Improve Accuracy With Model Fine-Tuning Synthetic Data Generation Agent Skills

Create Edge Agents With Jetson Platform Services

Can I build video analytics AI agents with skills?

What is a NIM microservice?

How do I get started with VLMs and the NVIDIA Metropolis VSS Blueprint?

How do I get credits for build.nvidia.com?

Do I have to pay to use a downloadable NIM?

How can I get technical support when prototyping with NIM microservices?

Get Started

Build Video Analytics AI Agents

Developer Guides: Build a Video Analytics AI Agent

Turn Hours of Video Into Searchable Insights With AI Agents

Build Advanced Video Analytics AI Agents

Augment Computer Vision Pipelines With Generative AI

Build an Agentic Video Workflow

Build Real-Time Multimodal XR Apps

Deploy AI Agents From Edge to Cloud

NVIDIA RTX PRO 6000 Blackwell Series GPUs

NVIDIA Jetson Thor

NVIDIA DGX Spark

Related Success Stories