FastLabel, an AI startup based in Japan, sought to automate and scale the curation of computer vision datasets that support industries such as autonomous driving, manufacturing, and smart infrastructure. To keep pace with fast-moving AI development, FastLabel’s central goal was to eliminate the manual bottlenecks typically associated with preparing large, high-quality image datasets, keeping the process fast and cost-efficient while minimizing data redundancy.
Key Takeaways
High-quality image dataset preparation at scale is challenging because conventional methods miss subtle redundancies, forcing time-consuming manual reviews and inefficient use of resources.
Prior to implementing NVIDIA solutions, FastLabel grappled with slow, resource-intensive processes for image filtering and deduplication, particularly for the long-tail datasets required by sectors such as autonomous driving. Traditional rule-based tools struggled to identify redundancy based on semantic similarity, so repetitive data kept flowing through labeling and training cycles, reducing overall productivity.
To overcome these issues, FastLabel needed a robust, scalable technique for pinpointing and removing redundant data, one that leveraged advances in generative AI models rather than conventional heuristics. With the ability to remove duplicate data automatically within minutes rather than hours, NVIDIA NeMo™ Curator stood out as an ideal choice for scaling data processing pipelines, a significant improvement over the previous manual, time-consuming approach.
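For illustration, the core idea behind semantic deduplication can be sketched in a few lines: embed each image with a vision model and treat pairs whose embeddings are nearly parallel as duplicates. The sketch below is not NeMo Curator’s API; it assumes a CLIP-style encoder from the sentence-transformers library and a cosine-similarity threshold of 0.95, both of which are illustrative choices.

```python
# A minimal sketch of embedding-based semantic deduplication, assuming a
# CLIP-style image encoder from the sentence-transformers library. This is not
# NeMo Curator's API; it only illustrates finding near-duplicates by semantic
# similarity instead of rule-based heuristics or exact matching.
from pathlib import Path

from PIL import Image
from sentence_transformers import SentenceTransformer


def find_semantic_duplicates(image_dir: str, threshold: float = 0.95) -> list[tuple[str, str]]:
    """Return pairs of image paths whose embeddings exceed a cosine-similarity threshold."""
    paths = sorted(Path(image_dir).glob("*.jpg"))
    model = SentenceTransformer("clip-ViT-B-32")  # joint image/text encoder
    embeddings = model.encode(
        [Image.open(p) for p in paths],
        convert_to_numpy=True,
        normalize_embeddings=True,  # unit vectors, so dot product == cosine similarity
    )
    similarity = embeddings @ embeddings.T
    duplicates = []
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            if similarity[i, j] >= threshold:
                duplicates.append((str(paths[i]), str(paths[j])))
    return duplicates


if __name__ == "__main__":
    for a, b in find_semantic_duplicates("frames/", threshold=0.95):
        print(f"near-duplicate: {a} <-> {b}")
```

At production scale, the quadratic pairwise comparison above is exactly what scalable curation tools avoid, typically by clustering embeddings first and comparing only within clusters on GPUs.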
FastLabel deployed NeMo Curator’s image processing features on NVIDIA A100 GPUs on Google Cloud Platform (GCP), paired with GCP-hosted ISV models for image embedding and caption generation. The company used this solution to curate large-scale autonomous driving image datasets, building clean datasets that exclude semantically similar images in a scalable way.
The key innovation involved integrating vision language models (VLMs) that generate detailed captions for each image based on domain-specific, predefined features. These captions are then embedded and processed through NeMo Curator’s semantic deduplication feature, allowing for highly targeted, domain-specific curation that would be difficult to achieve with general-purpose image similarity methods, as sketched below.
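A hedged sketch of that caption-driven variant follows. The `caption_image` function, the domain prompt, the attribute list, the cluster count, and the 0.9 threshold are all illustrative assumptions standing in for FastLabel’s GCP-hosted captioning models and their actual configuration; only the overall shape (caption, embed, cluster, compare within clusters) reflects the approach described above.

```python
# Hedged sketch of caption-driven, domain-specific deduplication. caption_image
# is a placeholder for a hosted VLM endpoint; the prompt, attributes, cluster
# count, and threshold are illustrative assumptions, not FastLabel's or
# NeMo Curator's actual configuration.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

DOMAIN_PROMPT = (
    "Describe this driving scene using only these attributes: "
    "weather, time of day, road type, traffic density, notable objects."
)


def caption_image(image_path: str) -> str:
    """Placeholder for a VLM captioning call constrained by DOMAIN_PROMPT."""
    raise NotImplementedError("Wire this up to your captioning model of choice.")


def deduplicate_by_caption(image_paths: list[str], n_clusters: int = 8,
                           threshold: float = 0.9) -> list[str]:
    """Keep one representative per group of semantically similar captions."""
    captions = [caption_image(p) for p in image_paths]
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    emb = encoder.encode(captions, convert_to_numpy=True, normalize_embeddings=True)

    # Cluster first so pairwise comparisons only happen within small groups,
    # which keeps the cost manageable on large datasets.
    kmeans = KMeans(n_clusters=min(n_clusters, len(image_paths)), n_init=10)
    labels = kmeans.fit_predict(emb)

    keep: list[str] = []
    for cluster in set(labels):
        members = np.flatnonzero(labels == cluster)
        kept: list[int] = []
        for i in members:
            # Drop an image if its caption is too close to one already kept.
            if all(emb[i] @ emb[j] < threshold for j in kept):
                kept.append(i)
        keep.extend(image_paths[i] for i in kept)
    return keep
```

Captioning against a fixed attribute list is what makes the curation domain-specific: two visually different frames of the same empty highway at dusk produce near-identical captions and collapse into one kept example.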
Adopting NVIDIA NeMo Curator brought transformative results for FastLabel: deduplication that once took hours of manual effort now completes in minutes. This efficiency not only accelerated dataset preparation but also reduced computational waste, supporting more sustainable AI training across FastLabel’s projects.
“Before implementing NVIDIA-powered solutions, deduplicating images for autonomous driving was a resource bottleneck. NVIDIA NeMo Curator enabled us to automate and scale our dataset curation, dramatically reducing costs and manual effort. We deduplicated 10,000 images in just minutes and identified hundreds of duplicates that traditional methods would have missed. This not only accelerates our AI projects but lets us deliver immediate, high-quality data to customers in safety-critical industries.”
Shuhei Uchida, CPO, FastLabel
For FastLabel, the solution enabled the launch of its “FastLabel Data Curation” service, providing customers with rapid, reliable access to high-quality, automatically tagged, and deduplicated datasets. This dramatically reduced the time required for manual reviews and accelerated downstream project cycles. These innovations empowered safer, more scalable deployment of AI solutions in safety-critical domains like autonomous driving.
FastLabel plans to extend FastLabel Data Curation with a high-speed, scalable curation service for text data in addition to images, leveraging NeMo Curator to provide customers with high-quality data for LLMs and VLMs.
On a larger scale, FastLabel’s approach—enabled by NVIDIA technologies—embodies a move toward sustainable, large-scale artificial intelligence, helping organizations globally to create and maintain better datasets faster, and at a lower cost.
NVIDIA NeMo Curator improves generative AI model accuracy by processing text, image, and video data at scale for training and customization.