Public Sector

Using AI and Accelerated Computing to Root Out Waste, Fraud, and Theft

Objective

Detecting tax fraud and identity theft faster by accelerating data engineering and data science workflows with AI and GPU-accelerated computing.

Customer

Internal Revenue Service

Partner

Cloudera

Use Case

Data Science

Technology

NVIDIA AI Enterprise
NVIDIA RAPIDS

The IRS Is Leveraging AI Tools, Machine Learning, and Fraud Detection Applications Accelerated by NVIDIA GPUs

As in every other industry, government data throughput requirements have grown exponentially. Compounding the challenge of managing expanding data needs, government agencies must carry out their work while efficiently rooting out waste, fraud, and abuse to ensure the ethical use of taxpayer dollars.

The Government Accountability Office (GAO) recently identified 36 operations that need to be transformed to keep up with data management requirements, including high-risk areas that affect the nation’s commerce, economy, and security. 

Without adequate IT infrastructure, government agencies have struggled to efficiently explore and parse large bodies of data, making frequent human intervention necessary. This makes it difficult for agencies to effectively execute the data-driven operations necessary to maintain public trust.

To overcome these challenges, the IRS is leveraging AI tools, machine learning, and fraud detection applications accelerated by NVIDIA infrastructure.

Fraud Detection Applications Accelerated by NVIDIA GPUs

CPUs and Manual Efforts Come Up Short

To combat tax fraud and uncover bad actors, IRS investigators must analyze decades’ worth of data, link individuals to suspicious transactions, and trace transactions through multiple steps and multiple hops on a graph. 
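
The multi-hop tracing described above can be illustrated with a breadth-first search over a small transaction graph. This is only a minimal sketch under stated assumptions, not the IRS's implementation: the account names, the edge list, and the trace_hops helper are all hypothetical, and a production workload would run as a distributed graph job rather than on one machine.

```python
from collections import deque

def trace_hops(edges, start, max_hops):
    """Return {node: hop_count} for every node reachable from `start`
    within `max_hops` transfers, following edges in their direction."""
    # Build an adjacency list from (sender, receiver) pairs.
    graph = {}
    for sender, receiver in edges:
        graph.setdefault(sender, []).append(receiver)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in hops:  # first visit is the shortest path
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

# Hypothetical transfers: (sender, receiver)
transfers = [
    ("person_a", "shell_co_1"),
    ("shell_co_1", "shell_co_2"),
    ("shell_co_2", "offshore_acct"),
    ("person_b", "shell_co_2"),
]

reachable = trace_hops(transfers, "person_a", max_hops=2)
print(reachable)
```

With a limit of two hops, the search stops at shell_co_2; raising max_hops to 3 would also reach offshore_acct. The hop limit is what keeps this kind of search tractable as the graph grows.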

With this mission, one IRS data scientist was tasked with combing through a dataset of more than 3 terabytes and identifying patterns to expose fraud. Unfortunately, the available compute power was insufficient: even after running all night on a large bank of CPUs, the job failed to complete. The team attempted to break the dataset down, server by server, but was then forced to manually stitch the data subsets back together to make the solution work. Even with all of this careful manual effort, the team could not achieve full, real-time visibility into fraud.

To improve data-centric tasks like this, the IRS is implementing high-powered AI tools, machine learning, and applications capable of swiftly exposing fraud and identity theft. 

20X Speedups Helped the IRS Expose Fraud

The new combination of computing infrastructure and software solutions enabled the IRS to quickly and easily implement AI and machine learning at scale. With Cloudera running on NVIDIA GPUs, workloads immediately ran up to 5X faster with no code changes. But there was still room for improvement.
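
Speedups with "no code changes" are plausible because the RAPIDS Accelerator for Apache Spark is enabled through configuration rather than through changes to application code. The sketch below builds the kind of settings involved. The property names are real Spark and RAPIDS Accelerator settings, but the values, the helper names (rapids_spark_conf, to_submit_args), and the idea that the IRS used these exact settings are assumptions. The accelerator jar must also be on the cluster's classpath, which is omitted here.

```python
def rapids_spark_conf(gpus_per_executor=1, tasks_per_gpu=4):
    """Build Spark settings that turn on the RAPIDS Accelerator plugin.
    Values are illustrative defaults, not tuned recommendations."""
    return {
        # Load the RAPIDS Accelerator SQL plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow supported SQL/DataFrame operators to run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # GPU resource scheduling: GPUs per executor, GPU share per task.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }

def to_submit_args(conf):
    """Flatten a settings dict into spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args

conf = rapids_spark_conf()
submit_args = to_submit_args(conf)
print(" ".join(submit_args))
```

Operators the plugin does not support fall back to the CPU, which is consistent with the account that follows of a few tasks still running on CPUs.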

Cloudera called on a team of NVIDIA data scientists to examine the IRS code. They determined that a few tasks with particularly complex data structures were still running on CPUs. NVIDIA wrote new code to handle those jobs and inserted it into Spark’s software interface for NVIDIA RAPIDS™, the open library for running data analytics on GPUs.

When the IRS team ran the new code on GPUs in a distributed Spark cluster, they experienced a remarkable speedup of 20X. 

By developing workloads that use Apache Spark and graph analysis, engineering teams created immense graphs with nodes and edges. With AI bots and machine learning algorithms analyzing graphs, investigators were able to connect individuals to institutions and, subsequently, to larger entities spanning years and decades. These insights helped to quickly expose patterns that indicated fraud.
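
One simple way to picture how a graph connects individuals to institutions and then to larger entities is to group every node that is linked, directly or indirectly, into one cluster. The following is a minimal single-machine sketch using a union-find structure. It is a stand-in for illustration, not the algorithm the IRS used; the entity names and the group_entities helper are hypothetical.

```python
def group_entities(links):
    """Group nodes into connected clusters given undirected (a, b) links."""
    parent = {}

    def find(x):
        # Return the representative of x's cluster, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the clusters on either side of every link.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their final representative.
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

# Hypothetical links between people, institutions, and a larger entity
links = [
    ("person_a", "bank_1"),
    ("person_b", "bank_1"),
    ("bank_1", "holding_co"),
    ("person_c", "bank_2"),
]

clusters = group_entities(links)
print(clusters)
```

Here person_a and person_b end up in the same cluster as holding_co even though neither is linked to it directly, which is the kind of indirect relationship that is hard to spot by manual review.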

The same datasets that used to take weeks or months to stitch together and process now take only hours or minutes. Testing revealed a 10X improvement in engineering and data science workflows with a 50 percent reduction in infrastructure costs. 

Building on Success to Better Protect Taxpayers

With improved computing infrastructure and AI implementation, the IRS is cutting costs and better protecting taxpayers by preventing fraud and identity theft. 

Building on its success in data preparation and data analytics, the IRS plans to accelerate AI inference jobs and use Spark-GPU infrastructure to tackle natural language processing and other analytics jobs.

Across government, there are innumerable opportunities to improve performance with AI and accelerated computing. Other government agencies that track transactions to mitigate waste, theft, and fraud can follow the IRS’s example and modernize infrastructure and software to attain a higher standard of operational efficiency and public service. 

“The Cloudera and NVIDIA integration will empower us to use data-driven insights to power mission-critical use cases. We’re currently implementing this integration, and are already seeing over 20x speed improvements at half the cost for our data engineering and data science workflows.”

Joe Ansaldi
Technical Branch Chief of Research and Applied Analytics and Statistics, IRS

Results

  • 20X speedup in running data scientists’ experiments

  • 50 percent lower cost of data science and data engineering workflows
