Healthcare and Life Sciences

Unlocking the Mysteries of Mutational Signatures of Cancer with NVIDIA Accelerated Solutions

Objective

The Wellcome Sanger Institute uses an NVIDIA DGX-1 server to power its cancer mutational signature analysis pipeline, improving performance by an average of 30x.

Customer

Sanger Institute

Use Case

Performance Improvement

Technology

NVIDIA DGX-1™ Server, NVIDIA® NVLink®

The Need to Better Understand Mutational Signatures of Cancer

Cancer is caused by damage to cells’ DNA known as somatic mutations. This damage can be the result of behaviors such as smoking and drinking alcohol, as well as environmental factors such as ultraviolet light and exposure to radiation.

Damage to DNA occurs in specific patterns known as “mutational signatures,” which are unique to the factor that caused the damage. For example, although tobacco and ultraviolet radiation both cause cancer by producing mutations, the signature caused by smoking tobacco is found in lung cancer while the signature from ultraviolet light exposure is found in skin cancer.

In recent years, the analysis of DNA from cancers has led to the discovery of more than ninety different mutational signatures, but only about half of them have known causes. The environmental, lifestyle, genetic, or other potential causes of the rest are still unknown.

As part of the Cancer Grand Challenges Mutographs team funded by Cancer Research UK (CRUK), the Wellcome Sanger Institute, one of the premier centers of genomic discovery and understanding in the world, is using NVIDIA GPU-accelerated machine learning models to help understand how naturally occurring DNA changes affect cancer.

The goal of the computational component of the project is to elucidate the causes of major global geographical and temporal differences in cancer incidence through the study of mutational signatures. Identifying a broader set of mutational signatures will go a long way toward understanding the correlations between them and their causes, ultimately leading to more precise cancer treatments.

Wellcome Sanger Institute researcher conducts DNA sequencing. Image courtesy of Wellcome Sanger Institute.

Cases of esophageal squamous cell carcinoma vary greatly around the world. Image courtesy of the Mutographs project. Data source: GLOBOCAN 2012.

Cracking the Code with GPU-Accelerated Computing

This work requires solving a computationally intensive machine learning problem known as non-negative matrix factorization (NMF). Ludmil Alexandrov developed the approach for detecting mutational signatures, along with the SigProfiler software, while at the Sanger Institute and continues to build on this work with his team at the University of California, San Diego (UCSD). NVIDIA joined the Mutographs teams at UCSD and the Sanger Institute to accelerate this research with GPUs.
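To make the NMF step more concrete, here is a minimal sketch of the general technique using multiplicative updates (the Lee and Seung rule) in plain Python. It is an illustration only, not SigProfiler's implementation: the toy matrix, the rank of 2, the iteration count, and every function name are assumptions made for the example. In a mutational-signature setting, V would typically hold mutation counts (mutation types by samples), W the signatures, and H how strongly each signature contributes to each sample.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate non-negative V (m x n) as W (m x rank) times H (rank x n).

    Uses multiplicative updates that keep every entry of W and H non-negative.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, the reconstruction error being minimized."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


# Hypothetical toy data: 4 mutation types x 5 samples built from 2 made-up signatures.
true_w = [[0.7, 0.0], [0.2, 0.1], [0.1, 0.3], [0.0, 0.6]]
true_h = [[10, 0, 5, 8, 2], [0, 12, 5, 1, 9]]
v = matmul(true_w, true_h)

w1, h1 = nmf(v, rank=2, iterations=1)   # same random start, a single update
w, h = nmf(v, rank=2, iterations=500)   # same random start, many updates
print("error after 1 iteration:   ", frobenius_error(v, w1, h1))
print("error after 500 iterations:", frobenius_error(v, w, h))
```

Each iteration is dominated by matrix multiplications, which is the kind of work that parallel hardware handles well. A production pipeline would also repeat the factorization many times and choose the number of signatures automatically, which this sketch does not attempt.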

“Research projects such as the Mutographs Grand Challenge are just that—grand challenges that push the boundary of what’s possible,” said Pete Clapham, leader of the Informatics Support Group at the Wellcome Sanger Institute. “NVIDIA DGX systems provide considerable acceleration that enables the Mutographs team to, not only meet the project’s computational demands, but to drive it even further, efficiently delivering previously impossible results.”

NVIDIA GPUs accelerate the scientific application by offloading the most time-consuming parts of the code. While the Sanger Institute saves cost and improves performance by running the computationally intensive work on GPUs, the rest of the application still runs on the CPU. From the researcher’s perspective, the overall application runs faster because it’s using the parallel processing power of the GPU to improve performance.
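One way to reason about offloading only part of an application is Amdahl's law, which relates the overall speedup to the fraction of runtime that is accelerated. The sketch below is a back-of-the-envelope aid, not a measurement from the Mutographs pipeline: the function name and the example fractions are hypothetical.

```python
def overall_speedup(offloaded_fraction, kernel_speedup):
    """Amdahl's law: whole-application speedup when only part of it is accelerated.

    offloaded_fraction: share of the original runtime moved to the GPU (0.0 to 1.0).
    kernel_speedup: how many times faster that share runs once offloaded.
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    cpu_share = 1.0 - offloaded_fraction          # still runs on the CPU
    gpu_share = offloaded_fraction / kernel_speedup  # shrinks after offloading
    return 1.0 / (cpu_share + gpu_share)


# Hypothetical fractions, chosen only to show the trend.
for fraction in (0.50, 0.90, 0.99):
    print(f"{fraction:.0%} offloaded at 100x -> {overall_speedup(fraction, 100.0):.1f}x overall")
```

The takeaway is that the part left on the CPU bounds the overall gain, so large end-to-end speedups imply that most of the runtime sits in the offloaded code.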

In the current project, researchers are studying DNA from the tumors of 5,000 patients across five cancer types: pancreatic, kidney, colorectal, and two kinds of esophageal cancer. To estimate compute performance, the team used five synthetic data matrices that mimic one type of real-world mutational profile. An NVIDIA DGX-1 system ran the NMF algorithm against the five matrices, while the corresponding replicated CPU jobs were executed in Docker containers on OpenStack virtual machines (VMs) with 60 cores of 2.6 GHz Intel Xeon Skylake processors and 697.3 GB of random-access memory (RAM).

The NVIDIA DGX-1 is an integrated system for AI featuring eight NVIDIA V100 Tensor Core GPUs that connect through NVIDIA NVLink, the NVIDIA high-performance GPU interconnect, in a hybrid cube-mesh network. Together with dual-socket Intel Xeon CPUs and four 100 Gb NVIDIA Mellanox® InfiniBand network interface cards, the DGX-1 delivers one petaFLOPS of AI power, for unprecedented training performance. The DGX-1 system software, powerful libraries, and NVLink network are tuned for scaling up deep learning across all eight V100 Tensor Core GPUs to provide a flexible, maximum performance platform for the development and deployment of AI applications in both production and research settings.

Faster Results and More Complex Experiments Hold the Promise to Improve Human Health

On average, pipeline jobs ran 30x faster on the DGX-1 platform than on the CPU hardware. In a real-life analysis, the DGX-1 delivered accurate results in sixteen hours for a job that usually took twenty days on CPUs.
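As a quick sanity check, the sixteen-hour and twenty-day figures quoted above can be converted to the same unit and compared. The variable names below are just for illustration.

```python
HOURS_PER_DAY = 24

cpu_hours = 20 * HOURS_PER_DAY  # twenty days on CPU hardware, in hours
gpu_hours = 16                  # sixteen hours on the DGX-1

speedup = cpu_hours / gpu_hours
print(f"{cpu_hours} h on CPU vs {gpu_hours} h on GPU: {speedup:.0f}x faster")
```

The ratio works out to 30, which matches the average acceleration reported for the pipeline.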

The speedup and compute power of GPUs are enabling researchers to obtain scientific results faster, run a greater number of experiments, and run more complex experiments than were previously possible, paving the way for scientific discoveries that could transform the future of cancer treatments.