With support from the National Science Foundation’s National Artificial Intelligence Research Resource (NAIRR) pilot program and access to NVIDIA DGX™ Cloud, The Walter Lab at Harvard Medical School completed nearly 1.7 million protein interaction predictions in just three months, a task that previously would have taken researchers a lifetime.
Human DNA is constantly damaged by factors such as UV light exposure and chemicals in the environment. To prevent life-threatening conditions such as cancer, cells must repair this damage. Johannes Walter and his research team are harnessing artificial intelligence to understand the molecular mechanisms that cells use to copy and repair their DNA.
“We’ve had so much success with AlphaFold in our area of biology that we’ve sought to expand it beyond the interest of our lab to the problem that almost all biologists think about, which is how proteins need to interact with each other throughout the human body,” said Ernst Schmid, a graduate student in The Walter Lab. “We're going from a few hundred proteins interacting in a few hundred ways to the scale of the entire proteome: 20,000 proteins interacting in perhaps a million ways.”
The first step in this process was developing a structural characterization, or map, of the protein-protein interactions (PPIs) within a single cell. Existing AI tools are trained to predict the 3D structures of proteins, but they often cannot reliably separate genuine PPIs from false-positive predictions.
Meanwhile, determining a protein's structure experimentally has traditionally been very time-consuming and often required years of research.
The Walter Lab addressed these limitations by developing a multi-step pipeline.
Initially, the researchers applied their classifier, SPOC, to a matrix of around 300 human genome maintenance proteins. This step produced 40,000 protein interaction predictions, which provided the foundation for the human structural interactome.
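The number of distinct pairs in an all-by-all screen grows quadratically with the number of proteins, which helps explain the scale involved. The short Python sketch below counts unordered pairs with `math.comb`. It assumes that each pair of distinct proteins is tested once. Homodimers can be added through a hypothetical `include_self` flag. The article does not describe how the lab actually selected pairs, so this is illustrative arithmetic only and does not represent their screening procedure.

```python
from math import comb


def unique_pairs(n_proteins: int, include_self: bool = False) -> int:
    """Count distinct protein pairs in a hypothetical all-by-all screen."""
    # Unordered pairs of two different proteins: n choose 2
    pairs = comb(n_proteins, 2)
    # Optionally also count each protein paired with itself (homodimers)
    if include_self:
        pairs += n_proteins
    return pairs


print(unique_pairs(300))     # 44850
print(unique_pairs(20_000))  # 199990000
```

For around 300 proteins, this gives 44,850 pairs, the same order of magnitude as the roughly 40,000 predictions reported. The same formula for 20,000 proteins gives nearly 200 million pairs.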
To dramatically scale this project, the team adopted NVIDIA accelerated computing infrastructure and the full NVIDIA DGX Cloud stack, consisting of 32-node DGX clusters with 256 A100 GPUs. With this infrastructure, the researchers ran nearly 1.7 million PPI predictions with ColabFold in three months.
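The reported figures also allow a rough throughput estimate. This sketch makes two assumptions. First, "three months" is treated as 90 days. Second, all 256 GPUs are assumed to be available for that entire period. Actual utilization isn't reported, so the per-prediction figure is an approximate ceiling on average GPU time, not a measured runtime. The function name `throughput_estimate` is hypothetical.

```python
def throughput_estimate(predictions: int, nodes: int, gpus: int, days: float) -> dict[str, float]:
    """Rough averages for a batch of predictions on a fixed GPU allocation."""
    minutes_available = days * 24 * 60       # wall-clock minutes in the window
    gpu_minutes = gpus * minutes_available   # total GPU-minutes, assuming 100% availability
    return {
        "gpus_per_node": gpus / nodes,
        "predictions_per_day": predictions / days,
        "predictions_per_gpu": predictions / gpus,
        "gpu_minutes_per_prediction": gpu_minutes / predictions,
    }


# Figures from the article; 90 days is an assumed stand-in for "three months"
stats = throughput_estimate(predictions=1_700_000, nodes=32, gpus=256, days=90)
for name, value in stats.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions, the allocation averaged about 18,900 predictions per day, or roughly 19.5 GPU-minutes per prediction. The node count also works out to 8 GPUs per node, which is consistent with the eight A100 GPUs in a DGX A100 system.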
The NVIDIA DGX Cloud allocation was made possible through NVIDIA’s partnership with the National Science Foundation’s NAIRR pilot program. Through this initiative, researchers across the United States can use DGX Cloud to advance projects in areas such as pandemic prevention, RNA and human cell research, smart agriculture, and materials science.
“It is no exaggeration to say that this project would not have been possible without DGX Cloud,” Schmid said. “Having a dedicated cloud resource that we could completely configure for our own uses allowed us to achieve the speed and scale we needed to get through our data set and frankly, if we didn't have that kind of control, it would have literally taken years, if not more, to get this done.”
High-confidence PPIs identified with this multi-step method can support hypothesis generation across all areas of cell physiology. The results also provide a framework for accurately interpreting large-scale computational protein interaction screens. They help lay the foundation for building larger, more complex interactomes faster.
The Walter Lab and scientists around the world will build on these results with new experiments and computational approaches. This work will bring them closer to understanding the mechanisms of human cells and preventing prevalent diseases such as cancer.