CUDA Spotlight: GPU-Accelerated Real Science

This week we interviewed Dr. Jeffrey Vetter of Oak Ridge National Laboratory and Georgia Tech as part of the CUDA Spotlight Series.

NVIDIA: Jeff, what is the focus of your work?
Jeff: Over the last decade, I have been investigating which hardware and software technologies are most likely to appear in future supercomputer systems, and which of those technologies best satisfy specific application workloads.

I have worked on a number of projects: IBM BlueGene/L, Cray X1, Cray XT, FPGAs, GPUs and other technologies. Our team’s early work has contributed to the design and deployment of the NSF Keeneland system and the DOE Titan system.

Not surprisingly, our work over the last several years has focused primarily on GPUs. Our team is involved in nearly every aspect of GPUs in computational science: future architectures, programming systems, applications development, and education and outreach.

For example, we are partners on NVIDIA's Echelon research project, sponsored by the DARPA UHPC program, with the goal of fitting one petaflops into a single rack that draws less than 57 kilowatts of power. On Keeneland, we actively engage applications teams across the full spectrum of GPU experience, from experts to teams with no GPU experience whatsoever.
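
As a back-of-the-envelope reading of that target (an illustration based only on the numbers above, not an official program metric), one petaflops in under 57 kilowatts works out to

    10^15 FLOPS / (5.7 x 10^4 W) ≈ 1.75 x 10^10 FLOPS/W ≈ 17.5 GFLOPS per watt,

which is the energy efficiency the design has to reach.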

NVIDIA: Tell us about Keeneland.
Jeff: The Keeneland Project [1] is a five-year Track 2D cooperative agreement awarded by the National Science Foundation (NSF) in 2009 to deploy an innovative high performance computing system and bring emerging architectures to the open science community. Keeneland is based at the Georgia Institute of Technology (Georgia Tech); our partners are Oak Ridge National Laboratory, the University of Tennessee, Knoxville, and the National Institute for Computational Sciences.

Together, we manage the facility (e.g., power, system administration, allocations), perform education and outreach activities for advanced architectures, develop and deploy software productivity tools for this class of architecture, and team with early adopters to map their applications to Keeneland architectures.

In 2010, the Keeneland project procured and deployed its initial delivery system (KIDS): a 201-teraflops, 120-node HP ProLiant SL390 system with 240 Intel Xeon CPUs and 360 NVIDIA Tesla GPUs, with the nodes connected by an InfiniBand QDR network. KIDS is being used to develop programming tools and libraries to ensure that users can productively accelerate important scientific and engineering applications. The system is also available to a select group of users to port and tune their codes on a scalable GPU-accelerated system.
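
In node-level terms (simple arithmetic from the figures above), that configuration works out to two Xeon CPUs and three Tesla GPUs per node (240/120 and 360/120), which is the CPU-to-GPU ratio that application teams program against on KIDS.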

In 2012, the Keeneland project will procure and deploy its full-scale system, which will be available as an NSF XSEDE (Extreme Science and Engineering Discovery Environment) production resource.

NVIDIA: Who is utilizing Keeneland?
Jeff: Openness is a major benefit of Keeneland. Keeneland is an NSF resource, and as such, we are open to a diverse set of computational science projects, including computer science projects aimed at enabling scalable heterogeneous computing with GPUs.

As of last week, Keeneland had approximately 75 projects and 200 users. Not surprisingly for an NSF supercomputer, most of the users are scientists from academia and other research organizations: Georgia Tech, University of Texas at Austin, University of Illinois at Urbana-Champaign, University of California, Stanford, Temple, Florida State, George Washington, Indiana, MIT, Purdue, Emory, NCAR, Utah and many others. We also partner with numerous industry vendors to ensure that tools work properly on Keeneland.

NVIDIA: What are some primary requirements of today’s researchers?
Jeff: Our users collectively have diverse workload requirements, which makes it very challenging to design a single supercomputer architecture that satisfies them all.

For example, computational molecular biologists generally have 'strong scaling' applications: the problem size is fixed, so adding more processors eventually yields diminishing returns in performance. In essence, these strong-scaling applications need faster processors, which is why many of these scientists have turned to GPUs.
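
A standard way to make the diminishing-returns point precise is Amdahl's law (a general illustration, not specific to any Keeneland code). If a fraction s of an application's runtime cannot be parallelized, the best possible speedup on N processors is

    S(N) = 1 / (s + (1 - s)/N),

which is bounded by 1/s no matter how large N gets. With s = 0.05, for instance, even an unlimited number of processors yields at most a 20x speedup; beyond that point the only way forward is to make each processor faster.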

On the other hand, other applications, such as those in combustion or materials design, need better performance on more complex versions of their problems: adding new physics and running at higher resolution. These applications typically must balance the need for faster processors against the need for low-latency, high-bandwidth communication among processors (and GPUs) and for larger memory capacity.
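
This second regime is often summarized by Gustafson's law (again a general illustration): if the problem grows with the machine so that each processor keeps a fixed amount of parallel work, the scaled speedup is roughly

    S(N) = s + (1 - s) * N,

where s is again the serial fraction of the runtime. This keeps improving as N grows, but only if the interconnect and memory system can keep all N processors supplied with data, which is exactly the balance these applications have to strike.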

NVIDIA: What are the key drivers in supercomputing today?
Jeff: From the facility or data center perspective, the key driver is the energy required to run large-scale supercomputers. This is not just a prediction: most contemporary supercomputing facilities are already limited by the amount of power that can be physically delivered to the computer in the building. Moreover, various studies by distinguished panels over the past three years have projected that exascale systems could require hundreds of megawatts just to power the supercomputer if we simply scale up existing technologies. In our group, we are investigating new device technologies and heterogeneous computing as ways to improve the energy efficiency of these systems.
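
To put that projection in perspective, here is an illustrative calculation (the 2 GFLOPS-per-watt figure is an assumption representative of the most efficient systems around this time, not a number from those studies): sustaining 10^18 FLOPS at 2 x 10^9 FLOPS per watt would require

    10^18 / (2 x 10^9) = 5 x 10^8 W = 500 MW,

which is why simply scaling up existing technologies is not considered viable.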

On the other hand, applications developers are most concerned about programmability. The last significant transition for the scientific computing community was the move in the 1990s from vector computing to distributed-memory computing with MPI; the shift to heterogeneous, GPU-accelerated systems raises programmability questions of a similar scale. To help answer them, our team is investigating multiple fronts: CUDA, compiler directives, runtime libraries, frameworks, and debugging and correctness tools. It is an exciting time to be in computer science!

NVIDIA: How does CUDA fit into the modern computing landscape?
Jeff: CUDA is a phenomenon. In less than five years, the CUDA programming model has grown from its initial introduction to wide adoption. It is easy to forget how challenging it was to program GPUs prior to CUDA. These days, CUDA is so pervasive that many students get their first introduction to parallel programming and fine-grained parallelism with CUDA on their laptop GPU. This fact alone makes CUDA an important player in the modern computing landscape.
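
For readers who have not seen that first exposure, below is a minimal sketch of the kind of fine-grained parallelism CUDA teaches: a vector addition in which every element is handled by its own thread. This is a generic teaching example, not code from Keeneland or any of the projects mentioned above.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Each thread computes one element of c = a + b.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                      // guard the final, partially filled block
            c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;                  // one million elements
        const size_t bytes = n * sizeof(float);

        // Host data.
        float *ha = (float *)malloc(bytes);
        float *hb = (float *)malloc(bytes);
        float *hc = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        // Device data.
        float *da, *db, *dc;
        cudaMalloc(&da, bytes);
        cudaMalloc(&db, bytes);
        cudaMalloc(&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        // Launch one thread per element, 256 threads per block.
        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(da, db, dc, n);

        // Copy back and spot-check the result.
        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %.1f (expected 3.0)\n", hc[0]);

        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }

The launch configuration <<<blocks, threads>>> is where the fine-grained parallelism appears: a million array elements become a million lightweight threads, and the GPU in a student's laptop schedules them without any change to the code.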

Bio: Dr. Jeffrey Vetter
Jeffrey Vetter, Ph.D., holds a joint appointment between Oak Ridge National Laboratory (ORNL) and the Georgia Institute of Technology (GT). At ORNL, Jeff is a Distinguished R&D Staff Member and the founding group leader of the Future Technologies Group. At GT, he is a Joint Professor in the School of Computational Science and Engineering in the College of Computing, the Principal Investigator for Keeneland, the NSF Track 2D Experimental Computing Facility for large-scale heterogeneous computing with graphics processors, and the Director of the NVIDIA CUDA Center of Excellence. Jeff earned his Ph.D. in Computer Science from the Georgia Institute of Technology, and his research explores emerging hardware and software technologies for HPC.

Relevant links
http://keeneland.gatech.edu
http://ft.ornl.gov
http://ft.ornl.gov/~vetter
http://research.nvidia.com/content/gatech-ccoe-summary

References
[1] J.S. Vetter, R. Glassbrook, et al., "Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community," IEEE Computing in Science and Engineering, 13(5):90–95, 2011.
http://dx.doi.org/10.1109/MCSE.2011.83

Editor’s note
Read the press release about the Titan supercomputer project recently announced by Oak Ridge National Laboratory and NVIDIA.