
SC14 | New Orleans, LA | November 16-21, 2014

GPU TECHNOLOGY THEATER AT SC14

Sponsored by HP

MONDAY, NOVEMBER 17 - THURSDAY, NOVEMBER 20
DURING EXHIBITION HOURS | NVIDIA BOOTH #1727

Come hear others in your field talk about a wide range of topics in HPC and accelerated computing. The theater will host talks every 30 minutes, every day, from an amazing lineup of industry luminaries and scientific experts. This year's lineup thus far includes an ACM Gordon Bell Prize finalist, the winner of the Sidney Fernbach Memorial Award, and more.

The theater is open to all attendees, and seating is first come, first served. We recommend you come early to reserve your seat.

Speakers and times subject to change.

 

MONDAY, NOVEMBER 17 | BOOTH #1727

19:30 - 20:00

GPU Acceleration: What's Next

 

 


 

Ian Buck (General Manager, Accelerated Computing, NVIDIA)


Ian Buck is NVIDIA's General Manager for GPU Computing Software, responsible for all engineering, third-party enablement, and developer marketing activities for GPU computing at NVIDIA. Ian joined NVIDIA in 2004 and created CUDA, which remains the leading platform for accelerator-based parallel computing. Before joining NVIDIA, Ian was the development lead for Brook, the forerunner of general-purpose computing on GPUs. He holds a Ph.D. in Computer Science from Stanford University and a B.S.E. from Princeton University.

Jeff Nichols (Associate Laboratory Director, Computing and Computational Sciences, Oak Ridge National Laboratory)

Rob Neely (Associate Division Leader, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory)

 

TUESDAY, NOVEMBER 18 | BOOTH #1727

10:30 - 10:50

Extending the Reach of Parallel Computing with CUDA

 

CUDA, NVIDIA's parallel computing platform and programming model, is extending its reach. CUDA support for GPU computing is expanding to systems based on x86, ARM64, and POWER CPUs, providing a choice of high-performance computing platforms. Programmers can program GPUs natively in the most popular programming languages: C, C++, Fortran, Python, and Java. New CUDA software features like Unified Memory, drop-in libraries, and powerful developer tools make high-performance GPU computing with CUDA easier than ever before. And NVIDIA's future GPU architectures and NVLink interconnect will provide unprecedented efficiency for heterogeneous computing. In this talk you'll learn about the latest developments in the NVIDIA CUDA computing platform and ecosystem, and get insight into the philosophy driving the development of CUDA.
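
To make the Unified Memory feature above concrete, here is a minimal sketch of our own (not code from the talk): a single allocation from cudaMallocManaged is visible to both host and device, so no explicit cudaMemcpy is needed.

    #include <cstdio>

    __global__ void scale(double *x, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0;
    }

    int main()
    {
        const int n = 1 << 20;
        double *x;
        cudaMallocManaged(&x, n * sizeof(double)); // one pointer, valid on CPU and GPU
        for (int i = 0; i < n; ++i) x[i] = 1.0;    // initialize directly on the host
        scale<<<(n + 255) / 256, 256>>>(x, n);     // kernel updates the same pointer
        cudaDeviceSynchronize();                   // required before the host reads again
        printf("x[0] = %f\n", x[0]);               // prints 2.000000
        cudaFree(x);
        return 0;
    }

Compiled with nvcc as a .cu file, this runs on any CUDA 6 or later toolkit with a supported GPU.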


 

Mark Harris (CTO GPU Computing, NVIDIA)


Mark Harris is Chief Technologist for GPU Computing at NVIDIA, where he works as a developer advocate and helps drive NVIDIA's GPU computing software strategy. His research interests include parallel computing, general-purpose computation on GPUs, physically based simulation, and real-time rendering. Mark founded www.GPGPU.org while he was earning his PhD in computer science from the University of North Carolina at Chapel Hill. Mark brews his own beer and cures his own bacon in Brisbane, Australia, where he lives with his wife and daughter.

11:00 - 11:20

Accelerating ORNL's Applications to the Exascale

 

The Titan computer at Oak Ridge National Laboratory is delivering exceptional results for our scientific users in the U.S. Department of Energy's Office of Science, Applied Energy programs, academia, and industry. Mr. Bland will describe the Titan system, explain how it fits within the roadmap to exascale machines, and highlight successes our applications have achieved using GPU accelerators.


 

Buddy Bland (Project Director, ORNL)


Buddy Bland is the project director of the Oak Ridge Leadership Computing Facility, home to Titan and Jaguar, each of which has held the #1 spot on the Top500 list of supercomputers. He has worked in HPC at ORNL for 30 years on machines as varied as the Cray 1, Intel Paragon, IBM SP, and Cray XT.

11:30 - 11:50

Convergence of Extreme Big Data with HPC: How GPUs Will be Significant

 

With the rapidly increasing data volume and processing requirements of so-called big data, conventional cloud infrastructures will no longer be efficient, and an imminent convergence of extreme-scale big data and HPC will result. The question is, will GPUs play a central role, or will they be peripheral? Our new JST-CREST "Extreme Big Data" project is a five-year effort that aims to address this question, and early results on our TSUBAME supercomputers look promising, including our TSUBAME-KFC becoming world #1 on the "Green Graph 500" big data benchmark in Nov. 2013. At the same time, however, we are finding that it will be essential to eliminate bottlenecks in future GPU architectures.


 

Satoshi Matsuoka (Professor, Tokyo Institute of Technology)


Satoshi Matsuoka, a professor at Tokyo Institute of Technology, has led many innovative projects such as the TSUBAME GPU-enabled supercomputers. At SC14, he is being awarded the prestigious IEEE-CS Fernbach Award "for his work on software systems for high-performance computing on advanced infrastructural platforms, large-scale supercomputers, and heterogeneous GPU/CPU supercomputers".

12:00 - 12:20

GPU-Accelerated Analysis of Large Biomolecular Complexes

 

State-of-the-art GPU-accelerated petascale computers enable molecular dynamics simulations of large viruses, light-harvesting organelles, and synthetic nanodevices. Petascale molecular dynamics simulations also pose formidable computational challenges in preparing, analyzing, and visualizing the structures and their dynamics. GPUs provide massive parallelism and high-bandwidth memory systems that are ideally suited to these tasks. This presentation will highlight the use of GPU ray tracing for visualizing the process of photosynthesis, and GPU-accelerated analysis of results from hybrid structure determination methods that combine data from cryo-electron microscopy and X-ray crystallography with all-atom molecular dynamics simulations.


 

John Stone (Senior Research Programmer, University of Illinois at Urbana-Champaign)


John Stone is a Senior Research Programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and Associate Director of the NVIDIA CUDA Center of Excellence at the University of Illinois. Mr. Stone is the lead developer of VMD, a high-performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. Mr. Stone was named an NVIDIA CUDA Fellow in 2010. He also provides consulting services for projects involving computer graphics, GPU computing, and high-performance computing. Prior to joining the University of Illinois in 1998, Mr. Stone helped develop the award-winning MPEG Power Professional line of video compression tools at Heuris.

13:00 - 13:20

Heterogeneous HPC, Architectural Optimization and NVLink

 

This talk will explore heterogeneous node design and architecture and how NVLink, a new scalable node integration bus, enables uncompromising performance on the most demanding applications.


 

Steve Oberlin (CTO, Accelerated Computing, NVIDIA)


Steve Oberlin is the CTO for Accelerated Computing at NVIDIA. His 30+ years in HPC include leadership, architecture, and design roles on the Cray-2 and Cray-3 vector supercomputers, and the role of chief architect of the massively parallel T3D and T3E. He joined NVIDIA in 2013.

13:30 - 13:50

A Livermore Perspective on Next-Generation Computing

 

High performance computing (HPC) has been vital and ubiquitous at LLNL since 1952. For every mission at LLNL, advanced computing in simulation or data science is a key element in the arsenal of scientific, theoretical and experimental capabilities. The computing world is entering a new architectural period based on extreme parallelism, heterogeneity, and deeply hierarchical memory. Dealing with this challenge effectively is at the heart of a successful strategy for LLNL.


 

Terri Quinn (Principal Deputy Department Head, Integrated Computing and Communications, Lawrence Livermore National Laboratory)


Terri is responsible for an organization of three divisions with over 400 technical staff working in high-performance computing, computer security, and enterprise computing. Livermore Computing (LC), LLNL's high-performance computing organization, operates some of the most advanced production classified and unclassified computing environments. LC's five computing facilities house over 27 petaflops of computing resources serving 2,900 on-site and off-site users. She represents LLNL on DOE's Exascale Executive team, a collaboration of seven labs working to define and promote a joint NNSA/SC exascale program for DOE, and she is on the board of OpenSFS (Open Scalable File Systems, Inc.), a non-profit company dedicated to supporting high-end open-source file systems.

13:30 - 13:50

A Decade of GPU Impact at the National Center for Supercomputing Applications (NCSA)

 

For more than a decade, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign has been using GPUs to help scientists and engineers increase the productivity of their investigations. First, the NCSA Innovative Systems Laboratory assessed the potential impact of early GPUs on science applications, working on system software as well as data analysis and simulation applications. Today, NVIDIA GPUs are a key component of the Blue Waters system, one of the world's most powerful supercomputers, which is enabling unprecedented scientific insights. This talk will trace the evolution of GPU usage at NCSA, from single GPUs running prototype applications to today's use of thousands of GPUs in single, full-production simulation programs. It will also share some of the outstanding science results enabled in part by the GPUs in Blue Waters.


 

Ed Seidel (Director, NCSA)


NCSA director Edward Seidel is a distinguished researcher in high-performance computing, relativity, and astrophysics, with an outstanding track record as an administrator. In addition to leading NCSA, he is a Founder Professor in the University of Illinois Department of Physics and a professor in the Department of Astronomy. Seidel is a fellow of the American Physical Society and of the American Association for the Advancement of Science, as well as a member of the Institute of Electrical and Electronics Engineers and the Society for Industrial and Applied Mathematics. His research has been recognized by a number of awards, including the 2006 IEEE Sidney Fernbach Award, the 2001 Gordon Bell Prize, and the 1998 Heinz Billing Award. He earned a master's degree in physics at the University of Pennsylvania in 1983 and a doctorate in relativistic astrophysics at Yale University in 1988.

14:00 – 14:20

Machine Learning: What Computational Researchers Need to Know

 

NVIDIA GPUs are powering a revolution in machine learning. With the rise of deep learning algorithms, in particular deep convolutional neural networks, computers are learning to see, hear, and understand the world around us in ways never before possible. Image recognition and detection systems are getting close to human-level performance. I will explain what deep learning is and how it enables scientists to automatically process image data. This talk will show how computational scientists and practitioners from various domains can apply deep learning to solve their own research problems. NVIDIA's recently released cuDNN library, which supports deep learning efforts on GPUs, will be introduced to demonstrate how these algorithms map efficiently to CUDA.


 

Jonathan Cohen (Director, CUDA Libraries and Software Solutions, NVIDIA)


Jonathan Cohen leads the CUDA Libraries and Algorithms group as part of the CUDA product team. In this role, he oversees development of the CUDA platform libraries as well as GPU-acceleration software technology targeted at specific application domains such as bioinformatics and machine learning. Before moving to the product side of NVIDIA, Mr. Cohen spent three years as a senior research scientist with NVIDIA Research, developing scientific computing and real-time physical simulation algorithms for NVIDIA's massively parallel GPUs. Prior to joining NVIDIA, Mr. Cohen worked in the Hollywood feature film visual effects industry, where he developed custom software for over a dozen films. He led development of the sand simulation and rendering system for "Spider-Man 3" as a Computer Graphics Supervisor at Sony Pictures Imageworks. Mr. Cohen received a Technical Achievement Award in 2007 from the Academy of Motion Picture Arts and Sciences for his work on fluid simulation and volumetric modeling for visual effects. He received an undergraduate degree in Mathematics and Computer Science from Brown University.

14:30 – 14:50

Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks

 

We leverage GPU-accelerated deep convolutional neural networks to detect mitosis in breast cancer histology images. The networks are trained to classify whether each pixel in the image belongs to a mitosis or not, using as context a patch centered on the pixel. Simple post-processing is then applied to the network output. Our approach won both the ICPR 2012 mitosis detection competition and the MICCAI 2013 Grand Challenge on Mitosis Detection, outperforming other contestants by a significant margin, with accuracy comparable to that of professional histologists. GPUs allow us to train each DNN in only one day. Scanning a 4-megapixel image for mitosis takes less than one second, making the method already useful for practical applications.


 

Dan Ciresan (Senior Researcher, IDSIA)


Dr. Dan Ciresan received his PhD from "Politehnica" University of Timisoara, Romania. He first worked as a postdoc before becoming a senior researcher at IDSIA, Switzerland. Dr. Ciresan is one of the pioneers of using CUDA for Deep Neural Networks (DNNs). His methods have won five international competitions on topics such as classifying traffic signs, recognizing handwritten Chinese characters, segmenting neuronal membranes in electron microscopy images, and detecting mitosis in breast cancer histology images. Dr. Ciresan has published his results in top-ranked conference proceedings and journals. His DNNs have significantly improved the state of the art on several image classification tasks.

15:00 - 15:20

Professional Graphics Meets Supercomputing

 

The line between computation and visualization continues to blur, and the scientific community has increasing common ground with graphical applications such as product design, computer games, and film production. We will highlight technologies from NVIDIA and partners that show how supercomputing techniques are used in automotive design, and how those same technologies are used to visualize some of the world's best science. We will also show techniques formerly relegated to scientific research that are now used in mainstream gaming applications.


 

Steven Parker (VP & CTO of Professional Graphics, NVIDIA)


Dr. Steven Parker is VP & CTO of the Professional Graphics business at NVIDIA where he oversees many of NVIDIA's professional graphics solutions including ray tracing, scientific visualization and graphics hardware products. Dr. Parker joined NVIDIA in 2008 and has a rich background in high-performance computing and computer graphics.

15:30 - 15:50

Designing the Future: How Successful Co-design Helps Shape Hardware and Software Development

 

To improve future hardware and software for its simulation requirements, Sandia National Laboratories is engaging in co-design efforts with major hardware vendors. This talk will discuss recent improvements influenced by the collaboration with NVIDIA. The presentation will focus in particular on the newly available experimental C++11 support in CUDA and how it facilitates both more rapid porting of applications to GPUs and better exploitation of GPU architecture characteristics. Furthermore, initial performance studies on NVIDIA's next-generation Tesla product line will be presented, as well as first impressions of an IBM POWER8-based GPU cluster.
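
To illustrate why C++11 support matters for this porting style, here is a small sketch under our own assumptions (not Sandia's actual code): templated kernels accept arbitrary device functors, so a loop body is written once and dispatched to the GPU generically, the pattern underlying performance-portability layers such as Sandia's Kokkos.

    #include <cstdio>

    // The loop body, written once as a device-callable functor.
    struct Axpy {
        double a;
        __host__ __device__ double operator()(double x, double y) const {
            return a * x + y;
        }
    };

    // A generic kernel that applies any such functor elementwise.
    template <typename Op>
    __global__ void combine(Op op, const double *x, double *y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = op(x[i], y[i]);
    }

    int main()
    {
        const int n = 1 << 20;
        double *x, *y;
        cudaMallocManaged(&x, n * sizeof(double));
        cudaMallocManaged(&y, n * sizeof(double));
        for (int i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }
        auto op = Axpy{3.0};   // C++11 auto and brace initialization (nvcc -std=c++11)
        combine<<<(n + 255) / 256, 256>>>(op, x, y, n);
        cudaDeviceSynchronize();
        printf("y[0] = %f\n", y[0]);   // 3*1 + 2 = 5.000000
        cudaFree(x); cudaFree(y);
        return 0;
    }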


 

Christian Trott (Postdoctoral Appointee, Sandia National Labs)


Christian Trott is a high-performance computing expert with experience in designing and implementing software for GPU and MIC compute clusters. He earned a Dr. rer. nat. from the University of Technology Ilmenau in theoretical physics. His prior scientific work focused on computational materials research using ab-initio calculations, molecular dynamics simulations, and Monte Carlo methods to investigate ion-conducting glass materials. Since 2012, Christian has been a postdoctoral appointee at Sandia National Laboratories, working on developing scientific codes for future manycore architectures.

16:00 – 16:20

NVIDIA's Path to Exascale

 

Future computers of all sizes, from cell phones to supercomputers, share challenges of power and programmability in realizing their potential. The end of Dennard scaling has made all computing power-limited, so that performance is determined by energy efficiency. With improvements in process technology offering little increase in efficiency, innovations in architecture and circuits are required to maintain the expected performance scaling. The large-scale parallelism and deep storage hierarchies of future machines pose programming challenges. Future programming systems must allow programmers to express their code in a high-level, target-independent manner and optimize the target-dependent decisions of mapping available parallelism in time and space. This talk will discuss these challenges in more detail and introduce some of the technologies being developed to address them.


 

Bill Dally (Chief Scientist and SVP of Research, NVIDIA)


Bill Dally is chief scientist at NVIDIA and senior vice president of NVIDIA Research, the company's world-class research organization, which is chartered with developing the strategic technologies that will help drive the company's future growth and success. Dally first joined NVIDIA in 2009 after spending 12 years at Stanford University, where he was chairman of the computer science department and the Willard R. and Inez Kerr Bell Professor of Engineering. Dally and his Stanford team developed the system architecture, network architecture, signaling, routing and synchronization technology that is found in most large parallel computers today. Dally was previously at the Massachusetts Institute of Technology from 1986 to 1997, where he and his team built the J-Machine and M-Machine, experimental parallel computer systems that pioneered the separation of mechanism from programming models and demonstrated very low overhead synchronization and communication mechanisms. From 1983 to 1986, he was at the California Institute of Technology (Caltech), where he designed the MOSSIM Simulation Engine and the Torus Routing chip, which pioneered wormhole routing and virtual-channel flow control. Dally received a bachelor's degree in electrical engineering from Virginia Tech, a master's degree in electrical engineering from Stanford University and a PhD in computer science from Caltech.

16:30 - 16:50

Embedded High Performance Computing and Neuromorphic Computing

 

The Air Force Research Laboratory Information Directorate Advanced Computing Division (AFRL/RIT) has focused research efforts on exploring how to increase computational capabilities near the Air Force's tactical edge. At our High Performance Computing Affiliated Resource Center (HPC-ARC), we have designed and built large-scale interactive computing clusters. Applications developed and running on these heterogeneous HPC clusters (CPU-GPU systems) include neuromorphic computing, video synthetic aperture radar (SAR) backprojection, and the Autonomous Sensing Framework (ASF). This presentation will show progress on performance optimization using embedded architectures and how neuromorphic applications are advancing the capabilities of autonomous systems within the Air Force.


 

Mark Barnell (Senior Computer Scientist, U.S. Air Force Research Laboratory)


Mark Barnell is a Computer Scientist with the U.S. Air Force Research Laboratory. Mr. Barnell has over 27 years of experience in computer engineering and advanced computing. He is currently the HPC Director for AFRL's Information Directorate Advanced Computing Division Affiliated Resource Center (ARC) facility. His areas of expertise include high-performance computers, persistent wide-area surveillance, and distributed and next-generation architectures.

17:00 - 17:20

A CUDA Implementation of the High Performance Conjugate Gradient Benchmark

 

HPCG was recently proposed as a complement to the High Performance Linpack (HPL) benchmark currently used to rank supercomputers in the Top500 list. HPCG solves a large sparse linear system of equations using a multigrid-preconditioned conjugate gradient algorithm. This talk will present the details of a CUDA implementation of the HPCG benchmark, including results on a wide range of GPU systems: from the smallest CUDA-capable platform, the Jetson TK1, to the largest GPU supercomputers, Titan (Cray XK7 at ORNL) and Piz Daint (Cray XC30 at CSCS).
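
For context, the computational heart of HPCG is the sparse matrix-vector product. Here is a hedged baseline sketch of our own (not NVIDIA's tuned implementation) of the scalar CSR SpMV kernel, one thread per row:

    #include <cstdio>

    // y = A*x for a matrix in CSR format: one thread per row. Tuned
    // implementations typically assign a warp per row and use cached loads.
    __global__ void spmv_csr(int nrows, const int *rowPtr, const int *colInd,
                             const double *val, const double *x, double *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < nrows) {
            double sum = 0.0;
            for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
                sum += val[j] * x[colInd[j]];
            y[row] = sum;
        }
    }

    int main()
    {
        // A tiny 3x3 diagonal test matrix (2*I) in CSR form, in managed memory.
        const int n = 3;
        int *rowPtr, *colInd; double *val, *x, *y;
        cudaMallocManaged(&rowPtr, (n + 1) * sizeof(int));
        cudaMallocManaged(&colInd, n * sizeof(int));
        cudaMallocManaged(&val, n * sizeof(double));
        cudaMallocManaged(&x, n * sizeof(double));
        cudaMallocManaged(&y, n * sizeof(double));
        for (int i = 0; i <= n; ++i) rowPtr[i] = i;   // one nonzero per row
        for (int i = 0; i < n; ++i) { colInd[i] = i; val[i] = 2.0; x[i] = 1.0; }
        spmv_csr<<<1, 32>>>(n, rowPtr, colInd, val, x, y);
        cudaDeviceSynchronize();
        printf("y = %f %f %f\n", y[0], y[1], y[2]);   // 2 2 2
        return 0;
    }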


 

Everett Phillips (Applied Engineer, NVIDIA)


Everett Phillips works in NVIDIA's Tesla performance group on high-performance computing applications. He holds an M.S. in Mechanical and Aeronautical Engineering from UC Davis.

17:30 - 17:50

Piz Daint and Piz Dora: A Productive, Heterogeneous Supercomputing Infrastructure for Science

 

The Cray XC30 system at CSCS, which includes "Piz Daint", the most energy-efficient petascale supercomputer in operation today, has been extended with additional multi-core CPU cabinets (aka "Piz Dora"). In this heterogeneous system we unify a variety of high-end computing services (extreme-scale compute, data analytics, pre- and post-processing, and visualization) that are all important parts of the scientific workflow. Besides reviewing the successes of "Piz Daint", I will discuss how the integration of multiple services into one platform promises to enhance the productivity of our users.


 

Thomas C. Schulthess (Director, CSCS)


Thomas Schulthess received his Ph.D. in physics from ETH Zurich in 1994. He is a professor of computational physics at ETH Zurich and Director of the Swiss National Supercomputing Centre in Lugano, Switzerland. Thomas holds a visiting distinguished professor appointment at ORNL, where he was a group leader and researcher in computational materials science for over a decade before moving to ETH Zurich in 2008. His current research interests are in the development of efficient and scalable algorithms for the study of strongly correlated quantum systems, as well as electronic structure methods in general. He is also engaged in the development of efficient tools and simulation systems for other domain areas, such as meteorology/climate and geophysics.

 

WEDNESDAY, NOVEMBER 19 | BOOTH #1727

10:30 - 10:50

Accelerating Computational Science and Engineering with Heterogeneous Computing in Louisiana

 

Modeling and simulation with high-performance computing (HPC) has accelerated the process of innovation and scientific discovery. In this presentation, I will give an overview of the cyberinfrastructure at Louisiana State University (LSU) and the Louisiana Optical Network Initiative (LONI). I will discuss what Louisiana has been doing in supercomputing, why heterogeneous computing is important, and how we have prepared applications to move from conventional CPU architectures to a hybrid, accelerated architecture environment. I will also share some early results from scientific codes running on GPU-accelerated HPC clusters.


 

Dr. Honggao Liu (Deputy Director, Center for Computation & Technology (CCT), Louisiana State University (LSU))


Dr. Honggao Liu is the Deputy Director of the Center for Computation & Technology (CCT) at Louisiana State University (LSU). Liu has been working at LSU for over 17 years. He earned his Ph.D. in Chemical Engineering from LSU in 2002, and also holds a B.S. in Chemical Engineering from Xi'an Jiaotong University and two M.S. degrees in Chemical Engineering from Tianjin University and LSU. He has spent the past twelve years working in high-performance computing (HPC). Dr. Liu is the Principal Investigator (PI) of the CUDA Research Center selected by NVIDIA in 2012. Liu has overseen all HPC activities and led the HPC development efforts at LSU and the Louisiana Optical Network Initiative (LONI), and has been instrumental in establishing HPC at LSU as a nationally recognized facility providing HPC services and production cycles to researchers worldwide.

11:00 - 11:20

MAGMA: Matrix Numerical Library for GPU and Multicore Architectures

 

The MAGMA project is based on the idea that, to address the complex challenges of emerging hybrid environments, optimal software solutions will themselves have to hybridize, combining the strengths of different algorithms within a single framework. Building on this idea, we aim to design linear algebra algorithms and frameworks for hybrid manycore and GPU systems that enable applications to fully exploit the power that each of the hybrid components offers.
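
As a usage sketch (hedged: we assume MAGMA's LAPACK-style magma_dgesv driver; consult the MAGMA headers for the exact signature in your version), solving A X = B mirrors the corresponding LAPACK call, with the hybrid CPU/GPU scheduling handled inside the library:

    #include <magma.h>

    // Solve A*X = B with MAGMA's hybrid LU solver. A is n x n and B is
    // n x nrhs, both column-major in host memory; B holds X on return.
    magma_int_t solve(magma_int_t n, magma_int_t nrhs,
                      double *A, double *B, magma_int_t *ipiv)
    {
        magma_int_t info;
        magma_init();                                  // initialize GPU state once
        magma_dgesv(n, nrhs, A, n, ipiv, B, n, &info); // LAPACK-style interface
        magma_finalize();
        return info;                                   // 0 on success, as in LAPACK
    }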


 

Jack Dongarra (Professor, Innovative Computing Laboratory, EECS Department, University of Tennessee)


Jack Dongarra received a Bachelor of Science in Mathematics from Chicago State University in 1972 and a Master of Science in Computer Science from the Illinois Institute of Technology in 1973. He received his Ph.D. in Applied Mathematics from the University of New Mexico in 1980. He worked at the Argonne National Laboratory until 1989, becoming a senior scientist. He now holds an appointment as University Distinguished Professor of Computer Science in the Computer Science Department at the University of Tennessee and holds the title of Distinguished Research Staff in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL), Turing Fellow at Manchester University, and Adjunct Professor in the Computer Science Department at Rice University. He is the director of the Innovative Computing Laboratory at the University of Tennessee. He is also the director of the Center for Information Technology Research at the University of Tennessee, which coordinates and facilitates IT research efforts at the University.

11:30 - 11:50

Supporting Research Initiatives with NVIDIA GPUs

 

Many research groups throughout Microsoft Research make use of GPU technologies in various ways. This talk gives a brief overview of these research areas, along with concrete examples showing how NVIDIA GPUs have made a difference.


 

Mark Staveley (Senior Research Program Manager, Grand Central Data, Microsoft)


Mark Staveley is a Senior Research Program Manager with Microsoft Research. In his current role, he oversees MSR's worldwide data management, curation, and processing program. Before coming to MSR, Mark worked as a software engineer on Xbox One and on the Windows HPC team. Mark holds a PhD from Memorial University, an MSc from the University of Waikato, and a BSc from Queen's University.

12:00 - 12:20

Heterogeneous HPC, Architectural Optimization, and NVLink

 

This talk will explore heterogeneous node design and architecture and how NVLink, a new scalable node integration bus, enables uncompromising performance on the most demanding applications.


 

Steve Oberlin (CTO, Accelerated Computing, NVIDIA)


Steve Oberlin is the CTO for Accelerated Computing at NVIDIA. His 30+ years in HPC include leadership, architecture, and design roles on the Cray-2 and Cray-3 vector supercomputers, and the role of chief architect of the massively parallel T3D and T3E. He joined NVIDIA in 2013.

12:30 – 12:50

Reactive Runtime Systems for Heterogeneous Exascale Computation

 

The challenges to achieving effective exascale computing are daunting, even as the outlines of the first such machines become visible on the horizon. Multiple paths to programming and managing computers incorporating billion-way parallelism are being pursued by a host of projects in industry, academia, and national laboratories, as well as by a strong international community. Many such approaches investigate the structure, control, and interoperability of dynamic runtime systems to manage resources and perform task scheduling. Heterogeneity, including the power of GPU accelerators, facilitates performance and energy efficiency to a significant degree, but also adds a layer of programming complexity for scalable systems. This talk presents the recent directions in runtime systems that are emerging to benefit from scalable heterogeneous systems while mitigating the burden on application programmers.


 

Thomas Sterling (Professor, Indiana University School of Informatics and Computing; Chief Scientist and Executive Associate Director, Center for Research in Extreme Scale Technologies (CREST))


Dr. Thomas Sterling holds the position of Professor of Informatics and Computing at the Indiana University (IU) School of Informatics and Computing and serves as Chief Scientist and Executive Associate Director of the Center for Research in Extreme Scale Technologies (CREST). Since receiving his Ph.D. from MIT in 1984 as a Hertz Fellow, Dr. Sterling has engaged in applied research in fields associated with parallel computing system structures, semantics, and operation in industry, government labs, and academia. Dr. Sterling is best known as the "father of Beowulf" for his pioneering research in commodity/Linux cluster computing. He was awarded the Gordon Bell Prize in 1997 with his collaborators for this work. He was the PI of the HTMT Project sponsored by NSF, DARPA, NSA, and NASA to explore advanced technologies and their implications for high-end system architectures. Other research projects included the DARPA DIVA PIM architecture project with USC-ISI, the Cray Cascade Petaflops architecture project sponsored by the DARPA HPCS Program, and the Gilgamesh high-density computing project at NASA JPL. Thomas Sterling is currently engaged in research on the innovative ParalleX execution model for extreme-scale computing, to establish the foundational principles guiding the co-design and development of future-generation exascale computing systems by the end of this decade. ParalleX is currently the conceptual centerpiece of the XPRESS project, part of the DOE X-stack program, and has been demonstrated in proof-of-concept in the HPX runtime system software. Dr. Sterling is the co-author of six books and holds six patents. He was the recipient of the 2013 Vanguard Award.

13:00 – 13:20

Deep Learning on GPU Clusters

 

Deep neural networks have recently emerged as an important tool for difficult AI problems and have found success in many fields, ranging from computer vision to speech recognition. Training deep neural networks is computationally intensive, so practical application of these networks requires careful attention to parallelism. GPUs have been instrumental in the success of deep neural networks because they significantly reduce the cost of network training, which has allowed many researchers to train better networks. In this talk, I will discuss how we were able to duplicate results from a 1,000-node cluster using only 3 nodes, each with 4 GPUs.


 

Bryan Catanzaro (Senior Researcher, Baidu)


Bryan Catanzaro is a Senior Researcher at the newly formed Silicon Valley AI Lab of Baidu Research, where he uses clusters of GPUs to train large deep neural networks. Prior to joining Baidu, Bryan was at NVIDIA Research, where he researched tools and methodologies to make it easier to use parallel computing for machine learning. He received his PhD from UC Berkeley.

13:30 - 13:50

Professional Graphics Meets Supercomputing

 

The line between computation and visualization continues to blur, and the scientific community has increasing common ground with graphical applications such as product design, computer games, and film production. We will highlight technologies from NVIDIA and partners that show how supercomputing techniques are used in automotive design, and how those same technologies are used to visualize some of the world's best science. We will also show techniques formerly relegated to scientific research that are now used in mainstream gaming applications.


 

Steven Parker (VP & CTO of Professional Graphics, NVIDIA)


Dr. Steven Parker is VP & CTO of the Professional Graphics business at NVIDIA where he oversees many of NVIDIA's professional graphics solutions including ray tracing, scientific visualization and graphics hardware products. Dr. Parker joined NVIDIA in 2008 and has a rich background in high-performance computing and computer graphics.

14:00 - 14:20

Porting GE's Turbomachinery CFD Solver to the Cray XK7

 

GE's in-house CFD solver, "TACOMA", has been ported to take advantage of the NVIDIA GPUs on the Cray XK7 architecture to enable the application to run on Titan at the Oak Ridge National Laboratory. The port is "tri-hybrid" utilizing MPI for inter-node communication, OpenMP for intra-node parallelism, and OpenACC to execute computational kernels on the GPU. To accomplish the port, the team had to address strategies to resolve race conditions, establish procedures to extend the existing OpenMP programming paradigm to also cover OpenACC, establish mechanisms to manage and minimize data motion, and of course optimize overall performance. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
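
As a hedged sketch of the directive side of such a tri-hybrid port (our illustration, not TACOMA's actual code), a single loop can carry both OpenMP and OpenACC annotations, with the backend selected at compile time while MPI handles inter-node exchange:

    // Build with -acc for the GPU path or -mp for the CPU path (PGI
    // flags); MPI ranks call this routine on their own subdomains.
    // Assumes an enclosing acc data region has placed u and rhs on the device.
    void update(int n, double dt, double *restrict u, const double *restrict rhs)
    {
    #ifdef _OPENACC
        #pragma acc parallel loop present(u[0:n], rhs[0:n])
    #else
        #pragma omp parallel for
    #endif
        for (int i = 0; i < n; ++i)
            u[i] += dt * rhs[i];
    }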


 

Brian Mitchell (Senior Principal Engineer, GE Global Research)


Brian Mitchell is responsible for the development of CFD software used in the design of turbomachinery airfoils. His technical interests include aero-mechanics, turbomachinery aerodynamics, parallel scalability, and code performance. Brian holds a PhD in Mechanical Engineering from Stanford University and has been with GE for nearly 20 years.

14:30 - 14:50

Machine Learning: What Computational Researchers Need to Know

 

NVIDIA GPUs are powering a revolution in machine learning. With the rise of deep learning algorithms, in particular deep convolutional neural networks, computers are learning to see, hear, and understand the world around us in ways never before possible. Image recognition and detection systems are getting close to human-level performance. I will explain what deep learning is and how it enables scientists to automatically process image data. This talk will show how computational scientists and practitioners from various domains can apply deep learning to solve their own research problems. NVIDIA's recently released cuDNN library, which supports deep learning efforts on GPUs, will be introduced to demonstrate how these algorithms map efficiently to CUDA.


 

Jonathan Cohen (Director, CUDA Libraries and Software Solutions, NVIDIA)


Jonathan Cohen leads the CUDA Libraries and Algorithms group as part of the CUDA product team. In this role, he oversees development of the CUDA platform libraries as well as GPU-acceleration software technology targeted at specific application domains such as bioinformatics and machine learning. Before moving to the product side of NVIDIA, Mr. Cohen spent three years as a senior research scientist with NVIDIA Research, developing scientific computing and real-time physical simulation algorithms for NVIDIA's massively parallel GPUs. Prior to joining NVIDIA, Mr. Cohen worked in the Hollywood feature film visual effects industry, where he developed custom software for over a dozen films. He led development of the sand simulation and rendering system for "Spider-Man 3" as a Computer Graphics Supervisor at Sony Pictures Imageworks. Mr. Cohen received a Technical Achievement Award in 2007 from the Academy of Motion Picture Arts and Sciences for his work on fluid simulation and volumetric modeling for visual effects. He received an undergraduate degree in Mathematics and Computer Science from Brown University.

15:00 - 15:20

VTK-M: Uniting GPU Acceleration Successes in Large-Scale HPC Visualization

 

Designed by the developers of the EAVL, DAX, and PISTON visualization libraries, VTK-M seeks to be the reference accelerator-based visualization library for the next generation of HPC systems. Learn the success stories of each project and how they have influenced the design of VTK-M.


 

Robert Maynard (Developer, Kitware)


Robert Maynard joined Kitware in March 2010 as a Research and Development Engineer. He is one of the primary developers of DAX and VTK-M and also an active contributor to CMake, ParaView, and VTK.

15:30 - 15:50

Large-scale Granular and Fluid (DEM/SPH) Simulations using Particles

 

Billions of particles are required to describe real granular phenomena and fluid dynamics in particle simulations. We implement large-scale simulations based on DEM (Discrete Element Method) and SPH (Smoothed Particle Hydrodynamics), which compute only the interactions among nearby particles. Multiple GPUs on TSUBAME 2.5 at Tokyo Tech are used to boost the simulations, with each GPU assigned to one domain of a spatial decomposition. Since the particle distribution changes in time and space, dynamic load balancing is required. We show several novel techniques for realizing these simulations and a showcase of many applications.


 

Takayuki Aoki (Professor and Deputy Director, Global Scientific Information and Computing Center)


Takayuki Aoki received a BSc in Applied Physics, an MSc in Energy Science, and a Dr.Sci. (1989) from the Tokyo Institute of Technology. He has been a professor at the Tokyo Institute of Technology since 2001 and deputy director of the Global Scientific Information and Computing Center since 2009. He received the Minister's Award of the Ministry of Education, Culture, Sports, Science & Technology in Japan, along with many other awards and honors in GPU computing, scientific visualization, and related areas. He led the team recognized by the Gordon Bell Prize in 2011 and was named a CUDA Fellow by NVIDIA in 2012.

16:00 - 16:20

24.77 PFLOPS Simulation of the Milky Way Galaxy Using a Gravitational Tree-Code on Titan's 18,600 GPUs

 

For our Gordon Bell Prize submission we simulated the Milky Way Galaxy using up to 242 billion particles. With this, the number of stars in our simulation is comparable to the number of stars in the Galaxy. For this work we used two supercomputers: Piz Daint at the Swiss National Supercomputing Centre and Titan at Oak Ridge National Laboratory. On the latter machine the simulation achieves a sustained performance of 24.77 PFlops on 18,600 GPUs. During the simulation our galaxy model forms a bar and spiral structure comparable to the structure observed in the Milky Way. This similarity and the large number of particles allow us to compare the results of the simulation directly with the billions of stars for which the billion-dollar Gaia satellite will deliver astrometric parameters. In this talk we give a short overview of the algorithms used to achieve the measured performance and review what this simulation will be able to tell us about the Milky Way Galaxy.


 

Simon Portegies Zwart (Professor, Computational Astrophysics, Sterrewacht Leiden in the Netherlands)


Simon Portegies Zwart is professor of computational astrophysics at the Sterrewacht Leiden in the Netherlands. His principal scientific interest is high-performance computational astrophysics. This includes parallel algorithms and numerical integration techniques, but also multi-scale and multi-physics modelling, the evolution of hierarchical stellar and planetary systems, and the ecology of dense star clusters. He is editor-in-chief of the open-access journal "Computational Astrophysics and Cosmology" and visiting professor of the Particle Simulator Team at RIKEN. In his free time he brews beer and translates Egyptian hieroglyphs.

16:30 - 16:50

Emerging Programming Approaches for GPU/CPU Heterogeneous Systems

 

As GPU capabilities continue to evolve, an emerging approach to programming applications on heterogeneous GPU/CPU systems is to treat the GPU as a peer processor to the main CPU. When one considers a GPU as a peer processor, multiple code factoring arrangements are possible, and application designers end up with a variety of options to maximize application performance while maintaining portability across both homogeneous and heterogeneous processor architectures. This talk will discuss these emerging peer-processing approaches and will present methods to implement peer processing in CUDA, OpenACC, and OpenMP 4.0.
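
For one of the three approaches named above, here is a minimal OpenMP 4.0 sketch of our own (assuming a compiler with target-offload support): the target construct treats the GPU as a peer device while the same source remains valid on homogeneous machines.

    // SAXPY offloaded with OpenMP 4.0 target directives. Without an
    // attached device, the identical code runs on the host CPU.
    void saxpy(int n, float a, const float *x, float *y)
    {
        #pragma omp target teams distribute parallel for \
                map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }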


 

James Sexton (Program Director, Computational Science Center; Senior Manager, IBM T.J. Watson Research Center)


Dr. Sexton is Program Director for the Computational Science Center at IBM's T. J. Watson Research Center, where he works on IBM's Deep Computing Team. He received his Ph.D. in Theoretical Physics from Columbia University. He has held research positions at Fermilab in Illinois, Princeton's Institute for Advanced Study, and Hitachi's Central Research Laboratory in Tokyo. Prior to joining IBM in 2005, Dr. Sexton was a professor in the School of Mathematics at Trinity College Dublin and Director of the Trinity College Center for High Performance Computing. Dr. Sexton's interests are in theoretical computational physics and in applications and systems for high-performance computing for research, industrial, and commercial use.

17:00 - 17:20

Best Practices for Designing, Deploying, and Managing GPU Clusters

 

Modern HPC systems provide a wide array of options for integrating NVIDIA Tesla GPUs. Dale Southard, Principal System Architect in the Office of the CTO for Tesla Computing, will provide an overview to help decision makers and system administrators make the best hardware and software selections and configurations.


 

Dale Southard (Senior Solution Architect, NVIDIA)


Dale is a Senior Solution Architect with NVIDIA covering extreme scale HPC.

17:30 - 17:50

Efficient Large-Scale Content-Based Multimedia Event Detection

 

We have developed a content-based Multimedia Event Detection system that has performed indexing and search on more than 10,000 hours (2.5 terabytes) of video data. Our system is not only streamlined for speed but also achieved the best performance in the highly competitive TRECVID 2014 Multimedia Event Detection task. There are three main steps in our system: semantic detector training, video indexing, and video search. In semantic detector training, we utilized large shared-memory machines to efficiently train more than 3,000 concept detectors over 2.7 million shots. In the indexing step, our system, which extracts 47 visual and audio features, has been carefully streamlined to incur the least I/O overhead. In the search step, GPUs and CPUs are both utilized for efficient model training and prediction. Prediction over 200,000 videos can be completed in around 200 seconds on a single workstation.


 

Shoou-I Yu (Ph.D. Student, Language Technologies Institute, Carnegie Mellon University)


Shoou-I Yu received a B.S. in Computer Science and Information Engineering from National Taiwan University in 2009. He is now a Ph.D. student in the Language Technologies Institute at Carnegie Mellon University. His research interests include multimedia retrieval and computer vision.

 

THURSDAY, NOVEMBER 20 | BOOTH #1727

10:30 - 10:50

Map-D: Using GPUs for Hyper-Interactive Exploration of Big Data

 

Many vendors claim to deliver interactive querying and visualization of big data. However, the meaning of "interactive" varies widely, from sub-second responses to response times in the tens of seconds. Clearly the former is needed for the tool to be transparent to the analyst, i.e., to enable exploration of data without the lag that research has shown to be detrimental to the hypothesis formulation and testing cycle. Although certain tricks can be played to improve interactivity, such as indexing and pre-computation, general-purpose interactivity requires the ability to scan and process data at a sufficient rate. In this talk, I will discuss why GPUs are currently the best tools for achieving the high throughputs needed for ad-hoc exploration of big datasets. I will also explain how Map-D leverages multiple GPUs over multiple nodes, allowing systems that can process, and, using the GPU graphics pipeline, visualize many terabytes of data a second.
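
For rough scale (our back-of-the-envelope illustration, not Map-D's figures): a single Tesla K40 offers roughly 288 GB/s of memory bandwidth, so eight such GPUs can in principle scan on the order of 2 TB of GPU-resident data per second, which is the kind of raw throughput ad-hoc exploration demands.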


 

Todd Mostak (CEO/Co-Founder, Map-D)


Todd was a researcher at MIT CSAIL, where he worked in the database group, before co-founding Map-D. Seeking adventure upon finishing his undergrad, Todd moved to the Middle East, spending two years in Syria and Egypt teaching English, studying Arabic and eventually working as a translator for an Egyptian newspaper. He then completed his MA in Middle East Studies at Harvard University, afterwards taking a position as a Research Fellow at Harvard's Kennedy School of Government, focusing on the analysis of Islamism using forum and social media datasets. The impetus to build Map-D came from how slow he found conventional GIS tools to spatially aggregate and analyze large Twitter datasets.

11:00 - 11:20

Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand

 

This talk will focus on the latest advances in the MVAPICH2 library, which simplifies the task of porting Message Passing Interface (MPI) applications to supercomputing clusters with NVIDIA GPUs and InfiniBand interconnects. MVAPICH2 supports CUDA-aware MPI communication directly from GPU device memory, with optimized performance on different GPU node configurations. Recent advances in MVAPICH2 include designs for GPUDirect RDMA, optimizations for short messages including GDRCOPY, loopback support for medium-sized messages, MPI-3 one-sided communication (RMA) from GPU buffers, and a comprehensive framework for efficient MPI datatype processing using CUDA kernels. Performance and scalability advantages of these latest features will be presented through the OSU micro-benchmark suite and example applications (such as HOOMD-blue).
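
A minimal sketch of what CUDA-aware MPI means in practice (our example; a CUDA-aware library such as MVAPICH2 handles staging or GPUDirect RDMA transparently): device pointers are passed straight to MPI calls, with no manual cudaMemcpy to a host buffer.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;
        double *d_buf;                              // device memory, never staged by hand
        cudaMalloc((void **)&d_buf, n * sizeof(double));

        if (rank == 0)
            MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }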


 

Dhabaleswar K. (DK) Panda (Professor, The Ohio State University)


Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. He has published over 300 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group (http://mvapich.cse.ohio-state.edu), is currently being used by more than 2,085 organizations worldwide (in 72 countries). This software has enabled several InfiniBand clusters to reach the latest TOP500 ranking during the last decade. More than 183,000 downloads of this software have taken place from the project's website alone. He is an IEEE Fellow and a member of ACM.

11:30 - 11:50

Towards Exascale Direct Numerical Simulations of Turbulent Combustion on Heterogeneous Machines

 

Exascale computing will enable combustion simulations in parameter regimes relevant to next-generation combustors burning alternative fuels. High-fidelity simulations are needed to provide the underlying science base required to develop vastly more accurate predictive combustion models, used ultimately to design fuel-efficient, clean-burning vehicles, planes, and power plants for electricity generation. However, making the transition to exascale poses a number of algorithmic, software, and technological challenges. This talk will discuss recent direct numerical simulations of turbulent combustion, along with lessons learned from porting a petascale DNS combustion code, S3D, to an exascale programming model, Legion, on Titan.


 

Jackie Chen (Distinguished Member, Technical Staff, Combustion Research Facility, Sandia National Labs)


Jacqueline H. Chen is a Distinguished Member of Technical Staff at the Combustion Research Facility at Sandia National Laboratories. She has contributed broadly to research in petascale direct numerical simulations (DNS) of turbulent combustion focusing on fundamental turbulence-chemistry interactions. These benchmark simulations provide fundamental insight into combustion processes and are used by the combustion modeling community to develop and validate turbulent combustion models for engineering CFD simulations. In collaboration with computer scientists and applied mathematicians she is the founding Director of the Center for Exascale Simulation of Combustion in Turbulence (ExaCT). She leads an interdisciplinary team to co-design DNS algorithms, domain-specific programming environments, scientific data management and in situ uncertainty quantification and analytics, and architectural simulation and modeling with combustion proxy applications. She received the DOE INCITE Award in 2005, 2007, 2008-2014 and the Asian American Engineer of the Year Award in 2009. She is a member of the DOE Advanced Scientific Computing Research Advisory Committee (ASCAC) and Subcommittees on Exascale Computing, and Big Data and Exascale. She is the editor of Flow, Turbulence and Combustion, the co-editor of the Proceedings of the Combustion Institute, volumes 29 and 30, and is a member of the Board of Directors of the Combustion Institute.

12:00 – 12:20

Impactful Science Using Blue Waters' Kepler-Based Computing Resources

 

Blue Waters is the most balanced and powerful sustained computing resource in the world. Its unique blend of computational, data, interconnect, and memory resources is enabling unprecedented science and engineering insights in just the first year of operation. This talk will review the decisions that led to the system's architecture and demonstrate the metrics for sustained performance on both CPU and GPU resources. It will then highlight several of the current science teams using GPUs at scale and discuss efforts to help additional teams re-engineer their applications for GPU computing. Usage patterns and observations will be presented as well.


 

William Kramer (Director, Blue Waters Project and @Scale Programs, NCSA)


William Kramer is the director and deputy project manager of the sustained-petascale Blue Waters project at Illinois' National Center for Supercomputing Applications (NCSA) and the director of the NCSA @Scale Program Office. Formerly, he was the general manager of the Department of Energy's National Energy Research Scientific Computing Center (NERSC). At NERSC and earlier, Kramer led the acquisition, testing, and introduction of more than 20 high-performance computing and storage systems. He was instrumental in managing the paradigm shift from vector computing to massively parallel systems and was one of the primary contributors to LBNL's Science-Driven Computer Architecture initiative. He was named one of HPCwire's "People to Watch" in 2005 and 2012 and chaired SC05. At NASA Ames, he put the world's first UNIX supercomputer into production.

12:30 – 12:50

The ARM Ecosystem from Sensors to Supercomputers

 

As high-performance computing marches towards exascale, energy efficiency has become an increasingly important factor, with target requirements of systems 50 times more power-efficient than today's supercomputers. This focus on energy efficiency has brought increased interest in ARM technology as a component of next-generation HPC. ARM focuses on designing and licensing fundamental IP building blocks, which is all about integration. Integration fosters an ecosystem of standard pieces, effectively acting as COTS-on-Silicon (my invented term). COTS-on-Silicon encourages multiple suppliers, giving the ecosystem the ability to deliver, through collaboration, a continuous stream of innovative solutions for Big Data and HPC problems. This talk will give an overview of the breadth of ARM technology and discuss how partners such as NVIDIA tightly couple ARM designs with their own products to provide compelling solutions. It will also cover some of ARM's ongoing research and development activities within the high-performance computing market segment.


 

Eric Van Hensbergen (Principal Design Engineer, High Performance Computing, ARM Research)


Eric Van Hensbergen has been a member of ARM for over two years. Prior to ARM, he was a research staff member at IBM Research for 12 years and a Member of the Technical Staff at Bell Labs for 7 years, working in the areas of operating systems, dense servers, and HPC. His current research focuses on exploring energy-efficient approaches to HPC through balance-driven co-design.

13:00 – 13:20

Extending the Reach of Parallel Computing with CUDA

 

CUDA, NVIDIA's parallel computing platform and programming model, is extending its reach. CUDA support for GPU computing is expanding to systems based on x86, ARM64, and POWER CPUs, providing a choice of high-performance computing platforms. Programmers can program GPUs natively in the most popular programming languages: C, C++, Fortran, Python, and Java. New CUDA software features like Unified Memory, drop-in libraries, and powerful developer tools make high-performance GPU computing with CUDA easier than ever before. And NVIDIA's future GPU architectures and NVLink interconnect will provide unprecedented efficiency for heterogeneous computing. In this talk you'll learn about the latest developments in the NVIDIA CUDA computing platform and ecosystem, and get insight into the philosophy driving the development of CUDA.


 

Mark Harris (CTO GPU Computing, NVIDIA)


Mark Harris is Chief Technologist for GPU Computing at NVIDIA, where he works as a developer advocate and helps drive NVIDIA's GPU computing software strategy. His research interests include parallel computing, general-purpose computation on GPUs, physically based simulation, and real-time rendering. Mark founded www.GPGPU.org while he was earning his PhD in computer science from the University of North Carolina at Chapel Hill. Mark brews his own beer and cures his own bacon in Brisbane, Australia, where he lives with his wife and daughter.

13:30 - 13:50

OpenACC 2.0 and Beyond

 

Come learn how the PGI 2015 compilers implement version 2.0 of the OpenACC API, and how OpenACC is evolving to help programmers manage complex dynamic data structures. Hear how OpenACC will handle the next generation of high-performance computer systems, in particular their upcoming novel memory architectures.
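
One OpenACC 2.0 feature directly relevant to dynamic data structures is the unstructured data lifetime. A hedged sketch of our own (not from the talk): enter data and exit data let device allocations follow malloc/free rather than lexical scope.

    #include <stdlib.h>

    // Device copies created and destroyed with the data structure itself,
    // an OpenACC 2.0 addition over the structured "data" region of 1.0.
    double *grid_new(int n)
    {
        double *u = (double *)malloc(n * sizeof(double));
        #pragma acc enter data create(u[0:n])   // device copy outlives this scope
        return u;
    }

    void grid_free(double *u, int n)
    {
        #pragma acc exit data delete(u[0:n])    // release the device copy
        free(u);
    }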


 

Michael Wolfe (Compiler Engineer, PGI)


Michael Wolfe has been a compiler engineer at The Portland Group since 1996, where his responsibilities and interests have included deep compiler analysis and optimizations, ranging from reducing power consumption for embedded microcores to improving the efficiency of Fortran on parallel clusters. He was an associate professor at the Oregon Graduate Institute from 1988 until 1996, and before that was a cofounder and lead compiler engineer at Kuck and Associates, Inc. He earned a PhD in Computer Science from the University of Illinois, and has published a textbook, "High Performance Compilers for Parallel Computing", a monograph, "Optimizing Supercompilers for Supercomputers", and many technical papers.

 
 
