GTC On-Demand

 
GTC On-Demand Featured Talks

GPU computing is a transformational force in high performance computing, enabling developers, engineers, programmers and researchers across a myriad of industry verticals, as well as academia, to accelerate research and mission-critical applications. Review the featured sessions from SC11 below and, when you’re ready, delve headlong into the many other keynotes, technical sessions, presentations, research posters, webinars and tutorials available at any time on GTC On-Demand.

FEATURED TALKS: GPU TECHNOLOGY CONFERENCE ASIA 2011
Astronomy & Astrophysics
Presentation
Media
Rainer Spurzem
- Chinese Academy of Sciences, National Astronomical Observatories
Powerful new supercomputers have been built using graphics processing units (GPUs) for general purpose computing. China has obtained top ranks in the list of the fastest supercomputers in the world with such systems. Research at the Chinese Academy of Sciences and the National Astronomical Observatories in Beijing using such GPU clusters will be reviewed, and present and future applications in computer simulation and data processing will be discussed. We present particle- and mesh-based algorithms for astrophysics using hundreds to thousands of GPUs for a single application run in a parallel message passing environment, some with detailed timing models. Future perspectives for GPU- and FPGA-accelerated computing will be discussed, along with international collaboration in the ICCS (International Center for Computational Science). GPU and other 'green' supercomputing hardware is a stepping stone on the path to Exascale supercomputing. An application to astrophysical computer simulations of dense star clusters in galactic nuclei with supermassive black holes is presented. We use large high-accuracy direct N-body simulations with a Hermite scheme and block time steps, parallelised across a large number of nodes on the large scale and across many GPU thread processors on each node on the small scale. We reach a sustained performance of more than 350 Tflop/s for a science run on 1600 Fermi C2050 GPUs; a performance model is presented, along with studies for the largest GPU clusters in China, with up to Petaflop/s performance and 7000 Fermi GPU cards. Our simulation proceeds to the complete relativistic merger of the black holes, including Post-Newtonian corrections to gravitational forces, and the relevance of the results for the cosmological background of gravitational radiation is briefly touched upon. We discuss the relevance of this for pulsar timing bands and for frequency bands of new space-based gravitational wave missions in China and Europe.
Keywords:
Astronomy & Astrophysics, GTC Asia 2011 - ID 2082
Computer-aided Engineering
Yang Weng
- SAIC Motor
In this session, the speaker will share experience with GPU solver acceleration for vehicle structure simulations. These simulations are submitted using Abaqus/Standard. Combining GPU and CPU can save design time in some cases, especially for some optimization tasks.
Keywords:
Computer-aided Engineering, GTC Asia 2011 - ID 2105
Climate & Weather Modeling
Takayuki Aoki
- Tokyo Institute of Technology
Numerical weather prediction is one of the major applications in high performance computing and demands fast and high-precision simulation over fine-grained grids. In order to drastically shorten the runtime of a weather prediction code, we have rewritten its entire, very large code base from scratch in CUDA for GPU computing. The code, ASUCA, is a high-resolution meso-scale atmosphere model that is being developed by the Japan Meteorological Agency (JMA) for its next-generation weather forecasting service. A benchmark on 3996 GPUs of TSUBAME 2.0 achieves 145 Tflops in single precision for a 14368 × 14284 × 48 mesh. With the initial data and the boundary conditions currently used in the JMA weather forecast, we have carried out a run with a 500 m horizontal mesh of 4792 × 4696 × 48, covering the whole of Japan with 437 GPUs.
Keywords:
Climate & Weather Modeling, GTC Asia 2011 - ID 2120
 
Subodh Kumar
- IIT Delhi, New Delhi, India
We present our ongoing work to design and develop a GPU-based unified modeling system for seamless weather and climate predictions of monsoons. The system design is capable of handling the different time and spatial scales of atmospheric phenomena that are crucial for accurate forecasting of weather and regional climates, and of monsoons in particular. Our focus is on a high-resolution model utilizing accurate approximations on the icosahedral-hexagonal grid. We also develop parameterizations of fine and multi-scale moist convective processes, cloud microphysics and precipitation, radiative transfer, hydrology and land surface processes, and atmospheric and oceanic turbulence. Starting with the core of the LMDZ model, we are developing from scratch a parallel version appropriate for efficient computation on GPUs and CPUs. Another goal of our system design is to relieve the programmer of low-level programming details by using a programming model that automatically distributes computation among all available CPUs and GPUs appropriately. We are developing a programming API to unify parallel code development on CPUs and GPUs.
Keywords:
Climate & Weather Modeling, GTC Asia 2011 - ID 2121
 
Thomas Schulthess
- Swiss National Supercomputing Center, ETH Zurich
Numerical weather prediction is among the oldest fields of computational science, and existed before the advent of electronic computing. Thanks to the performance of modern computers, the fidelity of weather simulations has reached a point where they are indispensable in weather forecasting, and thus have become one of the economically most impactful domains of computational science. Typically, the dynamical cores of weather simulation models are grid based and memory bandwidth bound, and thus perform poorly on modern x86-type processors. In this presentation, we will discuss a refactoring project of the COSMO code, which implements a regional climate model used by several weather services and academic institutions worldwide. The dynamical core has been rewritten and is easily portable to multiple architectures, including GPU. The physics part of the code is being ported to GPU with OpenACC directives. Preliminary performance results for production-scale problems will be presented. Other contributors to this research include Oliver Fuhrer, Swiss Federal Office of Meteorology and Climatology MeteoSwiss; Tobias Gysi and David Müller, Supercomputing Systems AG; Xavier Lapillonne, Center for Climate Systems Modeling, ETH Zurich; and William Sawyer, Ugo Varetto, and Mauro Bianco, Swiss National Supercomputing Center.
Keywords:
Climate & Weather Modeling, GTC Asia 2011 - ID 2123
 
Bin Zhou
- NVIDIA
In this session, we will discuss the GRAPES weather model and the basic techniques for porting it to the GPU platform. The session will cover the whole process, from the very first steps to low-level optimization. The MPI+CUDA pattern will be discussed. Four different modules, GCR, Radiation, WSM6 and PBL, will be demonstrated. Performance considerations will be discussed and results shown. It will be a good example of the porting procedure for a real-life scientific application.
Keywords:
Climate & Weather Modeling, GTC Asia 2011 - ID 2124
 
Xueshang Feng
- State Key Lab of Space Weather, Chinese Academy of Sciences
Space weather refers to conditions on the sun and in the solar wind, magnetosphere, ionosphere, and thermosphere that can influence the performance and reliability of space-borne and ground-based technological systems and that affect human life or health. Space weather has two focal points: scientific research and applications. To make real-time or faster-than-real-time numerical predictions of adverse space weather events and their influence on the geospace environment, high performance computational models are required. The main objective of this talk is to show how programmable GPUs can be used in numerical space weather modeling and its visualization. As an example study, GPU programming is applied to our Solar-Interplanetary-CESE MHD model (SIP-CESE MHD model) and to the visualization of its numerical results in a numerical study of the solar corona. Our initial tests with available hardware show speedups of roughly 10x compared to a traditional software implementation. This work presents a novel application of GPUs to space weather studies.
Keywords:
Climate & Weather Modeling, GTC Asia 2011 - ID 2125
Computational Fluid Dynamics
Nikolai Sakharnykh
- NVIDIA
Learn about a multi-GPU implementation of the Alternating Direction Implicit (ADI) method for large 3D domains. A case study applying the ADI method to direct numerical fluid simulation will be explored. To simulate complex flows, direct methods require extremely large grids that simply can’t fit into a single device's memory. Efficient load balancing between multiple GPUs and multiple nodes is therefore essential for direct fluid simulation codes. Complex boundaries of the input geometry introduce additional challenges for distributed memory systems. In this session, a novel distributed tridiagonal solver for systems of variable size will be covered in detail. Finally, a comprehensive performance analysis for different input geometries and possible future improvements will be discussed.
Keywords:
Computational Fluid Dynamics, GTC Asia 2011 - ID 1087
 
Patrice Castonguay
- Stanford University
This work will present the development of a scalable and efficient high-order unstructured compressible fluid flow solver for GPUs. The solver utilizes energy stable flux reconstruction schemes in both tensor-product and simplex elements, allowing the achievement of arbitrary order of accuracy for flows over complex geometries. Because of the high arithmetic intensity associated with energy stable flux reconstruction schemes and their element-local nature, they are well suited for GPUs. The single-GPU solver developed in this work achieves speed-ups of up to 45x relative to a serial computation on a current generation CPU. Additionally, the multi-GPU solver scales well, and when running on 32 GPUs achieves a sustained performance of 2.8 Teraflops (double precision) for 6th-order accurate simulations with tetrahedral elements. In this talk, the techniques used to achieve this level of performance are discussed and a performance analysis is presented. To the authors' knowledge, the aforementioned flow solver is the first high-order, three-dimensional, compressible Navier-Stokes solver for mixed unstructured grids that has been implemented for a multi-GPU cluster.
Keywords:
Computational Fluid Dynamics, GTC Asia 2011 - ID 2100
 
Jack Huang
- Dassault Systèmes Simulia Corp
This talk gives a general introduction to GPU acceleration in the nonlinear finite element analysis software Abaqus. The feature delivers excellent speedups for the analysis of large-scale finite element models. The talk will also take a look at GPU acceleration in Abaqus as it pertains to development.
Keywords:
Computational Fluid Dynamics, GTC Asia 2011 - ID 2101
 
Takayuki Aoki
- Global Scientific Information and Computing Center, Tokyo Institute of Technology
Several large-scale stencil applications have been successfully developed on the GPU-rich supercomputer TSUBAME 2.0, which is equipped with 4224 NVIDIA Tesla M2050 GPUs and has been in operation at Tokyo Tech since November 2010. Stencil computing on a regular structured grid is well suited to GPU computing, since high memory access performance can be achieved with the on-board memory. For explicit time integration, we introduce into large-scale applications a technique that overlaps GPU-to-GPU communication with computation in order to hide the communication overhead. A phase-field simulation is run for the dendritic solidification of an Al-Si binary alloy. In our largest configuration of 4096 × 6500 × 10400, TSUBAME 2.0 achieved 2.0 Petaflops in single precision, using 4,000 GPUs along with 16,000 CPU cores. The sustained performance reached 45% of the peak performance. We also demonstrate gas-liquid two-phase flows and results of the Lattice Boltzmann Method.
Keywords:
Computational Fluid Dynamics, GTC Asia 2011 - ID 2102
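The idea of overlapping communication with computation in an explicit stencil update, as described in the abstract above, can be sketched in a few lines. This is only an illustration under stated assumptions, not the speaker's code: it uses a 1D diffusion stencil in plain Python, and the names `step_full`, `step_overlapped` and the `fetch_halos` callback (a stand-in for an asynchronous GPU-to-GPU halo exchange) are hypothetical.

```python
def step_full(u, left, right, k=0.1):
    """Reference update: one explicit diffusion step with both halo values known."""
    padded = [left] + list(u) + [right]
    return [
        padded[i] + k * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
        for i in range(1, len(u) + 1)
    ]


def step_overlapped(u, fetch_halos, k=0.1):
    """Same update, ordered so interior work can hide the halo exchange.

    Assumes len(u) >= 2. `fetch_halos` is a hypothetical callback that
    returns (left, right) once the neighbour data has arrived.
    """
    n = len(u)
    new = [0.0] * n
    # 1. Interior points need only local data, so they can be computed
    #    while the halo exchange is still in flight.
    for i in range(1, n - 1):
        new[i] = u[i] + k * (u[i - 1] - 2 * u[i] + u[i + 1])
    # 2. Wait for the neighbours' boundary values.
    left, right = fetch_halos()
    # 3. Only the two edge points depend on the received halos.
    new[0] = u[0] + k * (left - 2 * u[0] + u[1])
    new[-1] = u[-1] + k * (u[-2] - 2 * u[-1] + right)
    return new


u = [1.0, 2.0, 4.0, 8.0]
reference = step_full(u, 0.0, 16.0)
overlapped = step_overlapped(u, lambda: (0.0, 16.0))
print(reference == overlapped)
```

Both orderings produce the same values; the benefit on real hardware would come from running step 1 concurrently with the transfer that `fetch_halos` waits on.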
 
Dan Negrut
- Simulation-Based Engineering Lab, Wisconsin Applied Computing Center, University of Wisconsin – Madison
This talk will explore the use of heterogeneous CPU/GPU computing, as enabled by an in-house developed Heterogeneous Computing Template (HCT), for physics-based simulations of mechanical systems. HCT draws on five components: advanced physics-based modeling techniques (formulating the relevant physics equations); algorithmic support (solving these equations); proximity computation (mostly collision detection); domain decomposition/data exchange (for multi-node distributed CPU/GPU computing); and post-processing/visualization. These five components make up a computational framework capable of analyzing many different types of mechanical systems with millions of interacting elements. Example applications will include granular terrain simulation, tracked and wheeled vehicle mobility studies (tanks, Mars Rover, etc.), fluid-solid interaction analysis, and nonlinear finite element analysis.
Keywords:
Computational Fluid Dynamics, GTC Asia 2011 - ID 2103
 
Wei Ge
- Institute of Process Engineering, Chinese Academy of Sciences
Currently, the mainstream simulation method for gas-solid flow treats both the gas and solid phases as continua, which is computationally economical. However, due to the intrinsically discrete nature of the solid phase, its constitutive laws as a continuum are not easily obtained. On the other hand, direct discrete representation of the solid phase, though more reasonable and simple, is far beyond the capability of current computing technology. In recent years, however, coarse-grained (CG) discrete modeling has begun to show the feasibility of industrial-scale discrete solid phase simulation. The evolution of the computational particles features additive and localized operations, which are best carried out by many-core processors, such as GPUs, in the highly parallel single-instruction multi-data (SIMD) mode. The gas flow can be solved either by traditional finite difference (FD) or finite volume (FV) methods, or by LBM methods, at scales either above or below the particle scale, which are suitable for CPUs or GPUs, respectively. In this session, we will present quasi-real-time simulation of experimental gas-solid systems using the Mole-8.5 system at CAS-IPE and discuss the prospects for real-time simulation in the near future.
Keywords:
Computational Fluid Dynamics, GTC Asia 2011 - ID 2104
 
James Lin
- Shanghai Jiaotong University
Sheep-NS3D is an in-house CFD code developed at SJTU that solves the 3D Reynolds-Averaged Navier-Stokes (RANS) equations on structured grids by the finite volume method and can be used in wing design. In this talk, we will present the design and further optimization of the CUDA version of Sheep-NS3D, which achieves a 20-fold speedup for the standard M6 wing model and a 37-fold speedup for a candidate wing model from COMAC on a single Fermi C2050.
Keywords:
Computational Fluid Dynamics, GTC Asia 2011 - ID 2106
 
Shinya Kitaoka
- Prometech Software, Inc.
In this session, we introduce full GPU implementation techniques in our application software, Particleworks, a particle-based fluid simulation tool for CAE. Performance results will be provided, compared with a basic CPU implementation, and illustrated by several industry-scale examples from Particleworks users. Our company, Prometech Software, Inc., has been developing Particleworks for the CAE industry through collaborations with major automotive companies and basic material companies in Japan. Particleworks has a full range of capabilities, including solutions for Newtonian and non-Newtonian fluids, and solves for viscosity, turbulence, surface tension, heat transfer, and several other fluid flow quantities on GPUs. This presentation will describe the fundamental theory of particle-based simulations and the development of Particleworks on GPUs, explain the causes of its performance bottlenecks and methods for avoiding them, and provide industry examples of Particleworks applied in CAE practice.
Keywords:
Computational Fluid Dynamics, GTC Asia 2011 - ID 2107
 
David Street
- ANSYS Incorporated
Engineers are designing ever more complicated products with innovative new features in order to stay ahead of the competition. The products often need to be brought to market faster, with higher quality and with tighter tolerances. Engineers around the world routinely use computer-based engineering simulation for product design, but best-in-class companies have developed methodologies and best practices to perform engineering simulations much earlier in the design cycle, often at the conceptual stage, when it is easier to make changes. However, this requires that engineers are able to simulate with confidence the many different physical phenomena that interact to influence product behavior. To do this, it is necessary to perform multi-physics and system-level calculations. It is also important that real-world simulations can be calculated quickly so that numerous design variations can be investigated to arrive at an optimal design. It is this need for speed that is driving the increasing demand for high performance calculations utilizing technologies from leading hardware developers such as NVIDIA. In this presentation, we give an overview of simulation-driven product development and show how technologies from ANSYS and NVIDIA are allowing engineers to stay ahead of the pack.
Keywords:
Computational Fluid Dynamics, GTC Asia 2011 - ID 2108
Computer Graphics
Wisdom Wei
- Zanqi
See the hottest new technologies from startups that are transforming computing. The Emerging Companies Summit “CEO on Stage” is a lively and fast-paced program that provides invited CEOs an opportunity to present their companies, products and strategies to a panel of investors, analysts and technology leaders, who in turn will provide insightful feedback.
Keywords:
Computer Graphics, GTC Asia 2011 - ID 1053
Developer Talk
Ian Buck
- NVIDIA
Learn how the GPU evolved from its humble beginning as a “VGA Accelerator” to become a massively parallel general purpose accelerator for heterogeneous computing systems. This talk will focus on significant milestones in GPU hardware architecture and software programming models, covering several key concepts that demonstrate why advances in GPU parallel processing performance and power efficiency will continue to outpace CPUs.
Keywords:
Developer Talk, GTC Asia 2011 - ID 2060
 
Yunquan Zhang
- Institute of Software, Chinese Academy of Sciences
In this talk, we first introduce the background of the SAMSS China HPC TOP100 rank list. Then we present the overall performance trend of the China HPC TOP100 and the TOP 10 of 2011. Following this, the performance, manufacturer, and application area of the 2011 China HPC TOP100 systems are analyzed in detail. Based on publicly available historical data and peak performance data for TOP100 supercomputers in mainland China from 1993 to 2011, we predict the future performance trend of the China HPC TOP100.
Keywords:
Developer Talk, GTC Asia 2011 - ID 2064
 
Joy Lee
- NVIDIA
Learn about a new algorithm to efficiently implement the ADI (Alternating Direction Implicit) method on multi-GPU systems. ADI is a finite difference method commonly used in computational fluid dynamics for solving multi-dimensional parabolic and elliptic partial differential equations with very high stability. The presentation will first review the single-GPU version of the parallel cyclic reduction algorithm used to solve the tridiagonal system at the core of the ADI method. It will then introduce an extension of this algorithm that works across multiple GPUs with minimum inter-GPU communication.
Keywords:
Developer Talk, GTC Asia 2011 - ID 2126
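For readers unfamiliar with parallel cyclic reduction (PCR), the tridiagonal solve mentioned in the abstract above can be illustrated with a minimal single-process sketch. This is not the speaker's implementation: it is plain Python rather than CUDA, the function name `pcr_solve` and the example system are hypothetical, and the inner loop over rows stands in for what would be one GPU thread per row. It assumes a system that needs no pivoting (for example, a diagonally dominant one).

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system with parallel cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i];
    a[0] and c[-1] are ignored.
    """
    n = len(b)
    a = [0.0] + [float(v) for v in a[1:]]
    b = [float(v) for v in b]
    c = [float(v) for v in c[:-1]] + [0.0]
    d = [float(v) for v in d]
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        # Every row is updated independently from the previous arrays,
        # which is what makes the step parallel.
        for i in range(n):
            lo, hi = i - stride, i + stride
            if lo >= 0:
                # Eliminate the coupling to x[lo] using row lo.
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                # Eliminate the coupling to x[hi] using row hi.
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2  # remaining couplings are now twice as far apart
    # All off-diagonal couplings point outside the system: rows are decoupled.
    return [d[i] / b[i] for i in range(n)]


# Example system: -x[i-1] + 2*x[i] - x[i+1] = d[i], built from x = [1, 2, 3, 4, 5].
x = pcr_solve([0, -1, -1, -1, -1], [2, 2, 2, 2, 2], [-1, -1, -1, -1, 0], [0, 0, 0, 0, 6])
print(x)
```

The loop runs about log2(n) reduction steps, each of which touches every row, so PCR trades extra arithmetic for a short, fully parallel critical path. The printed solution should match `[1, 2, 3, 4, 5]` up to floating-point rounding.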
Development Tools and Libraries
Cliff Woolley
- NVIDIA
This session will give a high-level overview of the rapidly evolving set of programming languages and libraries available to GPGPU developers for compute applications. Commercial tools for compilation, debugging, and profiling will be described, as well as how they leverage underlying NVIDIA technology. The talk will also highlight the GPU support that several vendors have added to popular tools for cluster management and monitoring of hybrid nodes.
Keywords:
Development Tools and Libraries, GTC Asia 2011 - ID 1061
Energy Exploration
Geoff Clark
- Acceleware
See the hottest new technologies from startups that are transforming computing. The Emerging Companies Summit “CEO on Stage” is a lively and fast-paced program that provides invited CEOs an opportunity to present their companies, products and strategies to a panel of investors, analysts and technology leaders, who in turn will provide insightful feedback.
Keywords:
Energy Exploration, GTC Asia 2011 - ID 1051
 
Anthony Lichnewsky
- Schlumberger
Learn how the oil and gas industry is embracing GPUs in order to tackle new and complex geological settings around the world. The first part of this talk will give an overview of the business and geopolitical drivers of the industry. Then I will expand on the computational challenges of seismic modeling, imaging and inversion. Finally, I will show how current production applications take advantage of GPU technology.
Keywords:
Energy Exploration, GTC Asia 2011 - ID 2090
 
Ty McKercher
- NVIDIA
Learn how 3D elastic seismic applications can benefit from a directive-based approach for acceleration, and how these applications will scale using clustered systems that support multiple GPUs. New seismic acquisition techniques collect shear and compressional wave data that can be used in 3D elastic equations to more accurately simulate wave propagation. This session shares results from performance and power efficiency experiments, and compares different system configurations.
Keywords:
Energy Exploration, GTC Asia 2011 - ID 2091
 
Shuhe Zheng
- Paradigm
Keywords:
Energy Exploration, GTC Asia 2011 - ID 2092
 
Dick Bland
- Hewlett Packard Company
We will discuss the benefits of NVIDIA and HP technologies across the oil/gas workflow, with information about solutions that deliver more throughput, sharper subsurface images, reduced cycle times, and improved total cost of ownership. We will discuss customer successes with GPU computing, and how HP/NVIDIA can benefit your company. An HP/NVIDIA design centered on oil/gas applications can transform the way solutions are delivered to global asset teams collaborating with industry-standard infrastructure and data security requirements. Reference architectures benchmarked from large-scale GPU-based production environments will be highlighted, as well as the value of the HP/NVIDIA solution.
Keywords:
Energy Exploration, GTC Asia 2011 - ID 2093
 
Xiaowei Wang
- Institute of Process Engineering, Chinese Academy of Sciences
This report presents two pieces of research related to oil recovery: multi-scale simulation of fluid flow in fracture-cave type reservoirs and direct simulation of porous-media flow at the pore scale. The complex flow in a fracture-cave reservoir has a multi-scale character because of the large size gap between the fractures and the caves. Through micro-scale simulation of two-phase flow in different combinations of a single fracture and cavity, we can explore the mechanism of immiscible displacement of oil by water. With the multi-scale coupling method, we can simulate water flooding at scales of engineering interest. We also performed direct simulations of pore-scale flows and calculated the permeability and relative permeability for a series of rock samples; the results agree with experiment and are useful for reservoir development. All of the simulations above were carried out on GPUs with good performance.
Keywords:
Energy Exploration, GTC Asia 2011 - ID 2094
 
Hongwei Liu
- Beijing Geostar Science and Technology Co. Ltd / Institute of Geology and Geophysics, Chinese Academy of Sciences
Pre-stack time migration and depth migration are the most time-consuming parts of seismic data processing. With the continuous deepening of exploration and production, existing computer resources have been unable to meet the needs. Since 2008, Geostar has been developing GPU/CPU collaborative computing technologies for pre-stack seismic data migration, including asymmetric travel time pre-stack time migration, one-way wave pre-stack depth migration, and reverse-time migration. These technologies have been widely used in various oil fields in China and have achieved satisfactory results; this presentation will mainly explain the application of these technologies.
Keywords:
Energy Exploration, GTC Asia 2011 - ID 2095
 
Haiquan Wang
- LandOcean Energy Services Co., Ltd
LandOcean Energy Services is one of the largest implementers of GPU-enabled seismic data processing. This presentation will focus on the successful deployment of CUDA programming, algorithm research, and GPU hardware implementation in Prestack Time Migration (PSTM) and Reverse-time Migration (RTM).
Keywords:
Energy Exploration, GTC Asia 2011 - ID 2096
Exascale
Presentation
Media
Satoshi Matsuoka
- Tokyo Institute of Technology, Global Scientific Information and Computing Center
Tsubame2.0 came into being on Nov. 1, 2010, and has been running in full production since then with very little interruption. Among the challenges have been attaining stability in the machine, extracting maximum performance out of thousands of GPUs, and devising a scheduling model for 2,000 users of mixed variety. One of the biggest challenges has been dealing with the substantial electricity shortage after the Fukushima disaster, which resulted in a national mandate for peak power conservation that Tsubame2 met gracefully without substantially sacrificing user experience with the machine. The reward with TSUBAME2.0 has been the body of application and system research results that have been achieved, including two Gordon Bell Prize finalists for SC11. Such experiences are valuable stepping stones as we strive to achieve exascale in the coming years.
Keywords:
Exascale, GTC Asia 2011 - ID 1040
 
Wei Ge
- Institute of Process Engineering, Chinese Academy of Sciences
One of the fundamental challenges in chemical engineering is the vast scale difference between the molecular structures that define the properties or functions of chemical products and the reactors or equipment that produce these materials. It typically ranges from 10^-10 m and 10^-15 s to 10^1 m and 10^3 s, and can be even wider. The supercomputing capabilities made available by exascale systems will provide a unique opportunity to link all these scales. As an indication, consistent design of the physical and mathematical models and the computer software and hardware has led to rigorous molecular dynamics (MD) simulations at truly sustained petaflops performance and micron scales in three dimensions. For example, using 1728 GPUs of the Mole-8.5 system, a complete influenza virion, H1N1, consisting of 300 million atoms or radicals including the aqueous solution, is simulated at a speed of 0.77 ns per day. And using all CPUs and GPUs of the Tianhe-1A system, crystalline silicon was simulated with more than 100 billion atoms at petaflops speed. Based on the strategy demonstrated in these examples, general purpose software and hardware platforms for discrete simulation can be established, giving a powerful tool for trans-scale simulation in chemical engineering, especially of the fascinating nano- and micro-scale structures.
Keywords:
Exascale, GTC Asia 2011 - ID 1041
 
Jeff Vetter
- Oak Ridge National Laboratory & Georgia Institute of Technology
Recent reports have identified multiple challenges on the road to Exascale computing systems. Although these challenges include the unrelenting issues of performance, scalability, and productivity in the face of ever-increasing complexity of architectures and applications, they also include the relatively new priorities of energy-efficiency and resiliency. Not coincidentally, recently announced HPC architectures, such as Tianhe, Tsubame, Titan, Nebulae, Dash, and Keeneland, illustrate that emerging technologies, such as graphics processors and non-volatile memory, can provide innovative solutions to address these challenges. Early experiences on these systems have demonstrated the requisite performance and power benefits. Likewise, these experiences have also illustrated that software will play an increasingly critical role in future systems in order to address challenges that include sensitive orchestration of data movement across the global memory hierarchy, required use of multiple programming models, and poor portability across diverse architectures. Our NSF Keeneland project has deployed a GPU-based system for the NSF user community, and we find that these issues are impeding the adoption of these innovative architectures by the broader scientific community. In this talk, I will discuss recent research advances aimed at lowering these productivity barriers for both existing systems and future Exascale architectures.  Back
Keywords:
Exascale, GTC Asia 2011 - ID 1042
 
Thomas Schulthess
- Swiss National Supercomputing Center, ETH Zurich
Five years ago, the use of GPUs in simulation-based science was mainly experimental and limited to problems that could tolerate low precision and faults. Since the introduction of ECC memory and better double precision floating point performance two years ago, the use of GPUs has exploded. We now see applications in production simulations in such diverse areas as life sciences, materials science and chemistry, astrophysics, biomedical engineering, as well as seismic imaging and climate/weather simulations. This explosion happened despite the need for significant investments in software refactoring to run codes on hybrid systems with GPUs. Why? I will discuss several applications developed at Oak Ridge National Laboratory (ORNL) and within the Swiss platform for High-Productivity and High-Performance Computing (HP2C, see www.hp2c.ch) in the fields of materials science, meteorology, geophysics, and astrophysics, pointing out algorithmic reasons for the successful use of hybrid nodes with GPUs. Besides architectural reasons, the need to change the programming model motivates algorithmic redesign and refactoring of application codes, which improves efficiency even on traditional multi-core processors. Refactoring of algorithms and codes is one of the main thrusts of the HP2C platform.
Keywords:
Exascale, GTC Asia 2011 - ID 1043
 
Wen-mei Hwu
- University of Illinois at Urbana-Champaign
The designs of all top supercomputers in the world have become constrained by power consumption. It has been well documented that clusters using GPUs achieve much higher performance per Watt than those with traditional CPUs alone. As a result, an increasing number of the world’s top supercomputers are now using GPUs. This change requires a major shift of supercomputer application development. In the past, supercomputer application developers primarily focused on partitioning work to an increasing number of nodes, keeping the execution mostly sequential within each node. With the shift to GPUs, the applications must now support massively fine-grained parallel execution within each node. This requires new physics models, numeric algorithms, basic libraries, programming environments, and programming techniques. A major challenge is to achieve future scalability: these applications must be able to scale effectively in terms of hardware parallelism and data size in the future. Otherwise, the investment will be lost in a few years. In this talk, Prof. Hwu will discuss recent advances, their implications on science and engineering research, and future research opportunities and use case studies based on real applications to illustrate the work involved and the magnitude of its impact.  Back
Keywords:
Exascale, GTC Asia 2011 - ID 1044
 
Jeff Nichols
- Oak Ridge National Laboratory
Partnerships are essential to sustaining leadership in computing, particularly in the quest to overcome the many technological barriers to achieving exascale capabilities before the end of this decade. Co-design, uncertainty quantification, and close “collaborative” partnerships with industry are essential to delivering systems that will enable building predictive modeling and simulation capabilities that can be applied to complex systems of systems. At Oak Ridge National Laboratory (ORNL), providing the world’s most powerful open resources for scalable computing and simulation, data, and infrastructure for science is a critical mission. Incredible expertise in scalable applications, algorithms and analytics, tools and middleware, hardware systems and associated software, and computing infrastructure are brought together to deliver leading-edge science relevant to missions of the Department of Energy and other federal and state agencies. In fact, petascale science is being delivered today with five scientific applications running at more than 1 petaflops sustained performance. This talk will highlight many of the opportunities and challenges posed by exascale computing as well as present ORNL's plans and perspectives to achieving the exascale by the end of the decade.  Back
Keywords:
Exascale, GTC Asia 2011 - ID 1045
 
Simone Melchionna
- National Research Council Italy
We present a computational framework for multi-scale simulations of real-life biofluidic problems and apply it to the simulation of blood flow through the human coronary arteries with a spatial resolution comparable to the size of red blood cells and physiological levels of hematocrit. The simulation on Tsubame 2.0 exhibits excellent scalability up to 4000 GPUs and achieves close to 1 Petaflop aggregate performance, which demonstrates the capability to predict the evolution of biofluidic phenomena of clinical significance. The combination of novel mathematical models, computational algorithms, hardware technology and optimization will be discussed, together with an application employed to assess the vulnerability of the coronary network to atherosclerotic plaque build-up in order to assist clinical decisions.
Keywords:
Exascale, GTC Asia 2011 - ID 1046
 
James Fung
- NVIDIA
Learn what’s new in OpenCV, the most well-known library for computer vision! The GPU module of the library keeps growing, and its functionality allows quality computer vision algorithms to execute faster than on the CPU, sometimes even in real time. The talk provides an overview of the OpenCV GPU module’s functionality, and newly added algorithms are demonstrated along with CUDA implementation details.
Keywords:
Exascale, GTC Asia 2011 - ID 1085
 
Chaofeng Hou
- Chinese Academy of Sciences, Institute of Process Engineering
An efficient and highly scalable bond-order potential (BOP) code has been developed for the large-scale molecular dynamics (MD) simulation of crystalline silicon. Using all 7168 GPUs on Tianhe-1A, the simulation of crystalline silicon using the Tersoff potential reaches 1.87 Pflops in single precision, which is perhaps the highest performance of an MD simulation reported so far. Furthermore, by coupling 86016 CPU cores on Tianhe-1A, we achieved a sustained performance of 1.17 Pflops in single precision plus 92.1 Tflops in double precision for the simulation of surface reconstruction of crystalline silicon involving 111.2 billion atoms and a sub-millimeter length scale in one dimension.
Keywords:
Exascale, GTC Asia 2011 - ID 1088
General Interest
Presentation
Media
Jen-Hsun Huang
- NVIDIA
Do not miss this opening keynote, featuring Jen-Hsun Huang, CEO and Co-Founder of NVIDIA, and special guests. Hear about what’s next in GPU computing, and preview disruptive technologies and exciting demonstrations from across industries. Jen-Hsun Huang co-founded NVIDIA in 1993 and has served since its inception as president, chief executive officer and a member of the board of directors.
Keywords:
General Interest, GTC Asia 2011 - ID 1010
 
Xiangfei Meng
- National Supercomputer Center in Tianjin
This plenary session will focus on some of the large-scale applications running on Tianhe 1A, the world's second fastest supercomputer.
Keywords:
General Interest, GTC Asia 2011 - ID 1020
 
The CUDA Student Workshop is a platform for college students to present their CUDA programming-enabled application results. The workshop will showcase the achievements of six college students, the winners of the CUDA contest. In addition, we’re honored to have seven well-known professors participate in the workshop as judges. These professors will provide commentary on the work of the students and hold discussions on how to effectively leverage the GPU in a variety of scientific research. The invited judges include:
* Yangdong Deng, Tsinghua University
* Yifeng Chen, Peking University
* Youquan Liu, Chang'An University
* Xinhua Lin, Shanghai Jiaotong University
* Wei Ge, CAS, Institute of Process Engineering
* Ying Liu, CAS, Graduate University
* Hu Chen, South China University of Technology
Keywords:
General Interest, GTC Asia 2011 - ID 1030
 
Jeff Herbst
- NVIDIA
The Emerging Companies Summit is a unique forum for startup companies to showcase innovative applications that leverage the GPU to solve visual and compute-intensive problems. The opening address includes an overview of NVIDIA's GPU ecosystem development activities and an interaction on stage with selected companies building groundbreaking applications on top of the GPU platform. The ECS is a great opportunity to discover new players in the GPU ecosystem, find great investments, explore partnerships and customer/vendor opportunities, network/build relationships, and discuss the future of an industry that is reshaping computing.
Keywords:
General Interest, GTC Asia 2011 - ID 1050
 
Wesley Kuo
- Ubitus
See the hottest new technologies from startups that are transforming computing. The Emerging Companies Summit “CEO on Stage” is a lively and fast-paced program that provides invited CEOs an opportunity to present their companies, products and strategies to a panel of investors, analysts and technology leaders, who in turn will provide insightful feedback.
Keywords:
General Interest, GTC Asia 2011 - ID 1052
 
Gong Yu
- Qiyi
See the hottest new technologies from startups that are transforming computing. The Emerging Companies Summit “CEO on Stage” is a lively and fast-paced program that provides invited CEOs an opportunity to present their companies, products and strategies to a panel of investors, analysts and technology leaders, who in turn will provide insightful feedback.
Keywords:
General Interest, GTC Asia 2011 - ID 1054
 
Ping Fu
- Geomagic
See the hottest new technologies from startups that are transforming computing. The Emerging Companies Summit “CEO on Stage” is a lively and fast-paced program that provides invited CEOs an opportunity to present their companies, products and strategies to a panel of investors, analysts and technology leaders, who in turn will provide insightful feedback.
Keywords:
General Interest, GTC Asia 2011 - ID 1055
 
Mark Popkiewicz
- MirriAd
See the hottest new technologies from startups that are transforming computing. The Emerging Companies Summit “CEO on Stage” is a lively and fast-paced program that provides invited CEOs an opportunity to present their companies, products and strategies to a panel of investors, analysts and technology leaders, who in turn will provide insightful feedback.
Keywords:
General Interest, GTC Asia 2011 - ID 1056
 
Jeff Herbst
- NVIDIA
Keywords:
General Interest, GTC Asia 2011 - ID 1057
 
Ren Wu
- HP Labs
GPUs have been used in many different domains with great success. In this talk I will share some of the work we did at HP Labs on using GPUs as accelerators for large-scale deep analytics, a relatively unexplored area that is full of potential. Our results show that GPUs can bring tremendous performance advantages over CPU-only approaches.
Keywords:
General Interest, GTC Asia 2011 - ID 1074
 
Woon-Yung Chung
- Hewlett Packard Company
New High Performance Computing (HPC) customer requirements demand a shift in technology and market innovation. HP’s Converged Infrastructure strategy, combined with GPUs, a disruptive technology, effectively addresses HPC customers’ thirst for speed and performance at lower power, cooling and footprint costs. Examples of customer and partner uses of GPUs will be described, along with HP’s extensive line of GPU-enabled servers, purpose-built to deliver innovation at any scale.
Keywords:
General Interest, GTC Asia 2011 - ID 1075
 
YangDong Steve Deng
- Tsinghua University, Institute of Microelectronics
With the rapidly increasing complexity of integrated circuits (ICs), design verification based on logic simulation has become the bottleneck of today’s IC design flows. This session reviews our recent work on using GPUs to accelerate logic simulation at both the register-transfer level (RTL) and the gate level. Our simulation framework is based on a distributed data structure to manage simulation events. A dynamic GPU memory allocator is also introduced to efficiently manage GPU memory resources. The operation of the GPU during simulation is orchestrated by an asynchronous parallel simulation protocol for sufficient parallelism. In addition, RTL simulation is performed in a compiled-code scheme by translating the input hardware description language (e.g., Verilog) into equivalent CUDA code. Experimental results show that the GPU simulators significantly outperform their CPU counterparts. This work demonstrates the potential of modern GPUs to revolutionize the landscape of IC verification.
Keywords:
General Interest, GTC Asia 2011 - ID 1084
 
Ting-Wai Chiu
- National Taiwan University
Understanding the nature of the strong interaction in the subatomic regime is a grand challenge in science. We now know that the fundamental theory of the strong interaction is Quantum Chromodynamics (QCD). However, starting from the action of QCD, it is very computationally demanding to extract physical observables, which always requires state-of-the-art supercomputers. In this talk, I outline the salient features of QCD that are relevant to HPC, and explain how GPUs can serve as a vital device for large-scale QCD simulations. Moreover, I present a new strategy to compute QCD nonperturbatively from first principles, which can be improved systematically and will eventually lead to high-precision predictions from the first principles of QCD.
Keywords:
General Interest, GTC Asia 2011 - ID 1086
 
Wil Braithwaite
- NVIDIA
By leveraging NVIDIA's new Maximus configuration, we demonstrate real-time fluid simulation using Smoothed Particle Hydrodynamics. We will present a live demo of a plug-in inside Autodesk's Maya application, and discuss the implementation.
Keywords:
General Interest, GTC Asia 2011 - ID 2063
 
Peter Lu
- Harvard University
Many applications of GPGPU marshal hundreds of GPUs in large computer clusters, enabling large simulations where the ratio of calculation to actual data is large. In the laboratory, however, many experimental applications have large quantities of data; so large, in fact, that moving the data to a remote cluster may take longer than analyzing it once it arrives. The ability to bring supercomputing to the data, in the form of NVIDIA GPUs, allows high-throughput analysis where results can be obtained fast enough to guide subsequent experiments. This interactive feedback loop can improve the quality of the science itself. I discuss a number of applications where we have applied GPGPU techniques with CUDA to analyzing images and data in the laboratory, including swimming bacteria, diffusing colloids, optical tomography, and phase-separating liquid-gas colloidal mixtures onboard the International Space Station.
Keywords:
General Interest, GTC Asia 2011 - ID 2080
 
Francois Bodin
- CAPS Entreprise
Pushed by the pace of innovation in GPU architecture and, more generally, manycore technology, the processor landscape is moving fast. This fast evolution makes software development more complex. Furthermore, the impact of programming style on the future performance and portability of an application is difficult to forecast. The use of directives to annotate serial languages (e.g. C/C++/Fortran) looks very promising. They abstract away the low-level details of parallel programming while preserving code assets against the evolution of processor architectures. In this presentation, we describe how to use the HMPP (Heterogeneous Manycore Parallel Programming) API, one of the directive-based approaches, to program heterogeneous compute nodes. In particular, we provide insights on how the GPU and CPU can be exploited in a unified manner and how code tuning issues can be minimized. We extend the discussion to the use of libraries, which is currently one of the key elements when addressing GPUs and manycores.
Keywords:
General Interest, GTC Asia 2011 - ID 2081
 
Tau Leng
- Supermicro
Optimized GPU Server Designs Maximize Performance and Efficiency for HPC Applications. Today’s High Performance Computing solutions are under ever-increasing demands from scientific, engineering and business applications to run 24x7x365 at peak performance, processing massive amounts of data across distributed networks. As such, there are many facets of hardware design to consider, including CPU, memory, interconnect, density, power efficiency and reliability, as well as performance accelerators, to maximize performance per watt, per square foot and per dollar. With NVIDIA GPU technology, HPC scales to new levels of performance within existing infrastructure and budgetary constraints. Experienced system integrators will find flexibility and the right balance between design, efficiency and performance with Supermicro’s open standards based HPC Building Block Solutions®.
Keywords:
General Interest, GTC Asia 2011 - ID 2110
 
David Chen
- MathWorks China
With the development of science and technology, more and more R&D and engineering application departments are facing challenges with large-scale technical numeric computing. Parallel computing with MATLAB has made great progress, not only in the development of parallel algorithms but also in hardware support. The session will cover an introduction to MATLAB Parallel Computing Toolbox and its new features, as well as information on parallelizing your MATLAB algorithms on GPUs.
Keywords:
General Interest, GTC Asia 2011 - ID 2111
 
Dick Bland
- Hewlett-Packard Company
HP and NVIDIA have several collaborative partnerships that open up a new world for GPU computing. Several of these partnerships are discussed, including the strategic partnering Workstation project and the CUDA Research Center. Several customer cases will be presented for the GPU computing platform, ranging from Digital Forensics and BIM to Medical Imaging and CAE. Then I will describe the structural advantage of using HP Z800 workstations for GPU computing. From May, a new mobile workstation also enables Personal Mobile Supercomputing.
Keywords:
General Interest, GTC Asia 2011 - ID 2112
 
Kelvin Wang
- Autodesk China R&D Center (ACRD)
As a leader in 3D design, engineering and entertainment software, Autodesk is a forerunner of cloud service in the design & engineering industry. After announcing Autodesk Cloud globally in September, Autodesk launched CADren, a web portal and design platform built for—and by—the China design community to meet the unique demands of the Chinese market. It offers new opportunities for collaboration among Chinese designers, engineers, and other CAD users. In addition, the online store represents a new business model and opportunities for developers and partners.  Back
Keywords:
General Interest, GTC Asia 2011 - ID 2113
 
Zhou Kai
- Lenovo
Keywords:
General Interest, GTC Asia 2011 - ID 2114
 
Zhihong Wen
- Dell
Keywords:
General Interest, GTC Asia 2011 - ID 2115
 
Andrew Lynn
- ASUSTek Computer Inc
Not all researchers and scientists get the opportunity to build a PetaFLOPS-scale general purpose HPC system in their own lab. However, they may have access to a million dollar budget to spend on building a specific-purpose GPU supercomputer for their organization. This session aims to share experiences gathered from building a powerful HPC system capable of 70 TFLOPS operation, as well as knowledge relating to life science applications in China.
Keywords:
General Interest, GTC Asia 2011 - ID 2116
 
Rong Dai
- Sugon
Keywords:
General Interest, GTC Asia 2011 - ID 2117
 
Zhang Qing
- Inspur
The session examines the collaboration between Inspur and BGP on oil seismic data processing, covering prestack time migration and a three-dimensional random noise attenuation algorithm. The session analyzes the applications' characteristics together with GPU computing features, focusing on the migration optimization methods applied to these applications, which have led to substantial performance improvement.
Keywords:
General Interest, GTC Asia 2011 - ID 2118
Life Sciences
Presentation
Media
Xiaoguang Liu
- Nankai University
MrBayes is a popular software package for phylogenetic inference that proposes a “tree of life” for a collection of species whose DNA sequences are known. Parallelized versions of the Metropolis-coupled Markov chain Monte Carlo (MC3) algorithm in MrBayes that can run on various platforms have been presented. We give an appraisal of implementing MrBayes MC3 in parallel with GPUs. We can achieve a speedup (vs. serial MrBayes MC3) of more than 20x on a sufficiently large dataset using a single GPU, and nearly linear speedup on a GPU cluster.
Keywords:
Life Sciences, GTC Asia 2011 - ID 1070
 
Agatha Hu
- NVIDIA
The session will go over in detail a GPU-accelerated application in bioinformatics. The talk consists of two sections. In the first section, we will introduce a GPU-accelerated exhaustive SNP-SNP interaction model. This method is highly parallel and can easily be implemented on CUDA-capable hardware. With good optimizations, we can achieve large speedups. This section will also review how to mix cross-platform data for interaction studies. In the second section, we will provide a detailed overview of the Hidden Markov Model based solution, which introduces a prototype GPU-based imputation tool.
Keywords:
Life Sciences, GTC Asia 2011 - ID 1071
 
Ryota Koga
- X-Ability Co., Ltd.
Computational quantum chemistry methods such as Hartree-Fock (HF), density functional theory (DFT) and the fragment molecular orbital (FMO) method require heavy computational resources. In this study they are accelerated using graphics processing units (GPUs) and the vector instruction set (AVX) of the latest CPUs. The PRISM algorithm for evaluating the electron repulsion integrals was vectorized to use AVX as much as possible. We found that this new program makes the Fock matrix formation in HF 2 to 3 times faster than before (multi-CPU). The Coulomb and exchange-correlation potentials in DFT were evaluated on the GPU, resulting in an overall speedup of about 4 times. The programs developed were used to accelerate FMO. We found that our new algorithm and the GPU are very well suited to calculating the environmental electrostatic potential. The total computational time was reduced to about 1/3. The combination of ER (a high-performance free energy calculation method) and FMO-based MD will be introduced so that the evaluation of affinity will be available.
Keywords:
Life Sciences, GTC Asia 2011 - ID 1072
 
Ying Ren
- Chinese Academy of Sciences, Institute of Process Engineering
GPUs provide unprecedented computational power for large-scale scientific applications and provide an opportunity to accelerate traditional CPU-based MD simulations. In this talk, we will describe how to develop highly efficient molecular simulation software for macromolecules, based on the hardware of our GPU-based high-performance computer Mole-8.5. We then apply this software to probe the whole influenza virion at the atomic level and to study protein folding from both dynamic and thermodynamic points of view. The further potential of this kind of computation in bio-systems will be discussed.
Keywords:
Life Sciences, GTC Asia 2011 - ID 1073
 
BingQiang Wang
- BGI
Now that sequencing has digitized the DNA double helix, a great deal of computational research aimed at discovering the secrets of life has become practical. As massive amounts of data are generated, processing, analyzing and storing them efficiently turns out to be a major challenge. By integrating existing open-source GPU-accelerated bioinformatics tools, BGI researchers are able to run their analysis pipelines at lower cost and higher throughput. At the same time, several essential tools have been developed, including tools for alignment and variation detection. The speedup is generally around 10-50x compared with traditional counterparts.
Keywords:
Life Sciences, GTC Asia 2011 - ID 2070
 
Mian Lu
- Hong Kong University of Science and Technology
We report two high-performance GPU-accelerated genomics data analysis tools: GSNP and GAMA. GSNP detects DNA variation at the single-nucleotide level. The high performance is achieved through optimized data structures that reduce memory overhead and through effective use of GPU resources such as shared memory. GAMA computes the DNA minor allele frequency for a population. We restructure the original iterative-update algorithm into a nested-loop-based algorithm, which better matches the data-parallel GPU architecture. As a result, compared with optimized single-threaded CPU counterparts, GSNP and GAMA achieve speedups of up to around 50 and 47 times, respectively.
Keywords:
Life Sciences, GTC Asia 2011 - ID 2071
 
Weiguo Liu
- Nanyang Technological University
The enormous growth of biological sequence data is rapidly moving bioinformatics towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing rapidly as well. The recent emergence of parallel accelerator technologies such as GPUs has made it possible to significantly reduce the execution times of many bioinformatics applications. In this talk I will present the design and implementation of scalable GPU algorithms based on the CUDA programming model in order to accelerate important bioinformatics applications. In particular, I will focus on a CUDA-accelerated BLASTP algorithm.
Keywords:
Life Sciences, GTC Asia 2011 - ID 2072
 
Xiaoquan Su
- Chinese Academy of Sciences, Qingdao Institute of Bioenergy and Bioprocess Technology
Metagenomic methods directly sequence and analyze genome information from microbial communities. There are usually hundreds of genomes or more from different microbial species in the same community, and the main computational tasks in metagenomic data analysis include determining the taxonomical and functional components of these genomes in the microbial community. Metagenomic data analysis is both data- and computation-intensive and requires extensive computational power. Therefore, advanced computational methods and pipelines have to be developed to meet this need for efficient analysis. In this work, we propose Parallel-META, a GPU- and multi-core-CPU-based open-source pipeline for metagenomic data analysis, which enables the efficient and parallel analysis of multiple metagenomic datasets. In Parallel-META, the similarity-based database search is parallelized using GPU computing and multi-core CPU optimization. Experiments have shown that Parallel-META achieves a speedup of at least 15 times over traditional metagenomic data analysis methods, with the same accuracy of results.
Keywords:
Life Sciences, GTC Asia 2011 - ID 2073
 
Yutaka Akiyama
- Tokyo Institute of Technology
We have developed a fully automated computing pipeline for metagenome analysis that can deal with huge data from a next-generation sequencer in realistic time on the Tsubame 2.0 supercomputer. In our pipeline, one of two different sequence homology search tools can be selected: 1) BLASTX, the standard homology search software used in much metagenomic research; 2) GHOSTM, our original fast GPU-based homology search software built on CUDA. GHOSTM shows much higher search sensitivity than BLAT, which is sufficient for metagenome analysis. On this pipeline, we performed a metagenome analysis of 71 million Solexa reads taken from polluted soils. As a result, the pipeline shows almost linear speedup with the number of computing cores. With BLASTX as the homology search program, the pipeline processes about 24 million reads per hour with 16008 CPU cores (1334 nodes). With GHOSTM as the homology search program, the pipeline processes about 60 million reads per hour with 2520 GPUs (840 nodes). These results indicate that the pipeline can process the genome information obtained from a single run of a next-generation sequencer in a few hours, even with our sensitive homology search protocol based on 6-frame amino acid sequence comparison with the nr-aa database.
Keywords:
Life Sciences, GTC Asia 2011 - ID 2074
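The throughput figures quoted in the abstract above can be put side by side with a little arithmetic. The following is a minimal sketch that uses only the numbers stated in the abstract (71 million reads; 24 million reads per hour on 16008 CPU cores across 1334 nodes; 60 million reads per hour on 2520 GPUs across 840 nodes). The variable and function names are illustrative and are not part of the pipeline itself, and the comparison assumes the quoted rates hold for the whole run.

```python
# Figures taken from the abstract; all names here are illustrative.
TOTAL_READS = 71_000_000

CONFIGS = {
    "BLASTX (CPU)": {"reads_per_hour": 24_000_000, "units": 16008, "nodes": 1334},
    "GHOSTM (GPU)": {"reads_per_hour": 60_000_000, "units": 2520, "nodes": 840},
}


def summarize(cfg, total_reads=TOTAL_READS):
    """Derive per-unit and per-node throughput and total run time for one setup."""
    return {
        # "unit" is a CPU core for BLASTX and a GPU for GHOSTM.
        "reads_per_hour_per_unit": cfg["reads_per_hour"] / cfg["units"],
        "reads_per_hour_per_node": cfg["reads_per_hour"] / cfg["nodes"],
        "hours_for_all_reads": total_reads / cfg["reads_per_hour"],
    }


SUMMARY = {name: summarize(cfg) for name, cfg in CONFIGS.items()}

if __name__ == "__main__":
    for name, s in SUMMARY.items():
        print(
            f"{name}: {s['reads_per_hour_per_node']:.0f} reads/h per node, "
            f"{s['hours_for_all_reads']:.2f} h for all reads"
        )
```

Under these figures the 71 million reads take roughly 3 hours with BLASTX and a little over 1 hour with GHOSTM, which is consistent with the abstract's statement that a single sequencer run can be processed in a few hours.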
 
Jun Zhu
- Institute of Bioinformatics at Zhejiang University
Most important agricultural traits and human diseases are complex traits controlled by gene networks with gene-by-gene interaction (epistasis) and gene-environment interaction (GE). New statistical methods and software have been developed for analyzing the genetic architecture of complex traits based on the genome-wide association study (GWAS). When dealing with a large mapping population and huge amounts of molecular information, GPU computation has an advantage over CPU computation. We will demonstrate the newly developed GPU-based software, QTLNetwork V3.0 and GWAS-GMDR, for mapping genes with epistasis and GE interaction for complex traits in humans, crops and mice.
Keywords:
Life Sciences, GTC Asia 2011 - ID 2075
 
James Lin
- Shanghai Jiaotong University
Bayesian networks have been used by molecular biologists to detect dependency relationships in gene expression. The massive gene expression data collected by DNA microarrays call for a faster learning algorithm. In this talk, we present the design and further optimization of a CUDA-based “Sparse Candidate” algorithm for learning a Bayesian network. Tested with data sets from an industrial company, it achieved about a 34-fold speedup on a quad-M2050 GPU server node.
Keywords:
Life Sciences, GTC Asia 2011 - ID 2076
Parallel Programming Languages & Compilers
Presentation
Media
Michael Wolfe
- The Portland Group
Great performance gains can be achieved by programmers who take advantage of the unique architectural advantages of GPUs. Some parts of the work require programmer creativity, while other parts are purely mechanical and suitable for automation using software tools. In this talk, we compare high-level GPU programming using the PGI Accelerator programming model directives to low-level GPU programming using CUDA or OpenCL, exploring four aspects. We discuss the creative commonalities, such as algorithm and data structure design, that are similar regardless of the programming model. We look at programming effort, the cost of writing the program, and how much training is required to achieve good performance. We present actual delivered performance using both high-level and low-level programming. Finally, we look at how much effort is required when porting a program to the next generation of GPU.
Keywords:
Parallel Programming Languages & Compilers, GTC Asia 2011 - ID 1063
Programming Languages & Techniques
Presentation
Media
Cliff Woolley
- NVIDIA
Learn everything you need to know in order to start programming in CUDA C, starting from a background in C or C++. Beginning with a "Hello, World" CUDA C program, we will explore parallel programming with CUDA through a number of hands-on code examples, examine more deeply the various APIs available to CUDA applications and learn the best (and worst) ways to employ them in applications.
Keywords:
Programming Languages & Techniques, GTC Asia 2011 - ID 1060
 
Peng Wang
- NVIDIA
This session will discuss basic parallel patterns and primitives in data parallel programming. After a brief survey of the various primitives and their implementations, we will discuss how to solve parallel programming problems quickly and efficiently by recognizing their underlying patterns. Realistic examples will be used to illustrate the technique, including radix sort, cell-list construction in molecular dynamics, and MPI geometric computation in particle codes.
Keywords:
Programming Languages & Techniques, GTC Asia 2011 - ID 2062
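As an illustration of how a problem like the cell-list build mentioned above can be reduced to standard primitives, here is a minimal serial sketch in Python. It is not the speaker's implementation: the function names, the 2D box, and the assumption that all coordinates lie in [0, box) are hypothetical choices made for this example. The build is expressed as a map (particle to cell index), a histogram, an exclusive prefix sum, and a scatter, which together form a counting sort.

```python
from itertools import accumulate


def build_cell_list(positions, box, cell_size):
    """Bin 2D particle positions into a uniform grid.

    Assumes every coordinate lies in [0, box). Returns (cell_start, order):
    the particles of cell c are order[cell_start[c]:cell_start[c + 1]].
    """
    nx = max(1, int(box[0] // cell_size))
    ny = max(1, int(box[1] // cell_size))

    def cell_of(p):
        # Clamp so that points in a partial last cell stay inside the grid.
        cx = min(int(p[0] / cell_size), nx - 1)
        cy = min(int(p[1] / cell_size), ny - 1)
        return cy * nx + cx

    # Map: one cell index per particle.
    cells = [cell_of(p) for p in positions]

    # Histogram: number of particles in each cell.
    counts = [0] * (nx * ny)
    for c in cells:
        counts[c] += 1

    # Exclusive prefix sum: cell_start[c] is the first slot of cell c.
    cell_start = [0] + list(accumulate(counts))

    # Scatter: place each particle index into its cell's next free slot.
    order = [0] * len(positions)
    cursor = cell_start[:-1].copy()
    for i, c in enumerate(cells):
        order[cursor[c]] = i
        cursor[c] += 1
    return cell_start, order


def particles_in_cell(cell_start, order, c):
    """Return the indices of the particles that fall in cell c."""
    return order[cell_start[c]:cell_start[c + 1]]


if __name__ == "__main__":
    pts = [(0.5, 0.5), (1.5, 0.2), (0.1, 0.9), (1.9, 1.9)]
    starts, order = build_cell_list(pts, box=(2.0, 2.0), cell_size=1.0)
    print(starts, order)
```

Each phase here is a loop over independent elements or a scan, which is why the same decomposition is commonly used in data-parallel settings; a GPU version would typically replace the serial scatter with a sort by cell index or with atomic counters.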
Tools & Libraries
Presentation
Media
Sanjiv Satoor
- NVIDIA
CUDA-GDB is the NVIDIA debugger on Linux and Mac platforms. Working seamlessly with both the host code and the device code, CUDA-GDB allows you to inspect the GPU memory, the GPU registers, and the source variables of your application. By using breakpoints, conditional and unconditional, the debugger can quickly converge to the program phase of interest. This session will use CUDA-GDB to analyze and debug a sample CUDA application.
Keywords:
Tools & Libraries, GTC Asia 2011 - ID 1062
 
Yossi Levanoni
- Microsoft
Microsoft has recently announced C++ AMP (Accelerated Massive Parallelism), which comprises a C++ programming model, C++ language support, and developer tools, all of which are used for expressing data parallelism in C++. C++ AMP will be released in the next edition of Visual Studio and is currently available for experimentation through the Visual Studio 11 Developer Preview.
Keywords:
Tools & Libraries, GTC Asia 2011 - ID 1080
 
Xuan Wang
- NVIDIA
NVIDIA Parallel Nsight provides access to the power of the GPU from within the familiar environment of Microsoft Visual Studio. In this session, you will learn how to use Parallel Nsight to develop GPU computing applications. Learn how to use the powerful Parallel Nsight debugger to identify errors in CUDA C/C++ kernels using GPU breakpoints and direct memory and variable inspection. See how Parallel Nsight displays system-wide performance characteristics, allowing you to create efficient GPU algorithms.
Keywords:
Tools & Libraries, GTC Asia 2011 - ID 1081
 
Peng Wang
- NVIDIA
This session will discuss how to optimize performance through an analysis-driven process. Three fundamental limiters of kernel performance will be discussed: instruction throughput, memory throughput, and latency. In this session we will describe how to use profiling tools and source code instrumentation to assess the significance of performance limiters; what optimizations to apply for each limiter; and how to determine when hardware limits are reached. Concepts will be illustrated with examples and are equally applicable to both CUDA and OpenCL development. It is assumed that attendees are already familiar with the fundamental optimization techniques.
Keywords:
Tools & Libraries, GTC Asia 2011 - ID 1082
 
Peng Wang
- NVIDIA
This session will discuss how to perform fundamental optimizations of CUDA kernel code. We will cover the following topics: kernel launch configuration; global memory throughput; shared memory access; instruction throughput and control flow; PCI-E throughput and overlapping kernel execution with memory copies. Attendees are assumed to be familiar with basic CUDA concepts and GPU architecture.
Keywords:
Tools & Libraries, GTC Asia 2011 - ID 1083
 
Sanjiv Satoor
- NVIDIA
The NVIDIA Visual Profiler helps you optimize your CUDA application to get maximum performance. Completely updated for 4.1, the Visual Profiler provides an integrated timeline that allows you to visualize your application's behavior on both the CPU and GPU. Using the timeline and data collected from GPU performance counters, the profiler will analyze your application to identify bottlenecks and provide optimization suggestions that you can use to improve performance. This session will use the NVIDIA Visual Profiler to analyze and optimize the performance of a sample CUDA application.
Keywords:
Tools & Libraries, GTC Asia 2011 - ID 2061

 
 
 