GTC On-Demand

GTC On-Demand Featured Talks

GPU computing is a transformational force in high performance computing, enabling developers, engineers, programmers, and researchers across many industries and academia to accelerate research and mission-critical applications. See our featured sessions highlighting some of our best talks, or delve into the many other keynotes, technical sessions, presentations, research posters, webinars, and tutorials available at any time on GTC On-Demand.

Acoustics & Audio Processing
Jungsuk Kim (Carnegie Mellon Silicon Valley)
This talk presents HYDRA, a real-time LVCSR (Large Vocabulary Continuous Speech Recognition) engine that performs decoding on CPU, GPU, or hybrid CPU/GPU platforms. While prior work has demonstrated the effectiveness of manycore graphics processing units (GPUs) for high-throughput, limited-vocabulary speech recognition, those approaches are unsuitable for recognition with large acoustic and language models because of their limited memory. To overcome this limitation, we have developed a novel architecture for speech recognition decoding that jointly leverages manycore GPUs and multicore processors (CPUs) to perform speech recognition even when large acoustic and language models are applied. The proposed architecture can perform speech recognition up to 5x faster than real time with a recognition vocabulary of more than 1 million words.

Keywords:
Acoustics & Audio Processing, GTC 2013 - ID S3406
Advanced Driver Assistance Systems (ADAS)
Victor Eruhimov (Itseez, Inc.)
There is a growing need for fast and power-efficient computer vision on embedded devices. This session will focus on computer vision capabilities on embedded platforms available to ADAS developers, covering OpenCV CUDA implementation and the new computer vision standard OpenVX. In addition, Itseez traffic sign detection will be showcased. The algorithm is capable of detecting speed limit signs for both North America and EMEA regions as well as several other signs, delivering faster than real-time performance on an embedded platform with a mobile grade GPU.

Keywords:
Advanced Driver Assistance Systems (ADAS), Automotive, Computer Vision, GTC 2013 - ID S3548
 
Stefaan Sonck Thiebaut (OpenSynergy)
This talk will introduce the main challenges in the next generation of automotive infotainment applications: OEMs want to take advantage of open source solutions like Linux and Android yet have very high requirements on safety, security and boot-times. In addition, to reduce costs, more functionality needs to be integrated on a single processor. An example of this is the integration of the head-unit and the instrument cluster as two displays of a single device. As a solution to these requirements, we describe a software architecture that uses virtualization with a micro-kernel and that is already implemented and available on NVIDIA Tegra3. We will give a brief outlook on the next steps regarding the sharing of the GPU and hardware virtualization.

Keywords:
Advanced Driver Assistance Systems (ADAS), Automotive, In-Vehicle Infotainment (IVI), Instrument Clusters & Heads-Up Display (HUD), GTC 2013 - ID S3577
Algorithms & Numerical Techniques
Joppe Bos (Microsoft Research)
Cryptology is the field of research consisting of cryptography, the science of hiding information, and cryptanalysis, often referred to as the practice of code breaking. In this talk we explain how the GPU can be used to enhance the performance of cryptology by presenting techniques to speed up the underlying arithmetic on parallel architectures. On the one hand, GPUs can be used as cryptographic accelerators: by offloading cryptographic operations to these parallel devices, a much higher throughput can be achieved compared to conventional solutions on CPUs. We explain the obstacles and potential benefits when using GPUs for cryptography in environments like cloud computing. On the other hand, GPUs can be used for security assessment by enhancing the performance of state-of-the-art cryptanalytic algorithms: this gives better insight into which cryptographic key sizes should be considered (in)secure in practice.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3018
 
Ian Wainwright (High Performance Consulting)
The goal of this session is to show various optimization techniques and their actual performance benefits for matrices of roughly one warp's width in size along one dimension. Whereas a very large matrix can be mapped to one multi-block kernel, and very small matrices can each be mapped to one thread, matrices that lie in between, such as 32x32, require different mapping techniques. We will look at the performance benefits of warp-width mapping using warp shuffle, mapping multiple matrices to one warp, preferring L1 cache over shared memory, and aggressive loop unrolling.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3069
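To make the warp-shuffle idea in the abstract above concrete, here is a minimal sketch in plain Python that simulates a butterfly (XOR) reduction across the 32 lanes of one warp, as one might use to sum a row of a 32x32 matrix held one element per lane. It is an illustration only, not the speaker's code: `shfl_xor` and `warp_all_reduce_sum` are hypothetical helper names, and the list stands in for per-lane registers.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def shfl_xor(lanes, mask):
    """Simulate an XOR shuffle: lane i reads the value held by lane i ^ mask."""
    return [lanes[i ^ mask] for i in range(len(lanes))]


def warp_all_reduce_sum(lanes):
    """Butterfly reduction: after log2(n) exchange steps every lane holds the sum.

    Assumes len(lanes) is a power of two (32 for a full warp).
    """
    vals = list(lanes)
    offset = len(vals) // 2
    while offset:
        partner = shfl_xor(vals, offset)          # register-to-register exchange
        vals = [v + p for v, p in zip(vals, partner)]
        offset //= 2
    return vals


row = list(range(WARP_SIZE))                      # one matrix row, one element per lane
row_sum = warp_all_reduce_sum(row)[0]
```

The point of the pattern is that the exchange happens between registers of the same warp, so no shared memory or explicit synchronization is needed for a 32-wide row.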
 
Ed Karrels (Santa Clara University)
In 1995, Bailey, Borwein and Plouffe discovered a new formula for computing pi that ignited a computation arms race by making it possible to compute digits of pi without storing previous digits, and without the use of large-number arithmetic. In 2010 Yahoo! set a world record, using a variant of the Bailey-Borwein-Plouffe formula on an 8000-core Hadoop cluster to compute the two quadrillionth bit of pi. In this talk, I'll discuss how I stole the record from Yahoo by computing the four quadrillionth bit of pi on a single CUDA-enabled computer.

Keywords:
Algorithms & Numerical Techniques, Clusters & GPU Management, GTC 2013 - ID S3071
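As a small illustration of the digit-extraction idea described above, the following sketch evaluates the original Bailey-Borwein-Plouffe series, pi = sum over k of 16^-k * (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6)), to obtain a single hexadecimal digit of pi without computing the earlier ones. It uses double-precision floats, so it is only suitable for modest positions, and it is not the variant or the implementation used in the talk; the function names are hypothetical.

```python
def _bbp_series(j, n):
    """Fractional part of sum_k 16^(n-k) / (8k + j)."""
    total = 0.0
    # Terms with k <= n: modular exponentiation keeps the numerators small.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink geometrically; stop once they are negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digit(n):
    """Hex digit of pi at offset n after the point (n = 0 is the first digit)."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return "0123456789abcdef"[min(15, int(x * 16))]
```

For example, `pi_hex_digit(0)` returns `'2'`, matching the hexadecimal expansion of pi, 3.243f6a88... Each digit position is independent of the others, which is what makes the computation easy to distribute across many threads.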
 
Massimo Bernaschi (National Research Council of Italy)
Graphs with billions of edges do not fit within the device memory of a single GPU, so exploring large graphs requires multiple GPUs. Besides the techniques required to improve load balancing among threads, it is necessary to reduce the communication overhead among GPUs. To that end, we resort to a pruning procedure to eliminate redundant data and to a new interconnection technology, called APEnet, which is the first non-NVIDIA device to exploit the possibilities offered by the GPUDirect technology. Our results show that APEnet performs better than InfiniBand and may become a viable alternative for the connectivity of future GPU clusters.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3089
 
Gerhard Zumbusch (Friedrich-Schiller Universitat Jena)
Learn how to re-design memory bandwidth limited numerical algorithms like Finite Differences by using (1) different cache aware algorithms, (2) different vectorization strategies, (3) different memory layouts, (4) larger numbers of registers, and (5) automatic parameter tuning. An optimal choice depends on the size and shape of the difference stencils and on the GPU type. Examples include one to three dimensional grids, lower to higher order stencils and a comparison to CPU code tuning.

Keywords:
Algorithms & Numerical Techniques, Development Tools & Libraries, GTC 2013 - ID S3096
 
Zhen Shen (Institute of Automation, Chinese Academy of Sciences)
Explore new techniques for building a Multi-Agent System model of a transportation system and performing simulation and optimization on GPUs and GPU clusters. Vehicles, lanes, intersections, and road networks are "mapped" to threads, blocks, rows of blocks, and grids for the GPU to compute. In this way, a so-called micro-simulation system is obtained, as every detail can be simulated. Further, Genetic Algorithms and a large-scale system optimization method called "Ordinal Optimization" are employed to solve the traffic signal coordination problem. A much larger problem can be solved than with the CPU approach.

Keywords:
Algorithms & Numerical Techniques, Supercomputing, GTC 2013 - ID S3120
 
Angelo Liseno (Universita di Napoli Federico II), Amedeo Capozzoli (Universita di Napoli Federico II)
This session illustrates a computationally demanding problem in a hot topic of applied electromagnetics: the fast analysis and synthesis of electrically large, high-performance reflectarray antennas intended for satellite applications. Such antennas have thousands of control parameters to be optimized, entailing a high computational burden, so fast numerical techniques and parallel processing are mandatory. The key points of accurate and reliable reflectarray antenna synthesis are detailed, with reference to the concept of aperiodic reflectarrays introduced by the authors in a world patent recently acquired by the European Space Agency (ESA) and to research projects funded by ESA. The computationally critical steps are discussed with particular reference to the fast evaluation of the radiation operator and of the functional gradient, as well as the fast implementation of the optimization algorithms on GPUs. The use of Non-Uniform FFTs (NUFFTs) and of the parallel processing capabilities of GPUs through the CUDA language and the AccelerEyes tools is highlighted. As Matlab has become a common platform for technical computing, interfacing these procedures to standard Matlab scripts is also detailed. Authors: A. Capozzoli, C. Curcio, A. Liseno and G. Toso.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3139
 
Kyle Spagnoli (EM Photonics)
Attend this session to learn about cutting-edge developments taking place in the world of GPU-accelerated sparse linear algebra. This presentation will focus on recent additions and features to EM Photonics' CULA Sparse library. In the first half of the talk we will focus on our collection of preconditioned iterative solvers such as GMRES and the conjugate gradient method. Specifics will be given on how users can easily utilize our iterative methods to accelerate pre-existing codes where the solution of a sparse system is a bottleneck. The second half of the talk will delve into details regarding GPU-accelerated direct methods such as sparse Cholesky and LU factorizations. Attendees will learn how high performance is achieved using a sophisticated task-scheduling framework that simultaneously dispatches work to multiple GPUs and CPUs.

Keywords:
Algorithms & Numerical Techniques, Development Tools & Libraries, GTC 2013 - ID S3141
 
David Bortz (University of Colorado, Boulder)
We present an investigation into the emergent mathematical behavior of computational operations which are performed efficiently on massively multi-core architectures. This novel perspective for computationally solving mathematical equations is designed from the ground up for efficient implementation on massively multi-core architectures. In this session, we present one of our algorithms, which randomly generates a large number of sparse domain discretizations of a Partial Differential Equation. The statistical moments of the ultra-sparse-grid solutions suggest optimal locations for gridpoints. We will apply this algorithm to Poisson and Hamilton-Jacobi steady state equations and provide preliminary analytical results.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3151
 
Jeroen Bedorf (Leiden Observatory, Leiden University), Evghenii Gaburov (SARA, Amsterdam, the Netherlands)
Find out how one can leverage massive GPU parallelism to assemble fast sparse octree construction and traversal methods by combining parallel primitives such as scan and sort algorithms. These techniques have culminated in Bonsai, a hierarchical gravitational N-body code that is used to study the formation and mergers of galaxies in the present-day Universe. With the advent of Kepler's dynamic parallelism, we explore the new avenues that this technology opens for scalable implementations of such hierarchical algorithms. We conclude the session with cutting-edge simulations complemented with spectacular visualizations that are produced in collaboration with NVIDIA's visualization experts.

Keywords:
Algorithms & Numerical Techniques, Astronomy & Astrophysics, Computational Physics, GTC 2013 - ID S3159
 
Li-Wen Chang (University of Illinois at Urbana-Champaign), Wen-Mei Hwu (University of Illinois at Urbana-Champaign)
Attend this session to learn new techniques for building a scalable and numerically stable tridiagonal solver for GPUs. Numerical stability appears to have been missing in all existing GPU-based tridiagonal solvers. This work presents a scalable, numerically stable, high-performance tridiagonal solver. The solver provides stable solutions of quality comparable to Intel MKL and Matlab, at speeds comparable to the GPU tridiagonal solvers in existing packages like CUSPARSE. Two key optimization strategies are presented and analyzed: a high-throughput data layout transformation for memory efficiency, and a dynamic tiling approach for reducing the memory access footprint caused by branch divergence. Several applications are shown to benefit substantially from this solver. As a case study, Empirical Mode Decomposition, a critical method in time-frequency analysis, is used to demonstrate the usability of our solver.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3191
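For readers unfamiliar with tridiagonal systems, the sketch below shows the classic serial Thomas algorithm in Python. It is offered only as a baseline for what "tridiagonal solver" means: it performs no pivoting, so it can fail or lose accuracy when a pivot is zero or tiny, which is the kind of stability problem the abstract above refers to. It is not the solver presented in the talk, and the argument layout (`lower[0]` and `upper[-1]` unused) is an assumption made for this example.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    Row i of the matrix is: lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1].
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination: remove the sub-diagonal one row at a time.
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]  # a zero pivot here breaks the method
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# 3x3 example: [[2, 1, 0], [1, 2, 1], [0, 1, 2]] * [1, 2, 3] = [4, 8, 8]
solution = thomas_solve([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [4.0, 8.0, 8.0])
```

The forward sweep is inherently sequential, which is one reason GPU solvers rely on different, more parallel formulations.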
 
Eric Darve (Stanford, Institute for Computational and Mathematical Engineering)
Learn about the fast multipole method (FMM) and its optimization on NVIDIA GPUs. The FMM is a well-known algorithm with a variety of applications in areas like galaxy simulation, electrostatic potential calculations, boundary element methods, integral equations, dislocation dynamics, etc. The FMM poses several difficulties when running on parallel heterogeneous platforms such as multicore processors with GPUs. Some parts of the calculation suffer from limited concurrency, and load balancing can be very uneven for certain distributions of particles. We will present a new API and runtime system, called StarPU, that allows expressing a calculation as a graph of tasks, with dependencies, and contains a runtime system that can optimally schedule those tasks on a parallel machine. StarPU supports conventional multicore processors as well as NVIDIA GPUs. Authors: Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Matthias Messner (INRIA Bordeaux - Sud-Ouest / LaBRI, Talence, France); Eric Darve (Stanford Institute for Computational and Mathematical Engineering); Toru Takahashi (Department of Mechanical Science and Engineering, Nagoya University, Nagoya, Japan).

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3192
 
Ang Li (Electrical and Computer Engineering Department, University of Wisconsin-Madison), Dan Negrut (Mechanical Engineering, University of Wisconsin-Madison)
This session outlines implementation details of a solver for dense banded linear systems. The fundamental idea is to rely on SPIKE, which is a generic divide-and-conquer algorithm for banded systems. We will introduce our CUDA implementation of the truncated SPIKE algorithm and discuss how we refine the solution using the BiCGSTAB method. We report on a performance analysis in which we compare the developed solver with Intel's MKL banded linear system solver for a variety of matrix and bandwidth sizes. The talk concludes with a discussion of how this work is relevant in the context of solving large sparse linear systems.

Keywords:
Algorithms & Numerical Techniques, Parallel Programming Languages & Compilers, GTC 2013 - ID S3202
 
Jonathan Passerat-Palmbach (ISIMA / LIMOS - UMR CNRS 6158 - Blaise Pascal University)
Learn how to correctly deal with pseudorandom streams in your GPU-enabled simulations. Such considerations are often complicated to take into account without a minimum of knowledge on the subject. Attendees will discover the basic principles of pseudorandom stream distribution in parallel environments, especially on GPUs. Examples will be given using the ShoveRand framework, which concretely implements these theoretical concepts. ShoveRand enables the safe use of pseudorandom facilities on NVIDIA hardware. At the end of the session, you will know not only how to use ShoveRand to feed your GPU-enabled simulations with independent random streams, but also how to integrate your own pseudorandom number generators into ShoveRand. ShoveRand's homepage and source code repository: http://forge.clermont-universite.fr/projects/ShoveRand

Keywords:
Algorithms & Numerical Techniques, Development Tools & Libraries, GTC 2013 - ID S3204
 
Umit V. Catalyurek (Ohio State University)
Who is more important in a network? Who controls the flow between the nodes, or whose contribution is significant for connections? Centrality metrics such as closeness and betweenness play an important role in answering these questions. However, they are two of the most computationally expensive kernels in graph mining, and several techniques have been proposed for their fast computation. In this study, we investigate how to make centrality computations much faster by compressing and modifying the graph structure and using GPU implementations of the extended centrality kernels. We compared the performance of our approach with the classical computation and parallelization approaches. Experimental results show that our techniques are highly effective and efficient for faster centrality computation.

Keywords:
Algorithms & Numerical Techniques, Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3244
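To show why closeness centrality is expensive, here is a minimal Python sketch of the classical computation for one vertex of an unweighted graph: one breadth-first search per vertex, so a full ranking costs one BFS for every vertex in the graph. This is the textbook baseline only, not the graph compression or GPU kernels described in the talk; the adjacency-dictionary format and the function name are assumptions for this example.

```python
from collections import deque


def closeness(adj, source):
    """Closeness of `source`: reachable vertices divided by the sum of distances to them."""
    dist = {source: 0}
    queue = deque([source])
    while queue:                      # plain breadth-first search
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Path graph 0 - 1 - 2: the middle vertex is the most central.
path = {0: [1], 1: [0, 2], 2: [1]}
scores = {v: closeness(path, v) for v in path}
```

On the path graph, vertex 1 scores 1.0 and the two end vertices score 2/3 each.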
 
Jean-Marie Le Gouez (Onera - The French Aerospace Lab)
This talk deals with the adaptation of data structures and algorithmic strategies we have developed to achieve good performance with a high-order Finite Volume code on unstructured grids on the GPU. This solver uses large stencils and thus provides a good stress test for unstructured computations. The generic refinement introduced in the present project acts on a coarse mesh of high-order geometrical elements. This technique is akin to instanced tessellation in visualization codes. The coarse grid is partitioned into small groups of cells (32, 64), each aimed at a multiprocessor. The generic refinement is applied to the data model of each coarse cell at the transfer between the CPU and the GPU, producing fine-cell metrics and fine-face stencils for the numerical flux scheme. The connectivity list of the internal grid in each coarse cell is identical. All threads in a block on a given multiprocessor process in turn the cell, face, and node algorithms of the exact same instance at each instruction, and data can be addressed in a coalesced way since all fields are arranged in memory in blocks of the partition size for the multiprocessor. The non-coalesced part of the algorithm is restricted to part of the data exchange across coarse cell faces, on the initial arbitrary connectivity.

Keywords:
Algorithms & Numerical Techniques, Computational Fluid Dynamics, GTC 2013 - ID S3273
 
Stan Tomov (University of Tennessee), Hatem Ltaief (King Abdullah University of Science and Technology, Saudi Arabia), Stojce Nakov
Learn about the newest developments in high-performance numerical linear algebra for heterogeneous GPU-based systems. A number of novel algorithms and the methodology used for their implementation on multi-GPU platforms will be shown. The implementations are open source, available through the MAGMA library, a next generation of LAPACK for heterogeneous architectures. Included are both linear system and eigenproblem solvers for both dense and sparse computations. The developments incorporate advances made through the CUDA Center of Excellence (CCOE) at University of Tennessee, the CCOE at King Abdullah University of Science and Technology, Saudi Arabia, and at INRIA, France through the StarPU and MORSE projects.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3281
 
I-Jui (Ray) Sung (University of Illinois at Urbana-Champaign)
Learn how to perform efficient dense rectangular matrix transposition in place on the GPU. Transposition (full and tiled) has been an important building block in many GPU-accelerated algorithms for better memory access patterns. However, out-of-place transposition, while simple in nature, may be prohibitive due to large spatial overhead for applications with large datasets. This talk presents the techniques on in-place matrix transposition, and the audience will also learn how to use our in-place transposition library with examples given in CUDA, MATLAB, and Mathematica CUDA bindings.

Keywords:
Algorithms & Numerical Techniques, Development Tools & Libraries, GTC 2013 - ID S3307
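The difficulty of transposing a rectangular matrix in place can be seen in a short serial sketch. In a row-major `rows x cols` array, the element at flat index i moves to index (i * rows) mod (rows * cols - 1), and these moves form cycles that must each be rotated exactly once. The Python below follows each cycle from its smallest index so that no auxiliary buffer is needed. It illustrates the classic cycle-following approach only and makes no claim about how the library presented in the talk works; `transpose_in_place` is a hypothetical name.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a row-major rows x cols matrix stored in the flat list `a`, in place."""
    n = rows * cols
    if n < 2:
        return
    last = n - 1  # first and last elements never move

    def dest(i):
        # Flat index where element i lands in the cols x rows result.
        return (i * rows) % last

    for start in range(1, last):
        # Walk the cycle; process it only if `start` is its smallest index.
        j = dest(start)
        while j > start:
            j = dest(j)
        if j < start:
            continue  # this cycle was already rotated from a smaller index
        # Rotate the cycle, carrying one displaced element at a time.
        carried = a[start]
        j = dest(start)
        while j != start:
            a[j], carried = carried, a[j]
            j = dest(j)
        a[start] = carried


m = [1, 2, 3,
     4, 5, 6]            # 2 x 3
transpose_in_place(m, 2, 3)
# m now holds the 3 x 2 transpose: [1, 4, 2, 5, 3, 6]
```

The cycles have irregular lengths and the accesses are scattered, which hints at why an efficient parallel version is not straightforward.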
 
Elise de Doncker (Western Michigan University), John Kapenga (Western Michigan University), Joe McKean (Western Michigan University)
Bootstrap statistics are powerful, easy to understand, can be robust, and are applicable in cases where no analytic methods are known. Their major drawback is the time required to compute them. Two applications of CUDA-accelerated bootstrap methods will be discussed, following an introduction to bootstrap methods. First, an R plug-in that uses CUDA to accelerate bootstrap methods for classical and robust error estimates and hypothesis testing will be shown. This integrates nicely and makes these bootstrap methods practical on the desktop. Second, in large simulation problems with uncertainty in the parameters and stochastic variables, it is common to run the simulation many times with different parameters. This can easily take advantage of large clusters. Bootstrap methods can be used to summarize results, estimate errors, flag outliers, and test different control strategies. These will be discussed and examples given, along with the use of CUDA to do the bootstrap processing.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3338
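The cost the abstract mentions comes from recomputing a statistic on many resampled copies of the data. A minimal Python sketch of a bootstrap standard error is shown below, assuming resampling with replacement and the mean as the default statistic. The function name, the default of 2000 resamples, and the fixed seed are choices made for this example, not details from the talk or its R plug-in.

```python
import random
import statistics


def bootstrap_standard_error(sample, statistic=statistics.mean,
                             n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    n = len(sample)
    estimates = [
        statistic(rng.choices(sample, k=n))   # one bootstrap replicate
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)


data = [float(v) for v in range(1, 101)]
se = bootstrap_standard_error(data, n_resamples=500)
```

Every replicate is independent of the others, so the loop over resamples is the natural place to parallelize.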
 
Lu Wang (Pennsylvania State University)
This talk introduces a new parallel auxiliary grid algebraic multigrid (AMG) method to leverage the power of GPUs. In the construction of the hierarchical coarse grid, a simple and fixed coarsening procedure based on a region quadtree generated from an auxiliary grid is used. This allows the explicit control of the sparsity patterns and operator complexities of the AMG solver. This feature provides (nearly) optimal load balancing and predictable communication patterns, which makes this new algorithm suitable for parallel computing, especially on GPU. A parallel smoother based on the special coloring of the quadtree to accelerate the convergence rate and improve the parallel performance of this solver was designed. Based on the CUDA toolkit, a new parallel auxiliary grid AMG method on GPU was implemented and the numerical results of this implementation demonstrate the efficiency of this new method. The results achieve an average speedup of over 4x on quasi-uniform grids and 2x on shape regular grids when compared to the AMG implementation in CUSP.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3375
 
Jedrzej Jablonski (University of Warsaw)
Get an insight into the techniques used in the new, fast rejection-based Poisson generator implemented in CURAND 5.0. Rejection algorithms are commonly used in many non-uniform generators and can be optimized for GPUs on many levels. This talk will cover the following three problems: (1) choosing the approximation function, (2) smart precomputing for generating multiple samples, (3) eliminating branching in rejection algorithms.

Keywords:
Algorithms & Numerical Techniques, GTC 2013 - ID S3384
 
Ian Lane (Carnegie Mellon University), William Chan (Carnegie Mellon University)
In recent years Deep-Networks have been shown to be effective for large-scale machine learning tasks including computer vision and automatic speech recognition. Deep-Networks for these tasks can be extremely large, containing millions of model parameters and may be trained on billions of training examples. Training of these models is extremely challenging to scale and it can take many weeks to train a single network on a large dataset. In this talk we will present novel methods for large-scale training of Deep-Networks on Distributed GPU Platforms. Leveraging the computational power of GPGPUs and scaling this computation across multiple compute nodes enables us to effectively train large networks in reasonable time.

Keywords:
Algorithms & Numerical Techniques, Computer Vision, Machine Learning & AI, GTC 2013 - ID S3404
 
Sean Baxter (NVIDIA), Duane Merrill (NVIDIA)
We survey a diverse group of functions that rely on common partitioning strategies. A search phase partitions sorted input sequences into uniformly-sized interval pairs. We amortize the cost of parallel search with communication-free serial work, and find a tuned balance between serial grain size and parallel occupancy. This strategy easily exposes the parallelism in vector operations without sacrificing the linear work efficiency of their serial treatments. We demonstrate a variety of implementations that follow this pattern, including merges, searches, multisets, and relational joins.
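
The partition-then-serial-work pattern described above can be sketched for a merge as follows. This is a minimal illustration under assumptions, not the speakers' code: the function names and the `grain` parameter are hypothetical, and the outer loop stands in for what would be independent threads or thread blocks on a GPU.

```python
def merge_path(a, b, diag):
    """Binary search for i such that the first `diag` merged outputs
    consist of a[:i] and b[:diag - i]. Ties are resolved in favour of a."""
    lo = max(0, diag - len(b))
    hi = min(diag, len(a))
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] <= b[diag - 1 - mid]:
            lo = mid + 1  # a[mid] belongs to the first `diag` outputs
        else:
            hi = mid
    return lo


def serial_merge(a, b, ai, bi, count):
    """Emit `count` merged items starting at a[ai], b[bi], with no communication."""
    out = []
    for _ in range(count):
        if bi >= len(b) or (ai < len(a) and a[ai] <= b[bi]):
            out.append(a[ai])
            ai += 1
        else:
            out.append(b[bi])
            bi += 1
    return out


def partitioned_merge(a, b, grain):
    """Merge sorted lists a and b in independent chunks of `grain` outputs."""
    total = len(a) + len(b)
    out = []
    for diag in range(0, total, grain):  # each iteration is independent
        ai = merge_path(a, b, diag)
        out.extend(serial_merge(a, b, ai, diag - ai, min(grain, total - diag)))
    return out
```

Each chunk costs one logarithmic search plus `grain` steps of serial work, so a larger `grain` amortizes the search while a smaller one exposes more parallel chunks.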

Keywords:
Algorithms & Numerical Techniques, Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3414
 
Hatem Ltaief (Supercomputing Laboratory, KAUST), Ahmad Abdelfattah (Center of Extreme Computing, KAUST), Rio Yokota (Center of Extreme Computing, KAUST)
Reservoir simulations involve sparse iterative solvers for linear systems that arise from implicit discretizations of coupled PDEs in high-fidelity reservoir simulators. One of the major bottlenecks in these solvers is the sparse matrix-vector product. Sparse matrices are usually compressed in some format (e.g., CSR, ELL) before being processed. In this talk, we focus on the low-level design of a sparse matrix-vector (SpMV) kernel on GPUs. Most of the relevant contributions focus on introducing new formats that suit the GPU architecture, such as the diagonal format for diagonal matrices and the blocked-ELL format for sparse matrices with small dense blocks. However, we target both generic and domain-specific implementations. Generic implementations target the CSR and ELL formats, in order to be part of the KAUST-BLAS library. More opportunities for optimization appear when the matrix has a specific structure. In the talk, we will present the major design challenges and outlines, along with preliminary results. The primary focus will be on the CSR format. The other bottleneck of reservoir simulations is the preconditioning in the sparse matrix solver. We investigate the possibility of a Fast Multipole Method based technique on GPUs as a compute-bound preconditioner.
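
For readers unfamiliar with the CSR format mentioned above, a minimal serial reference for the SpMV product might look like the sketch below. It is not the KAUST-BLAS kernel; the function and argument names are chosen for the example, and the outer loop over rows is the part a GPU kernel would distribute across threads.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[r] .. row_ptr[r + 1] delimits the nonzeros of row r;
    col_idx and values hold their column indices and values.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4, 1, 2, 3, 5]
y = csr_spmv(row_ptr, col_idx, values, [1, 2, 3])  # y == [7.0, 4.0, 18.0]
```

Rows with different nonzero counts do different amounts of work, which is one reason alternative formats such as ELL are considered on GPUs.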

Keywords:
Algorithms & Numerical Techniques, Energy Exploration, GTC 2013 - ID S3449
 
Tim Droz (SoftKinetic, Inc.)
3D depth cameras are about to make the leap from the living room to mobile devices. SoftKinetic is a leading-edge provider of end-to-end solutions for 3D gesture processing. This talk will cover parallel algorithms and data structures that can dramatically accelerate depth calculation, filtering, RGB-to-depth mapping and gesture tracking algorithms on today's mobile processors. SoftKinetic will also look forward to advanced techniques which will become possible as mobile SoCs incorporate compute acceleration using CUDA and OpenCL. SoftKinetic's DepthSense cameras and iisu gesture middleware have been used in a variety of devices including PC desktops, Smart TVs, tablets and in-car infotainment platforms.

Keywords:
Algorithms & Numerical Techniques, Mobile Summit, Video & Image Processing, GTC 2013 - ID S3498
 
Jonathan Cohen (NVIDIA)
NVIDIA has been developing a library of high-performance parallel sparse iterative linear solvers, with an emphasis on multilevel and multigrid methods. In this presentation, I will provide an overview of the library's design and outline many of the challenges we have faced in balancing numerical behavior against parallel scalability. Our library has been integrated into ANSYS Fluent 14.5, and will be released as a fully supported feature in the upcoming Fluent 15. I will describe the collaboration between ANSYS and NVIDIA, and present benchmarking results across a variety of test problems from CFD and other fields. Finally, I will talk about our future plans and discuss some of the open research problems in the area of algebraic multigrid on massively parallel processors.

Keywords:
Algorithms & Numerical Techniques, Computational Fluid Dynamics, Energy Exploration, GTC 2013 - ID S3579
Architectural Mapping & Event Visualization
Rodrigo Lopez (Neoscape), Matt Richardson (Neoscape)
Creating computer-generated photorealistic imagery takes a great deal of care over materiality and lighting. Having these aspects look as real as possible is essential to the overall quality and final look and feel of the image. In the past, without real-time solutions, several iterations of test renders were needed to dial in the desired settings, which was time-consuming for the artist and tied up valuable resources while rendering. With an NVIDIA Maximus system and real-time render solutions such as VRay RT and iray, this process has been greatly accelerated, giving the artist improved flexibility and more responsive interaction when fine-tuning these settings.

Keywords:
Architectural Mapping & Event Visualization, Manufacturing General, GTC 2013 - ID S3551
Astronomy & Astrophysics
Tareq AbuZayyad (University of Utah)
The Telescope Array Cosmic Rays Detector located in the Western Utah Desert is used for the observation of ultra-high energy cosmic rays. The simulation of a fluorescence detector's response to air showers initiated by cosmic rays presents many opportunities for parallelization. In this presentation we report on the Monte Carlo program used for the simulation of the Telescope Array fluorescence detector located at the Middle Drum site. The program makes extensive use of GPU acceleration to achieve a 50x speedup compared to running on a single CPU core. All of the physics simulation, from shower development through light production and propagation with atmospheric attenuation, as well as the realistic detector optics and electronics simulations, is done on the GPU. A detailed description of the code implementation is given, and results on the accuracy and performance of the simulation are presented as well.

Keywords:
Astronomy & Astrophysics, Algorithms & Numerical Techniques, GTC 2013 - ID S3189
 
Harshavardhan Reddy Suda (GMRT Observatory, National Centre for Radio Astrophysics, TIFR, Pune, India), Pradeep Kumar Gupta (NVIDIA)
The goal of this session is to demonstrate the power of GPUs in real-time signal processing applications in radio astronomy telescopes, and outline the future growth path for this exciting new application of GPUs. Modern radio astronomy telescopes are multiple antenna instruments where the wideband data from each antenna needs to be processed in real-time to implement digital receiver systems such as correlators and beamformers. We will demonstrate how such compute and data I/O intensive algorithms can be implemented on a distributed GPGPU system, with a fully real-time realisation. Hybrid computing techniques such as CUDA on the GPU, together with OpenMP and MPI to synchronise the distributed host machines and handle the large I/O between them, are key elements of such designs. Optimised implementation of signal processing algorithms such as FFT and MAC on GPUs, as well as the use of streams to optimise computing and I/O on the GPU, will be addressed in detail. All these concepts will be illustrated with the example of the prototype GPGPU correlator and beamformer that we have developed for the GMRT, which is a 30-antenna radio telescope with 400 MHz BW dual polarised signals from each antenna, coming in at a sustained input data rate of 24 GBytes/sec.

Keywords:
Astronomy & Astrophysics, Signal Processing, GTC 2013 - ID S3225
 
John Romein (ASTRON Netherlands Institute for Radio Astronomy)
This talk will present research on accelerator-based computing for radio telescopes. It will show GPU implementations of a dozen signal-processing algorithms used by radio telescopes, e.g., filtering, correlating, beam forming, dedispersion, and peak detection. Glued together, these computational kernels form several processing pipelines. Each pipeline implements an observation mode, as used by the LOFAR radio telescope. The pipelines create sky images, search for pulsars, observe known pulsars, and detect ultra-high-energy particles; they were first implemented on a Blue Gene/P and then ported to GPUs. This talk will briefly explain these algorithms and processing pipelines and show performance results, multi-GPU scaling results, and the impact on energy efficiency. The research is relevant to current radio telescopes like LOFAR and to the future SKA telescope, which needs exascale computing power.

Keywords:
Astronomy & Astrophysics, Signal Processing, GTC 2013 - ID S3352
 
Peng Wang (NVIDIA)
Learn how the ENZO solvers were ported to the GPU. ENZO is a block-structured adaptive mesh refinement (AMR) astrophysical fluid dynamics code used for simulating cosmological structure formation. It is one of the most commonly used community codes in astrophysics. We have ported the PPM Hydrodynamics and Magnetohydrodynamics solvers to the GPU and integrated the GPU solvers fully into the AMR framework. This talk will describe the porting strategy and performance results.

Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2013 - ID S3401
 
Ben Barsdell (Harvard University)
Radio astronomy is a real-time signal processing application that requires extreme supercomputing. While today's radio telescopes require 10-100 Tflops of computational power, by the end of the decade this will increase into the Exaflops regime, driven by the Hydrogen Epoch of Reionization Array (HERA) and the Square Kilometer Array (SKA). The most compute intensive part of this problem is the so-called cross-correlation algorithm, which can be recast as a linear-algebra problem similar in spirit to DGEMM. In this session we describe the cross-correlation engine that powers the pathfinder LEDA radio telescope and has been (re)optimized for the Kepler GK110 architecture to achieve over 2.5 Tflops in sustained performance. This level of efficiency is critical to meeting strict power and space constraints imposed by the instrument's remote location.
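
To make the linear-algebra view concrete, the sketch below accumulates the outer product of each time sample's antenna vector with its own conjugate, so entry (i, j) correlates antenna i with antenna j. This is a plain reference under assumed names (`correlate`, `samples`), not the LEDA engine, and it ignores the channelization step a real correlator performs first.

```python
def correlate(samples):
    """Return V with V[i][j] = sum over t of x_i(t) * conj(x_j(t)).

    samples[t][i] is the complex voltage of antenna i at time step t.
    """
    n = len(samples[0])
    v = [[0j] * n for _ in range(n)]
    for x in samples:
        for i in range(n):
            for j in range(n):
                v[i][j] += x[i] * x[j].conjugate()
    return v
```

Stacking the samples as rows of a matrix X turns this into a single product of X with its conjugate transpose, and the result is Hermitian, so only about half of the entries need to be computed.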

Keywords:
Astronomy & Astrophysics, Algorithms & Numerical Techniques, GTC 2013 - ID S3497
 
Claudio Gheller (ETH CSCS)
Numerical simulations are among the most effective tools for studying and solving astrophysical problems. Thanks to the enormous technological progress of recent years, the available supercomputers now make it possible to study the details of complex processes, such as galaxy formation or the evolution of the large-scale structure of the universe. Sophisticated numerical codes can exploit the most advanced HPC architectures to simulate such phenomena and to process and visualize their results. Enzo, Ramses and Splotch are prime examples of such codes. Work is ongoing to port these codes to GPUs using the CUDA and OpenACC programming models. The refactoring work accomplished so far is presented together with recent tests and results.

Keywords:
Astronomy & Astrophysics, Supercomputing, GTC 2013 - ID S3555
Automotive
Kerry Johnson (QNX Software Systems)
The growing convergence of mobile handsets and automotive platforms is creating a new market opportunity for app developers. That said, many differences exist between the smartphone and the car, and understanding them is key to unlocking the potential of this new market. To give the app developer a jump-start, this session explores how a car infotainment system is structured, UX considerations for automotive applications, design principles for taking best advantage of SoCs like Tegra 3, and key differences between mobile and automotive platforms.

Keywords:
Automotive, In-Vehicle Infotainment (IVI), GTC 2013 - ID S3223
 
Victor Ng-Thow-Hing (Honda Research Institute USA)
Introducing augmented reality to head-up displays for automobiles requires balancing the visual, immersive richness this medium provides against the need for the driver to stay focused on the primary task of driving. This session explores how to solve these problems by combining design methodologies with technological research. Before ideas are field tested in actual cars, high-fidelity prototypes are built using driving simulators and an actual windshield head-up display to visualize the augmented graphics. UI Composer is leveraged with proprietary software to engage designers in the prototyping process.

Keywords:
Automotive, Advanced Driver Assistance Systems (ADAS), Instrument Clusters & Heads-Up Display (HUD), Manufacturing Technical, GTC 2013 - ID S3230
 
Don Burns (NVIDIA)
This session will provide techniques for rendering 3D maps efficiently and at high frame rates, while still preserving quality. Topics to be discussed include tile cache management, tile fetch, tile rendering techniques, layer management, and 3D object rendering.

Keywords:
Automotive, Navigation Systems, GTC 2013 - ID S3386
 
Vladimir Glavtchev (NVIDIA)
This session will present a motion estimation approach to pedestrian and cyclist detection. Through analyzing motion across several frames, this technique accurately segments foreground objects from the background, including their positions and velocities. Foreground objects are classified as pedestrians, bicyclists or motorcyclists, or other objects on the road surface. The entire process is optimized to minimize the computation resources needed for detection and classification. The optimizations make it possible to perform the entire process on a mobile grade GPU system with a modest host processor.

Keywords:
Automotive, Advanced Driver Assistance Systems (ADAS), Computer Vision, GTC 2013 - ID S3396
 
Ian Lane (Carnegie Mellon University)
AIDAS, an Intelligent Driver Assistive System being developed at Carnegie Mellon University, enables the investigation of Immersive Interaction within vehicles. The AIDAS platform enables rich, speech-centric interaction with the driver. Interactions are both context-aware, based on the location of the car and driver's gaze direction, and natural, akin to interacting with a human assistant. This session will introduce the core speech and vision components used within AIDAS and describe the approaches used to accelerate these technologies to realize a real-time interactive system.

Keywords:
Automotive, In-Vehicle Infotainment (IVI), GTC 2013 - ID S3403
 
Ian Riches (Strategy Analytics)
This session will examine the driving forces behind the adoption of Advanced Driver Assistance Systems (ADAS), one of the fastest growing application areas by car makers. Key battle grounds between new and existing suppliers will be examined, and forecasts presented for key systems, semiconductors and sensors. Despite the high forecast growth, challenges remain to widespread adoption across the globe. These barriers will be explained, together with recommendations for what needs to be done to overcome them.

Keywords:
Automotive, Advanced Driver Assistance Systems (ADAS), Instrument Clusters & Heads-Up Display (HUD), GTC 2013 - ID S3413
 
Justin Ebert (NVIDIA)
UI Composer Studio is the ground-breaking HMI design tool used for instrument clusters and infotainment systems. Developed by NVIDIA, it is used by automakers and Tier 1 automotive suppliers to rapidly develop proofs of concept for evaluation, market research, usability testing and ultimately final production. This session covers the basics of constructing an instrument cluster and IVI using Studio's advanced authoring environment.

Keywords:
Automotive, In-Vehicle Infotainment (IVI), Instrument Clusters & Heads-Up Display (HUD), GTC 2013 - ID S3419
 
Mario Tippelhofer (Audi)
The goal of the Audi Urban Intelligent Assist (AUIA) research initiative is to showcase different technologies and approaches to make the challenges of navigating the chaotic roadways of the world's megacities less stressful, safer and more efficient a generation from now. This is mainly achieved through advancements in predictive technology, by harnessing the power of Big Data through algorithms, real time data, Human Machine Interfaces (HMI), advanced sensors and other innovative approaches. The AUIA project is the latest in a series of university collaborations that Audi has formed to explore the frontiers of automotive technologies and electronics.

Keywords:
Automotive, In-Vehicle Infotainment (IVI), GTC 2013 - ID S3481
 
Ron Szabo (Delphi Corporation)
This session will cover the current and future requirements for GPUs in automotive infotainment systems. Four areas contribute to the exponential growth of processing power required onboard: (1) traditional feature growth; (2) the impact of mobile devices and brought-in content; (3) the compounding effect of off-board services and cloud connectivity; and (4) development headroom to eventually eliminate optimization. Critical tradeoffs that Tier 1s and OEMs need to make will be discussed.

Keywords:
Automotive, In-Vehicle Infotainment (IVI), GTC 2013 - ID S3542
Bioinformatics & Genomics
Bertil Schmidt (Johannes Gutenberg University Mainz)
Next-Generation Sequencing (NGS) refers to new technologies for high-throughput DNA sequencing which produce up to billions of DNA or RNA reads in short time and at low cost. To exploit NGS, efficient parallel and scalable algorithms and tools are needed to process the massive amount of generated reads within a reasonable amount of time. This talk will present several CUDA-enabled algorithms and data structures to accelerate (i) the accurate processing of short/long read alignment to human genomes (i.e. CUSHAW and CUSHAW2) and (ii) the analysis of metagenomic data from microbial environmental sequencing studies (CRiSPy-CUDA and CRiSPy-Embed).

Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3004
 
Michal Kierzynka (Poznan University of Technology, Poznan Supercomputing and Networking Center)
The goal of this session is to present a software tool that performs pairwise alignment of nucleotide sequences, together with the GPU optimizations used to achieve top performance. The software uses the dynamic programming method to efficiently compute the exact alignment in a form that may be conveniently used in the DNA de-novo assembly problem. It is also unique in that it has been optimized for nucleotide reads coming from modern sequencers (Illumina/Solexa, Roche/454, AB/SOLiD). As a result, it is currently the fastest implementation of the Needleman-Wunsch algorithm, reaching up to 89 GCUPS on a single GPU and scaling well on multi-GPU systems. The following real-world use case will be presented: the application of the software in finding similar sequences in huge datasets coming from the next-generation Illumina sequencer.
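
As a reminder of the dynamic programming recurrence behind Needleman-Wunsch, here is a minimal serial sketch that returns only the optimal global alignment score. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions for the example; the tool described above also recovers the alignment itself and is not reproduced here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell depends on its upper-left, upper and left neighbours
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```

The table has `len(a) * len(b)` cells, and throughput figures in GCUPS count how many billions of such cell updates are performed per second. Cells on the same anti-diagonal do not depend on each other, which is the usual source of parallelism.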

Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3025
 
Richard Wilton (Johns Hopkins University -- Department of Physics and Astronomy)
The Department of Physics and Astronomy at Johns Hopkins University is currently constructing a new computer cluster to facilitate high-throughput data-intensive computation on terabyte-scale data, including the analysis of genomic sequence data. Compute nodes in the cluster contain multiple CPU cores, 100GB or more of system RAM, and one or more GPUs; a prototype node is implemented with 12 CPU cores (24 hyperthreads), 144GB of RAM, and four NVIDIA C2070s. In this session we will describe the design of a genomic sequence-alignment application that targets the cluster compute-node hardware. We will discuss the algorithms we use and how they are implemented as CUDA kernels, point out the key optimizations in the implementation, and look at the performance of the software.

Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3092
 
Qiao Wang (National ICT Australia Victoria Lab), Adam Kowalczyk (Victorian Research Laboratory of National ICT Australia)
This talk will show a platform that enables an exhaustive analysis of all pairwise (2nd order) interactions of Single Nucleotide Polymorphisms (SNPs) in Genome Wide Association Studies (GWAS) data. Given typical datasets of 300K SNPs and 3K samples, our GPU-accelerated solution is capable of completing the search in under 3 minutes on a single NVIDIA GTX470. The method involves construction of contingency tables for all SNP pairs followed by a battery of conventional statistical tests such as Fisher-Exact and Variance Explained. All previous implementations described in the literature required hours, days or even months to complete the same analysis. In addition, we will present an interface that allows users to define their own statistical tests at runtime and describe our latest developments towards a practical 3rd order implementation.
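
The contingency-table step can be sketched as below. The encoding is an assumption made for the example (genotypes coded 0, 1 or 2 and a binary case/control phenotype), as are the function names; the speakers' data layout and statistical tests are not shown.

```python
from itertools import combinations


def pair_table(snp_a, snp_b, phenotype):
    """Build a 9 x 2 contingency table for one SNP pair.

    Row 3 * ga + gb indexes the joint genotype (ga, gb in {0, 1, 2});
    the column is the phenotype (0 = control, 1 = case).
    """
    table = [[0, 0] for _ in range(9)]
    for ga, gb, ph in zip(snp_a, snp_b, phenotype):
        table[3 * ga + gb][ph] += 1
    return table


def all_pair_tables(genotypes, phenotype):
    """genotypes[s] lists the genotype of SNP s for every sample."""
    return {
        (i, j): pair_table(genotypes[i], genotypes[j], phenotype)
        for i, j in combinations(range(len(genotypes)), 2)
    }
```

With 300K SNPs there are 300,000 × 299,999 / 2, or roughly 4.5 × 10^10, pairs, each needing one pass over the samples, which is why the exhaustive search is so demanding. Every pair is independent, so the work maps naturally onto many parallel threads.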

Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3169
 
Mohit Gupta (Life Technologies), Jakob Siegel (Life Technologies)
Learn how GPUs are enabling whole genome sequencing by accelerating the primary data analysis pipeline of the benchtop Ion Proton sequencer. Leveraging the compute power of GPUs to process the high-throughput data generated by this sequencer yields a fast, scalable and cost-effective desktop compute solution that can democratize DNA sequencing and accelerate the path towards personalized medicine. This talk will discuss the implementation of data fitting algorithms on the GPU and a streaming execution model that overlaps data transfer and kernel execution for this high-throughput system. It will also explain how the algorithms were changed to suit the GPU compute model while still maintaining the quality of the results.

Keywords:
Bioinformatics & Genomics, Algorithms & Numerical Techniques, GTC 2013 - ID S3229
 
Marco Maggioni (University of Illinois at Chicago)
In this session we present an innovative systems biology application of GPU computing, as an alternative to molecular dynamics simulation for studying biochemical mechanisms inside cells. For the first time we are able to apply the Chemical Master Equation (CME) stochastic framework at large scale, determining both the probabilistic steady state and the transient dynamics of biochemical reaction networks. Our GPU implementation leverages the structure of the problem to optimize the sparse linear algebra routines needed by the stochastic model. As a result, we achieve an average 15.57x speedup over the optimized Intel MKL library running on a 64-core architecture.

Keywords:
Bioinformatics & Genomics, Computational Chemistry, GTC 2013 - ID S3245
 
BingQiang Wang (Beijing Genomics Institute)
GPUs can help scientists find new clues for curing cancer and other diseases in massive genomics data. Computing on and mining huge volumes of genomics and derived data has become a major challenge. Compression techniques help reduce data volume and make access more efficient. GPU-accelerated versions of typical compression algorithms have been developed, with speedups of around 2-10x. Integrating the Hadoop framework with GPUs is very promising for large-scale analysis of big data such as genome-wide association studies (GWAS), making the entire analysis more balanced in terms of the ratio of computing to data access.

Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3257
 
Erich Elsen (Royal Caliber)
Learn how to use the SIMD video instructions introduced with the Kepler architecture to accelerate Smith-Waterman and Needleman-Wunsch DNA sequence alignment. Speedups of up to 3x over scalar code are possible: the new code achieves over 80 GCUPs on a GeForce GTX 680 and close to 150 GCUPs on a Tesla K10 GPU accelerator. The specific implementation targets the case of performing many independent alignment problems of length < 1024 simultaneously; however, the techniques that will be discussed are generally applicable to any sequence alignment problem. SIMD video instructions allow one to split a 32-bit register into two 16-bit or four 8-bit parts and operate on them independently.
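
The idea of treating one 32-bit register as four independent 8-bit lanes can be emulated in a few lines. The sketch below is a host-side illustration only, with helper names loosely modeled on the CUDA intrinsics `__vadd4` and `__vmaxu4`; it is not the speaker's code and says nothing about how the hardware implements these operations.

```python
MASK32 = 0xFFFFFFFF


def pack4(lanes):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack4(word):
    """Split a 32-bit word into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def vadd4(x, y):
    """Per-lane 8-bit addition with wraparound; no carry crosses a lane."""
    # Add the low 7 bits of every lane, then restore each lane's top bit
    low = (x & 0x7F7F7F7F) + (y & 0x7F7F7F7F)
    return (low ^ ((x ^ y) & 0x80808080)) & MASK32


def vmaxu4(x, y):
    """Per-lane unsigned 8-bit maximum."""
    return pack4([max(a, b) for a, b in zip(unpack4(x), unpack4(y))])
```

An alignment cell update is built from additions and maxima, so packing four independent alignments into the lanes of one word lets a single instruction advance all four, provided the scores fit in the lane width.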

Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3279
 
Robert Zigon (Beckman Coulter)
Analytical Ultracentrifugation is a technique used to compute attributes of a protein like gross shape, sample heterogeneity or size. By applying a centrifugal force to the sample and simultaneously measuring the distribution, we can use first principles to derive the relative molecule sizes. Learn how the solution to the resulting regularized least squares problem can be computed in real time with the Tesla K20.
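As a rough illustration of what a regularized least squares problem looks like, the sketch below solves the Tikhonov (ridge) form, minimizing ||Ax - b||^2 + lam * ||x||^2 through the normal equations (A^T A + lam * I) x = A^T b. The abstract does not state which regularizer or solver the talk uses, so this formulation, the function name `ridge_solve` and the dense pure-Python elimination are all assumptions chosen for brevity; a real-time GPU implementation would rely on a tuned linear algebra library instead.

```python
def ridge_solve(a, b, lam):
    """Solve min ||a x - b||^2 + lam * ||x||^2 for a small dense system.

    a   : list of m rows, each a list of n floats
    b   : list of m floats
    lam : regularization weight; assumed > 0 (or 0 with full column rank)
    """
    m, n = len(a), len(a[0])

    # Build the normal matrix G = A^T A + lam * I and right-hand side A^T b.
    g = [
        [sum(a[k][i] * a[k][j] for k in range(m)) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    rhs = [sum(a[k][i] * b[k] for k in range(m)) for i in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(g[r][col]))
        g[col], g[pivot] = g[pivot], g[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n):
            factor = g[row][col] / g[col][col]
            for j in range(col, n):
                g[row][j] -= factor * g[col][j]
            rhs[row] -= factor * rhs[col]

    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(g[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rhs[i] - tail) / g[i][i]
    return x
```

Increasing `lam` pulls the solution toward zero and stabilizes ill-conditioned fits; with `lam = 0` the function reduces to ordinary least squares.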

  Back
 
Keywords:
Bioinformatics & Genomics, Computational Chemistry, GTC 2013 - ID S3330
Streaming:
Download:
 
Jonathan Cohen (NVIDIA)
Because of their inherently parallel and high-throughput nature, NVIDIA GPUs are a natural fit for the types of data-intensive computing required in bioinformatics applications. For many genomics applications, the primary challenge is to map highly divergent and control flow-heavy code to a SIMD architecture. By transforming complex serial flow of control into a sequence of communicating sequential processors running in parallel, we are able to achieve high throughput on very branchy code, while maintaining memory coherence and avoiding execution divergence. I will present initial results from NVIDIA's internal "nvbio" project to develop efficient computational building blocks for analysis of Next-Generation Sequencing data, with a focus on implementations of BWA and Bowtie2-type aligners.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3580
Streaming:
Download:
Climate, Weather, Ocean Modeling
Presentation
Media
Chris Lupo (California Polytechnic State University)
Learn how GPGPUs have been targeted to the Regional Ocean Modeling System (ROMS) software package. We describe a multi-model parallelization approach that uses CUDA Fortran and OpenACC directives, both programming models supported by the PGI compilers. Initial research using only CUDA Fortran on one Tesla card offers comparable performance to a 16-node CPU cluster, and a 2.5x speedup compared to an OpenMP implementation on an eight-core CPU system. We are currently targeting multiple GPU devices and using OpenACC to parallelize more of the ROMS software, aiming for even greater performance so that larger, higher resolution ocean models can be simulated.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID S3082
Streaming:
Download:
 
Maxim Milakov (NVIDIA)
Learn how OpenACC can be used to accelerate a challenging application with a large amount of code and a flat profile. NEMO is an ocean modeling code consisting of tens of thousands of lines of code in hundreds of subroutines. NEMO also has a flat execution profile, making it a challenge to expose opportunities for parallel acceleration. Using OpenACC directives, we show how the time stepping loop can be migrated to the GPU to achieve substantial performance improvements on multiple problems at small and large scales.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Parallel Programming Languages & Compilers, GTC 2013 - ID S3209
Streaming:
Download:
 
Kevin Tubbs (Dell, Inc.)
A lattice Boltzmann method (LBM) for solving the shallow water equations and the advection-dispersion equation is developed and implemented on graphics processing unit (GPU)-based architectures. The proposed LBM is implemented on NVIDIA computing processors. GPU computing is performed using the Jacket GPU engine for MATLAB and ArrayFire. Mass transport with velocity-dependent dispersion in shallow water flow is simulated by combining the MRT-LBM model and the TRT-LBM model. This talk will demonstrate the GPU parallel performance for modeling mass transport phenomena in shallow water flows.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Algorithms & Numerical Techniques, GTC 2013 - ID S3324
Streaming:
Download:
 
Michel Muller (RIKEN)
One of the biggest challenges when applying GPGPU frameworks to complex codebases is the necessity for redesigning loops and data access patterns. Experiences with the physical core of the ASUCA weather prediction model have shown that using pure CUDA Fortran or OpenACC leads to a lengthy manual redesign and large execution time overheads when executing the new code back on the CPU. The Hybrid Fortran 90 meta programming framework has been designed to (a) automate this process and (b) be able to run the user code in a CPU-optimized loop structure as well, thus enabling optimal performance on both GPU and CPU. Results when using it for the ASUCA physical core show high GPU performance, CPU performance on par with the original x86-optimized code, and reduced porting overhead. In this session, learn what's behind Hybrid Fortran 90 and how to use it.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Parallel Programming Languages & Compilers, GTC 2013 - ID S3326
Streaming:
Download:
 
Jaroslaw Piwonski (Institute for Computer Science and Kiel Marine Science, Centre for Interdisciplinary Marine Science, Christian-Albrechts Universitaet zu Kiel)
This session shows the necessary steps of porting an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the Portable, Extensible Toolkit for Scientific Computation (PETSc) library that is based on the Message Passing Interface (MPI) standard. The programming languages used are C and Fortran. A special emphasis lies on using biogeochemical models written in Fortran without any modifications to the original code. The port uses the Compute Unified Device Architecture (CUDA), a customized version of PETSc and a commercial CUDA Fortran compiler.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing, GTC 2013 - ID S3385
Streaming:
Download:
 
Oliver Fuhrer (MeteoSwiss)
A full GPU implementation of the COSMO numerical weather prediction and regional climate model will be presented. Design criteria such as high performance, retaining maintainability, and enforcing a single source code which still compiles and runs on x86-based systems led us to opt for different approaches in different parts of the model code. Performance critical parts are implemented employing a stencil library built on top of a domain specific embedded language (DSEL) with a CUDA back-end. Other parts were ported by restructuring the legacy Fortran code and inserting OpenACC compiler directives. The session will also highlight the integration of these different technologies.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID S3417
Streaming:
Download:
 
Mark Govett (NOAA Earth System Research Laboratory)
Two U.S. global-scale weather models, developed at NOAA, are running on GPUs. The FIM runs at 15 km resolution and is expected to be run by the U.S. National Weather Service in the next year. The NIM is a next-generation forecast model designed to run at 4 km resolution. This presentation will give an update on our efforts to parallelize and run these models on GPUs.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing, GTC 2013 - ID S3429
Streaming:
Download:
Cloud Visualization
Presentation
Media
Arend Dittmer (Penguin Computing)
Penguin Computing's public HPC cloud, Penguin Computing on Demand (POD), provides compute power for HPC applications through NVIDIA Tesla GPUs. To make it easy to leverage NVIDIA Tesla GPU resources for rendering tasks, Penguin is hosting migenius' RealityServer. The RealityServer platform is 3D web services software that leverages NVIDIA Tesla GPUs to deliver interactive, photorealistic applications over the web, enabling product designers, architects and consumers to easily visualise 3D scenes with remarkable realism. The session will discuss the workflow for using RealityServer on POD. Using Fluid Inc's Configure offering as an example, the session will illustrate how retailers can leverage POD, NVIDIA Teslas and the RealityServer platform for scalable, fast-to-market, and easy-to-manage product customizations.

  Back
 
Keywords:
Cloud Visualization, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3552
Streaming:
Download:
 
Gil Rosen (T-Labs, Deutsche Telekom)
The "Power Plays in the Digital Era" is a thought leadership presentation that aims to provide strategic clarity in an era of chaos. In this session, the real strategy of leading market players will be exposed and analyzed, providing insight into how the market is likely to develop. This information can be key for decision makers who need to place their bets today and make decisions that will affect their companies' future.

  Back
 
Keywords:
Cloud Visualization, Media & Entertainment, Mobile Summit, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3593
Streaming:
Download:
Clusters & GPU Management
Presentation
Media
Alex Ramirez (Barcelona Supercomputing Center)
The HPC community is always on the lookout for increased performance and energy efficiency. Recently, this led to a growing interest in GPU computing and in clusters built from low-power, energy-efficient parts from the embedded and mobile markets. See a first proof of concept for a hybrid compute platform that brings together an ARM multicore CPU for energy efficiency and a discrete GPU accelerator that provides the compute performance. This talk presents the architecture of the system, the system software stack, preliminary performance and power measurements, and concludes with guidelines for future ARM+GPU platforms.

  Back
 
Keywords:
Clusters & GPU Management, Supercomputing, GTC 2013 - ID S3064
Streaming:
Download:
 
John Paul Walters (University of Southern California Information Sciences Institute)
Learn how to deploy heterogeneous, GPU-enabled private clouds through OpenStack. In this session we describe the latest HPC features for the OpenStack cloud computing platform. These features target the OpenStack Grizzly release, the successor to OpenStack Folsom, and include heterogeneity-aware scheduling, bare-metal provisioning for non-virtualizable architectures, and multi-hypervisor GPU/CUDA support based on LXC and Xen. A particular focus of this work is to enable high performance signal and image processing in the cloud. Performance results will be shown through a series of examples, demonstrating the impact of Xen vs. LXC on GPU performance for both regular and irregular computations. The session will conclude with a discussion of the next steps in HPC OpenStack development.

  Back
 
Keywords:
Clusters & GPU Management, Cloud Computing, Desktop & Application Virtualization, Signal Processing, GTC 2013 - ID S3214
Streaming:
Download:
 
Craig Idler (Los Alamos National Laboratory), Phil Romero (Los Alamos National Laboratory), Laura Monroe (Los Alamos National Laboratory)
Hear supercomputer acceptance testers explain how to test your new cluster to obtain a highly performing, well balanced cluster by identifying weak nodes. We will describe our experiences in testing supercomputing clusters equipped with GPUs, how they differ from CPU only clusters, finding tests that can discriminate performance levels and how to segregate weak performers. We will discuss the wide variety of tests utilized and identify tests most useful in determining/segregating weak performing nodes/components. Also discussed will be experiences in tuning the High Performance Linpack to obtain maximum performance.

  Back
 
Keywords:
Clusters & GPU Management, Supercomputing, GTC 2013 - ID S3248
Streaming:
Download:
 
Dale Southard (NVIDIA)
Introduction to deploying, managing, and using GPU clusters. Talk will cover a combination of "lessons learned" and "new features" that are of interest to sites deploying GPU clusters for high-performance computing.

  Back
 
Keywords:
Clusters & GPU Management, GTC 2013 - ID S3249
Streaming:
Download:
 
Pradeep Kumar Gupta (NVIDIA)
An overview of designing, deploying, and managing small research prototype GPU clusters for HPC. This talk will focus on describing all the building components for a cluster and the complete software stack to run and manage it. The emphasis is on building a research prototype GPU cluster using all open source software and minimal hardware. Learn to build and operate basic GPU computing resources that provide end users with the latest CUDA features.

  Back
 
Keywords:
Clusters & GPU Management, Development Tools & Libraries, GTC 2013 - ID S3516
Streaming:
Download:
 
Marc Hamilton (HP Enterprise Group), Dick Bland (Hewlett-Packard Co.), Jean-Luc Assor (Hewlett-Packard Co.)
Come to this session to learn about the latest innovations for GPU computing and visualization from HP. The new ProLiant Gen8 SL servers and workstation blades will be featured for solutions like accelerated HPC applications, cloud visualization and virtualized desktops. Real world customer use cases from the manufacturing/engineering and Oil & Gas segments will be highlighted. You will also learn everything you need to get started with a GPU cluster in a single, easy-to-use HP GPU cluster starter kit.

  Back
 
Keywords:
Clusters & GPU Management, Cloud Visualization, GTC 2013 - ID S3536
Streaming:
Download:
 
Saeed Iqbal (Dell Inc.)
The latest Kepler-based GPUs are very powerful parallel processors. These GPUs are capable of providing a quantum leap in performance across the broad HPC application spectrum. However, to fully realize these gains, it is important to design balanced systems, and we will discuss different system-level considerations for various use cases. We will utilize HPL to analyze performance and power consumption at a system level, and compare against previous generation GPUs where applicable to highlight the improvements. The goal is to provide the audience with information and best practices for designing GPU-enabled systems using the Kepler GPUs, considering parameters such as power consumption, system size and system-level features.

  Back
 
Keywords:
Clusters & GPU Management, Supercomputing, GTC 2013 - ID S3556A
Streaming:
Download:
 
Saeed Iqbal (Dell Inc.)
The latest Kepler-based GPUs are very powerful parallel processors. These GPUs are capable of providing a quantum leap in performance across the broad HPC application spectrum. However, to fully realize these gains, it is important to design balanced systems, and we will discuss different system-level considerations for various use cases. We will utilize HPL to analyze performance and power consumption at a system level, and compare against previous generation GPUs where applicable to highlight the improvements. The goal is to provide the audience with information and best practices for designing GPU-enabled systems using the Kepler GPUs, considering parameters such as power consumption, system size and system-level features.

  Back
 
Keywords:
Clusters & GPU Management, Supercomputing, GTC 2013 - ID S3556B
Streaming:
Download:
 
Chris Porter (Platform Computing, an IBM Company)
Achieving real-world performance is about much more than just the raw performance of the underlying hardware. Much as a highly efficient power plant connected to a distribution network losing 70% of its power in transmission makes little sense, the same applies to HPC clusters as well: efficiency matters. While many factors impact efficiency, this session focuses on the critical role of scheduling and workload management in getting the most out of your GPU cluster. By "working smarter", and enabling GPU clusters with dramatically higher utilization and throughput, not only can organizations achieve savings in infrastructure and management costs, they can boost productivity as well.

  Back
 
Keywords:
Clusters & GPU Management, GTC 2013 - ID S3578
Streaming:
Download:
Collaborative & Large Resolution Displays
Presentation
Media
Andrew Page (NVIDIA), Kenji Kato (NASA Ames, Dell Federal), Rajeev Surati, Ph.D. (Scalable Display Technologies), Doug Traill (NVIDIA)
Join a panel of NVIDIA experts and leading companies developing multi-display systems for an interactive discussion on the current trends in scaling the resolution of display walls. Panelists will share their insights on how they are pushing the state of the art with NVIDIA's professional display technologies.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, GTC 2013 - ID S3052
Streaming:
Download:
 
Doug Traill (NVIDIA)
Large format high resolution displays are being utilized everywhere from corporate conference rooms to supercomputing facilities. NVIDIA Quadro SVS solutions provide many features to make it easier to install and utilize these large scale displays. Attendees of this tutorial will learn how to configure Quadro graphics for thin-bezel panels, edge-blended projectors, and stereoscopic and immersive displays.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, GTC 2013 - ID S3053
Streaming:
Download:
 
Howard Kaplan (University of South Florida)
This talk will focus on the creation and utilization of the University of South Florida's ultra-high resolution, stereoscopic 3D visualization display wall. In this session we will describe how universities can benefit from low-cost visualization systems, and cover hardware and software evaluation of displays, GPU technologies and applications for academic settings. Current trends and future developments in GPU resources for HPC and visualization in academia will be reviewed. We will also explore hardware and software technologies that allow flexible utilization for academic and research purposes.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, GTC 2013 - ID S3068
Streaming:
Download:
 
Rajeev Surati (Scalable Display Technologies), Bei Wang (Walt Disney)
The NVIDIA Warp and Blend API has enabled a slew of cost-effective scalable visualization systems. We will discuss two different applications: making a seamless edge-blended desktop with Scalable Desktop, and making 3D (both projected and stereoscopic 3D) virtual reality and simulation systems using multiple computers with Scalable Display Manager. We will give several real-life examples, including a 140 megapixel stereoscopic 3D cave and the SpaceX 16 megapixel control room display. Lastly, Bei Wang of Disney will follow up with current progress on the VESA standard effort for warping and blending.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Architectural Mapping & Event Visualization, Combined Simulation & Real-Time Visualization, GTC 2013 - ID S3114
Streaming:
Download:
 
Andy Boud (ImmersaView), Alex Streit (ImmersaView)
This session takes an insightful look into new solutions for streaming high resolution and ultra-high resolution video over IP. How do you send, record and review ultra-high resolution data using GPU techniques? Do these software techniques offer a new approach to how we work with video? We share some of our experiences in this field.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Large Scale Data Visualization & In-Situ Graphics, Scientific Visualization, GTC 2013 - ID S3161
Streaming:
Download:
Combined Simulation & Real-Time Visualization
Presentation
Media
Anne C Elster (Norwegian University of Science & Technology)
Learn how a simulation that combines nicely with graphics can be used as a visual test-bed for numerical algorithms, terrain interactions, road planning and more. This presentation includes the techniques and methods behind our 3D snow simulation, which calculates how 4+ million particles are affected by the wind field and terrain in real time by harnessing the compute power of modern GPUs. Our snow simulator is also being combined with ray tracing techniques for more realistic lighting and snowflake rendering, as well as with the A* search algorithm, which is used to suggest how to map future roads to the terrain based on a set of criteria. We are also experimenting with adding SPH and other fluid techniques to simulate avalanches etc. Stereoscopic output is achieved by taking advantage of the features provided by NVIDIA's Quadro card.
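The road-planning use of A* mentioned above can be sketched in Python as a search over a height grid. The abstract does not say which criteria the speakers use, so the cost model here (one unit per step plus a penalty proportional to the height change) and the names `astar_road` and `slope_weight` are illustrative assumptions only.

```python
import heapq


def astar_road(heights, start, goal, slope_weight=1.0):
    """Find the cheapest 4-connected route across a terrain height grid.

    heights      : list of rows of numbers (terrain elevation per cell)
    start, goal  : (row, col) tuples
    slope_weight : hypothetical penalty per unit of height change (>= 0)
    Returns (total_cost, path) or None if the goal is unreachable.
    """
    rows, cols = len(heights), len(heights[0])

    def heuristic(cell):
        # Manhattan distance never overestimates: every step costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0.0, start)]  # (estimate, cost so far, cell)
    best_cost = {start: 0.0}
    parent = {start: None}

    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return cost, path[::-1]
        if cost > best_cost[cell]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                step = 1.0 + slope_weight * abs(heights[nr][nc] - heights[r][c])
                new_cost = cost + step
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    parent[(nr, nc)] = cell
                    heapq.heappush(
                        frontier,
                        (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)),
                    )
    return None
```

Raising `slope_weight` makes the planner detour around ridges instead of climbing them; setting it to 0 reduces the search to the plain shortest grid path.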

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Computational Physics, GTC 2013 - ID S3060
Streaming:
Download:
 
Tom True (NVIDIA), Alina Alt (NVIDIA)
Workstation applications today demand a tightly coupled compute-graphics pipeline where the simulation and the graphics are done interactively and in parallel. The use of multiple GPUs provides an affordable way for such applications to improve their performance and increase their useable data size by partitioning the processing and subsequent visualization among multiple GPUs. This tutorial explains the methodologies of how to program your application for a multi-GPU environment. Part 1 of this tutorial will cover GPU resource allocation and system configuration, including: what to expect when you add additional GPUs to your system; how to select, query and allocate all the necessary GPU resources; and a rudimentary introduction to the use of profiling and analysis tools. Throughout this tutorial, simple OpenGL and CUDA examples designed for a single GPU will be modified to efficiently work in a multi-GPU environment.

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Graphics Performance Optimization, Media & Entertainment, GTC 2013 - ID S3070
Streaming:
Download:
 
Shalini Venkataraman (NVIDIA), Wil Braithwaite (NVIDIA)
Workstation applications today demand a tightly coupled compute-graphics pipeline where the simulation and the graphics are done interactively and in parallel. The use of multiple GPUs provides an affordable way for such applications to improve their performance and increase their useable data size by partitioning the processing and subsequent visualization among multiple GPUs. This tutorial explains the methodologies of how to program your application for a multi-GPU environment. Part 2 of this tutorial will cover programming methodologies, including: how to structure an application to optimize compute and graphics performance and manage synchronization; how to manage data transfers across the PCIe bus; debugging and profiling; and programming considerations when scaling beyond two GPUs, with multiple compute GPUs feeding one or multiple graphics GPUs. Throughout this tutorial, simple OpenGL and CUDA examples designed for a single GPU will be modified to efficiently work in a multi-GPU environment.

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Graphics Performance Optimization, Media & Entertainment, GTC 2013 - ID S3072
Streaming:
Download:
 
Duksu Kim (Korea Advanced Institute of Science and Technology)
This session will introduce a novel, optimization-based workload distribution algorithm that exploits heterogeneous systems to accelerate various proximity queries. To represent complicated performance relationships between computing resources and different computations of proximity queries, we propose a simple model that measures the expected running time of these computations. Based on this model, we formulate an optimization problem that minimizes the largest time spent on computing resources, and propose a novel, iterative LP-based scheduling algorithm. We apply our method to various proximity queries used in five different applications that have different characteristics. Our method achieves an order of magnitude performance improvement by using four different GPUs and two hexa-core CPUs over using a hexa-core CPU only. In addition, we integrate our expected running time model with a work stealing method and achieve a 16% performance improvement on average over the basic work stealing method.
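The idea of minimizing the largest time spent on any computing resource can be illustrated with a much simpler stand-in than the LP-based scheduler the talk describes. The Python sketch below assumes a hypothetical linear running time model, time = overhead + items / throughput, and splits divisible work so that every resource that is worth using finishes at the same moment. The model, the function name `balance` and the example numbers are assumptions for illustration, not the speakers' method.

```python
def balance(total_work, resources):
    """Split divisible work to minimize the slowest resource's finish time.

    total_work : number of work items to distribute (must be > 0)
    resources  : dict name -> (throughput in items/s, fixed overhead in s)
    Returns (finish_time, shares) where shares maps each name to its item count.
    """
    if total_work <= 0:
        raise ValueError("total_work must be positive")

    active = dict(resources)
    while True:
        rate_sum = sum(rate for rate, _ in active.values())
        weighted_overhead = sum(rate * ovh for rate, ovh in active.values())
        # If all active resources finish together at time t, then
        # sum(rate * (t - overhead)) == total_work, which gives:
        finish = (total_work + weighted_overhead) / rate_sum
        # A resource whose overhead alone reaches t would get no work: drop it.
        too_slow = [name for name, (_, ovh) in active.items() if ovh >= finish]
        if not too_slow:
            break
        for name in too_slow:
            del active[name]

    shares = {name: 0.0 for name in resources}
    for name, (rate, ovh) in active.items():
        shares[name] = rate * (finish - ovh)
    return finish, shares
```

For example, with a GPU at 100 items/s and 0.5 s of transfer overhead and a CPU at 20 items/s with no overhead, 190 items split as 150 and 40 so that both finish after 2 s, while a tiny job of 5 items stays entirely on the CPU because the GPU's overhead would dominate.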

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, GTC 2013 - ID S3166
Streaming:
Download:
Computational Chemistry
Presentation
Media
Julien Demouth (NVIDIA)
GROMACS is a state-of-the-art molecular simulation package that employs extensive multi-level heterogeneous parallelization. Our new CUDA-based algorithms provide a 4x speedup over hand-tuned CPU SIMD assembly, and unprecedented absolute performance. However, the heterogeneity of hardware and the inherent bottlenecks involved make efficient resource utilization and strong scaling very challenging. This advanced session describes our recent efforts on multi-level load-balancing, kernel execution strategies, CPU-GPU work splitting, and ways to exploit Kepler features such as Hyper-Q. Join us to talk about current limits of GPU acceleration in MD, and how to take molecular dynamics simulations to 100 microseconds/iteration, equivalent to 10,000 fps, in the near future!

Keywords:
Computational Chemistry, Supercomputing, GTC 2013 - ID S3011
 
Christian Trott (Sandia National Laboratories)
The session will present implementation and optimization strategies for molecular dynamics many-body potentials. It will concentrate on the new SNAP potential for LAMMPS, which is based on the GAP bispectrum analysis of Bartok et al. [PRL 104, 136403 (2010)]. SNAP is fit to large amounts of quantum-based DFT data and is capable of reproducing the accuracy of DFT while still exhibiting linear scaling with the system size. By exploiting multiple parallelisation layers, it is possible to mitigate its high cost of 500,000 flops per interaction through excellent strong scaling behaviour down to 16 atoms per GPU. Thus the achievable time to solution on GPU clusters using SNAP is comparable to running simple Lennard-Jones simulations.

Keywords:
Computational Chemistry, GTC 2013 - ID S3080
 
John Stone (University of Illinois at Urbana-Champaign)
This talk will present recent successes in the use of GPUs to accelerate interactive molecular visualization and analysis tasks on hardware platforms ranging from commodity desktop computers to the latest Cray XK7 supercomputers. The talk will focus on recent algorithm developments and the applicability and efficient use of new CUDA features on state-of-the-art Kepler GPUs. We will present the latest performance results for GPU-accelerated trajectory analysis runs on the Blue Waters Cray XK7 and other GPU-accelerated HPC platforms, and conclude with a discussion of ongoing work and future opportunities for GPU acceleration, particularly as applied to the analysis of petascale simulations of large biomolecular complexes and long simulation timescales.

Keywords:
Computational Chemistry, Large Scale Data Visualization & In-Situ Graphics, GTC 2013 - ID S3097
 
Vijay Pande (Stanford University)
This session will present recent results from Molecular Dynamics simulations on Folding@home, discussing both schemes for parallelization on thousands to millions of GPUs and how these simulations have had an impact in basic biophysics and biomedical science, with an emphasis on protein folding and Alzheimer's Disease.

Keywords:
Computational Chemistry, Supercomputing, GTC 2013 - ID S3140
 
Susan Mniszewski (Los Alamos National Laboratory)
This session will demonstrate how GPUs were used to accelerate the primary computational bottleneck in explicitly quantum mechanical reactive molecular dynamics simulations in the open-source code LATTE. We focus on single- and multi-GPU implementations of a remarkably simple algorithm for computing the density matrix in electronic structure theory, based on a recursive series of generalized matrix-matrix multiplications. Utilizing CUDA and CUBLAS resulted not only in significantly faster code, but also in density matrices with numerical errors smaller than those obtained from traditional CPU-based algorithms. Real-world applications and timings computed using GPU-accelerated LATTE will be presented.

Keywords:
Computational Chemistry, GTC 2013 - ID S3195
 
Michela Taufer (University of Delaware), Sandeep Patel (University of Delaware)
With the plethora of future applications of carbon nanotube materials rapidly being realized and exploited, we are pursuing fundamental studies of structural, dynamic, and energetic properties of model single-walled carbon nanotubes (CNTs) in pure water and in aqueous solutions of simple inorganic salts: sodium chloride (NaCl) and sodium iodide (NaI). Our transformative research is made possible by a hybrid combination of resources at Oak Ridge National Lab, such as the GPU cluster Keeneland for FEN ZI GPU molecular dynamics simulations of mean force calculations and the data-intensive cluster Nautilus for the data analysis of the GPU-computed potentials of mean force. In this talk we dive deep into the various key aspects of CNT simulations on hybrid resources. Come and learn some of the underlying challenges and get the latest solutions devised to tackle both algorithmic and scientific challenges of CNT simulations and their heterogeneous workflows with GPUs.

Keywords:
Computational Chemistry, Algorithms & Numerical Techniques, GTC 2013 - ID S3199
 
Samuli Hakala (Aalto University School of Science and Technology)
The goal of this session is to present the design and capabilities of GPU-accelerated GPAW, a density-functional theory (DFT) code based on the grid-based projector-augmented wave method. It's suitable for large-scale electronic structure calculations and capable of scaling to thousands of cores. We'll discuss how we have accelerated the most computationally intensive components of the program with CUDA. We'll provide detailed performance and scaling analysis of our multi-GPU-accelerated code, starting from small systems up to systems with a few thousand atoms running on large GPU clusters with over 200 GPUs. We've achieved speed-ups of up to 15 times on large systems.

Keywords:
Computational Chemistry, GTC 2013 - ID S3206
 
Scott LeGrand (Amazon Web Services)
In 2008, NVIDIA demonstrated that CUDA-enabled GPUs accelerated molecular dynamics calculations by nearly 3 orders of magnitude compared to traditional CPUs. This allowed a single GPU to achieve the performance of a supercomputer at this task. Additionally, performance has improved by 1.5x to 2x per GPU generation. Despite these obvious benefits, there is still entrenched resistance to porting many existing codes to GPUs because of the work involved in doing so. However, with 5 years of performance data now in the rear-view mirror, it is clear that not only is it of huge benefit to port to GPUs now, but also that failing to do so will only result in having to do so later when many-core architectures become the standard. Finally, once you have ported your code to GPUs, the next logical step is to make your code cloud-accessible, freeing your users from having to purchase any hardware whatsoever and allowing them to take advantage of exponentially improving performance.

Keywords:
Computational Chemistry, Cloud Visualization, GTC 2013 - ID S3228
 
Joshua Anderson (University of Michigan)
Monte Carlo and Molecular Dynamics simulations are standard tools for analyzing the thermodynamic and statistical behavior of many-particle systems. The first computer experiment performed for the Manhattan Project was a simulation of 12 hard spheres using a Monte Carlo algorithm. Now, massive parallelism enables routine simulations of millions of particles. In this talk, we describe our novel GPU Monte Carlo algorithm and compare it with HOOMD-blue, our open-source Molecular Dynamics code. Recent improvements to HOOMD-blue make parallel multi-GPU simulations possible on workstations and clusters. Applications include polymer dynamics, granular materials, non-equilibrium systems, and hard particle self-assembly.

Keywords:
Computational Chemistry, Computational Physics, GTC 2013 - ID S3251
 
James Phillips (University of Illinois)
The highly parallel molecular dynamics code NAMD was chosen in 2006 as a target application for the NSF petascale supercomputer now known as Blue Waters. NAMD was also one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007. How do the GPU-accelerated Cray XK6 Blue Waters and ORNL Titan machines compare to CPU-based platforms for a hundred-million-atom Blue Waters acceptance test? Come learn the opportunities and pitfalls of taking GPU computing to the petascale and the importance of CUDA 5 and Kepler features in combining multicore host processors and GPUs in a legacy message-driven application.

Keywords:
Computational Chemistry, Supercomputing, GTC 2013 - ID S3272
 
Erik Lindahl (KTH Royal Institute of Technology at Stockholm University)
Learn how to perform molecular dynamics simulations reaching microsecond-per-day performance on GPUs, how to achieve impressive GPU acceleration of a code that was already extensively hand-tuned for x86 CPUs, and how we hope to take it even further in the future. GROMACS is one of the most widely used programs in the world for simulating biomolecular dynamics, and has long been accelerated for CPUs with hand-tuned assembly code. This session will cover our challenges and successes in achieving significantly higher absolute performance with CUDA in GROMACS compared to extremely tuned CPU code, both on low-end systems and on massively parallel supercomputers. Join us to learn about the overall architectural decisions and features of this heterogeneous multi-level parallelization, see examples of application performance, and participate in a discussion about how future molecular simulation needs to focus on efficient throughput and sampling to achieve scaling.

Keywords:
Computational Chemistry, Supercomputing, GTC 2013 - ID S3283
 
Szilard Pall (KTH Royal Institute of Technology)
GROMACS is a state-of-the-art molecular simulation package that employs extensive multi-level heterogeneous parallelization. Our new CUDA-based algorithms provide a 4x speedup over hand-tuned CPU SIMD assembly, and unprecedented absolute performance. However, the heterogeneity of hardware and the inherent bottlenecks involved make efficient resource utilization and strong scaling very challenging. This advanced session describes our recent efforts on multi-level load-balancing, kernel execution strategies, CPU-GPU work splitting, and ways to exploit Kepler features such as Hyper-Q. Join us to talk about current limits of GPU acceleration in MD, and how to take molecular dynamics simulations to 100 microseconds/iteration, equivalent to 10,000 fps, in the near future!

Keywords:
Computational Chemistry, Supercomputing, GTC 2013 - ID S3288
 
Brian Cole (OpenEye Scientific Software)
ROCS (Rapid Overlay of Chemical Structure) is a proprietary algorithm that helped build OpenEye as a pillar of molecular modeling software. This was due to ROCS being very fast on the CPU and its robustness as a scientific model. Porting the algorithm to OpenCL achieved over a 100x speed improvement. What has been the effect after 3 years of experience on the market? And why was it ported to CUDA? What is the true value of speed? And are there other ways to achieve it?

Keywords:
Computational Chemistry, Databases, Data Mining, Business Intelligence, Supercomputing, GTC 2013 - ID S3328
 
Thanasis Anthopoulos (Cardiff University)
This session presents a haptic protein-ligand docking (HPLD) application developed in the Molecular Modelling Lab of the Cardiff School of Pharmacy. The talk aims to describe in detail how GPUs enable the application to run with a fully flexible ligand and protein target. The first part of the talk describes the algorithm used to perform the MMFF94s force-field energy and force calculations. Performance benchmarks will be presented to show the speed-up gained from the presented CUDA algorithms. The second part of the talk covers how asynchronous stream processing helped to provide smooth visual rendering as well as force feedback on the haptic device at a rate of 1000 Hz. The session closes by showing how flexible HPLD improves docking results during simulations.

Keywords:
Computational Chemistry, GTC 2013 - ID S3333
 
Ross Walker (University of California, San Diego)
This talk will focus on the impact that GPUs have had on Molecular Dynamics (MD) simulations. In particular, it will highlight the massive performance improvements that GPUs have brought to MD simulations with AMBER. Kepler-based solutions can routinely provide simulation rates exceeding 100 ns/day on a single GPU in a single desktop, while replica exchange approaches to accelerating convergence enable hundreds of GPUs to be employed in parallel. The GPU revolution has transformed the MD landscape. No longer is access to supercomputer resources required to routinely access microsecond timescales and beyond. The world of MD research is now flat, with all researchers, young and old, rich and poor, being able to run simulations that previously were restricted to those privileged enough to have routine access to supercomputers. This has made it an exciting time for research involving Molecular Dynamics.

Keywords:
Computational Chemistry, GTC 2013 - ID S3380
 
Ryan Olson (Cray)
The distributed shared-memory implementation of the coupled-cluster singles and doubles with perturbative triples algorithm, CCSD(T), in the GAMESS chemistry package was ported to the GPU using the directive-based OpenACC standard. The focus of this port was to achieve maximum strong-scaling performance for small molecular systems (

Keywords:
Computational Chemistry, Development Tools & Libraries, Parallel Programming Languages & Compilers, GTC 2013 - ID S3506
Computational Fluid Dynamics
Presentation
Media
Gopal Patnaik (Naval Research Laboratory)
In this session we investigate the performance of a mixture of CPU and GPU codes on a multi-CPU, multi-GPU cluster. This cluster attempts to balance I/O, GPU, and CPU performance to accommodate a wide variety of codes. The Jet Engine Noise Reduction (JENRE) code implements a compressible flow solver for the simulation of supersonic jet flow and its acoustic properties. The JENRE code's performance using GPUs is currently 3.4 times that with CPUs. The cluster is also used for a variety of jobs that utilize the CPU only, leaving the GPUs idle. This leads to significant under-utilization of the computational resources. We examine the overall utilization of the cluster and the performance of a mix of CPU codes with the GPU-based JENRE code running simultaneously on the same nodes. Careful, cooperative scheduling of jobs can result in a tripling of the computational capability of the cluster.

Keywords:
Computational Fluid Dynamics, Clusters & GPU Management, Manufacturing Technical, GTC 2013 - ID S3034
 
Yoshiaki Hanada (Prometech Software, Inc.)
Get the latest information on particle-based fluid simulation on Kepler GPUs, using the Japanese commercial CAE software "Particleworks" as an example. This session covers particle simulation trends in CAE, implementation and performance comparisons between CPU, Fermi GPU and Kepler GPU (including a performance gain breakdown for each function), and a showcase of more efficient simulation-based design with Particleworks in Japanese industry. Prometech Software, Inc. has been developing particle simulation technology on GPUs since 2007, when NVIDIA announced the first CUDA release. A summary of this technical development will be provided along with the above topics.

Keywords:
Computational Fluid Dynamics, Manufacturing Technical, GTC 2013 - ID S3063
 
Kjetil Olsen Lye (SINTEF)
What happens when a wave crashes over the bow of a boat? In this presentation we give a short overview of the smoothed-particle hydrodynamics (SPH) method, and use it to simulate this. Using CUDA to accelerate an industrial SPH simulator, we get real-world results at unprecedented speeds. We show a new parallelization scheme for the SPH method on the GPU, and present a simple yet efficient autotuning method for CUDA kernels. We show how you can effectively design the implementation around the memory layout of a GPU, and how the new Kepler architecture can provide additional speedup.

Keywords:
Computational Fluid Dynamics, Algorithms & Numerical Techniques, Manufacturing Technical, GTC 2013 - ID S3154
 
Richard Smith (Symscape)
Explore the issues associated with retrofitting a highly optimized, legacy MPI CFD system with GPU acceleration. The CFD system in question is OpenFOAM, a well-regarded and popular open source finite-volume CFD framework. The GPU acceleration uses CUDA via the CUSP and THRUST open source frameworks. Learn the reasoning behind the choices of open source frameworks and the process of integrating the GPU linear solvers, while retaining the original CFD system's functionality and structure. The final GPU library source code - ofgpu - is freely available under the GPL for Windows and Linux. Finally, see how the same GPU capability is provided within a user-friendly, interactive CFD environment called Caedium.

Keywords:
Computational Fluid Dynamics, Manufacturing Technical, GTC 2013 - ID S3181
 
John Humphrey (EM Photonics, Inc.)
Explore new developments in the area of high performance computational fluid dynamics. As fluid dynamics solvers are commonly run on supercomputers, there is an ever-growing need for enhanced speed and power. Widely used solvers such as FUN3D and AVUS have been under development for over 20 years, which poses unique challenges in refitting these codes for GPU execution because the codes were originally designed for single-core CPUs. In this session, we will explore those challenges, our experience in working with these codes, and the results that can be obtained. We will also delve into a whole new architecture for fluids solvers that expresses parallelism in a manner that suits both GPU acceleration and full-system hybrid execution that fully utilizes multiple GPUs and all available CPU cores.

Keywords:
Computational Fluid Dynamics, Manufacturing Technical, Supercomputing, GTC 2013 - ID S3186
 
Takayuki Aoki (Tokyo Institute of Technology / Global Scientific Information and Computing Center)
Turbulence modeling is a key issue in CFD (Computational Fluid Dynamics), since most flow phenomena become turbulent at higher Reynolds numbers. We have developed a CFD code based on the Lattice Boltzmann Method with an LES (Large-Eddy Simulation) model. The dynamic Smagorinsky model is often used; however, it requires averaging over a wide area to determine the model constant. Because of the huge overhead this incurs for large-scale computation, we applied the coherent-structure Smagorinsky model, which is able to determine the model constant locally. We study turbulent flow behind a non-rotating football, and air flows over a 10 km x 10 km metropolitan area at 1-m resolution, with ground geometry based on real building data.

Keywords:
Computational Fluid Dynamics, Supercomputing, GTC 2013 - ID S3210
 
Alexander Monakov (Institute for System Programming of RAS), Arutyun Avetisyan (Institute for System Programming of RAS)
Learn about optimizations that significantly improve the performance of our CUDA conjugate gradient linear solver developed for OpenFOAM, a popular open-source CFD software toolbox. We describe the challenges present in porting iterative solvers to CUDA, namely the overhead from data structure conversion and the need for a fast GPU preconditioner, and our approaches to tackling them. We explain our optimizations, such as reusing the preconditioner from previous time-steps and always storing the preconditioner in low precision, and their impact on performance. Finally, we show how our implementation handles solving in parallel when the number of MPI processes per node exceeds the number of GPUs.

Keywords:
Computational Fluid Dynamics, Algorithms & Numerical Techniques, Manufacturing Technical, GTC 2013 - ID S3220
 
Mark Mawson (University of Manchester)
Join us as we demonstrate 2D and 3D GPU-based fluid-structure interaction software jointly developed at the University of Manchester in the UK and CIEMAT in Madrid, Spain. We present a fluid solver based on the Lattice Boltzmann Method (LBM) that is algorithmically optimized for the GPU, allowing a peak rate of 270 million points updated per second and real-time user interaction with the fluid flow. We also demonstrate a GPU immersed boundary method (IBM) that allows our LBM to interact with both rigid and flexible bodies, with a 78x speedup in 2D compared with a sequential CPU implementation.

Keywords:
Computational Fluid Dynamics, Manufacturing Technical, GTC 2013 - ID S3270
 
Peter Zaspel (University of Bonn, Germany)
Join the presentation of our latest results in multi-GPU parallel numerical methods for uncertainty quantification in computational fluid dynamics. After a short outline of the applied multi-GPU parallel flow solver for incompressible two-phase flows NaSt3DGPF and a non-intrusive stochastic collocation method, we will dive into some of the important numerical methods such as large-scale iterative multi-GPU eigenvalue solvers, GPU-parallel algebraic multigrid methods and efficient sparse grid / multi-level Monte-Carlo methods. These will eventually allow us to perform the optimal and efficient computation of stochastic data including expectation values, variance and covariance from flow problems with uncertain or varying input data.

Keywords:
Computational Fluid Dynamics, Algorithms & Numerical Techniques, Manufacturing Technical, GTC 2013 - ID S3409
 
Joe Dutka (Acer Inc.)
The session will share the work of the bicycle company Velocite and researchers at NCKU in Taiwan as they used GPU computing to design their next generation of bicycles. The platform was two Acer AT350 F2 servers with both Quadro and Tesla cards and Intel Xeon CPUs for hybrid computation. Rather than a theoretical presentation, the session will focus on real-world implementation and the difficulties overcome in developing the software and the bike design. The bike will debut at the Taipei Bike Show on March 20, which takes place at the same time.

Keywords:
Computational Fluid Dynamics, Computer Aided Design, Manufacturing Technical, GTC 2013 - ID S3502
Computational Physics
Presentation
Media
Dan Negrut (University of Wisconsin-Madison)
This talk will explore the use of heterogeneous CPU/GPU computing, as enabled by an in-house developed Heterogeneous Computing Template (HCT), for physics-based simulations of mechanical systems. HCT draws on five components: advanced physics-based modeling techniques (formulating the relevant equations governing the physics of interest); algorithmic support (solving these equations); proximity computation and collision detection; domain decomposition/data exchange (for multi-node distributed CPU/GPU computing); and post-processing/visualization. Example applications will include granular terrain simulation, tracked and wheeled vehicle mobility studies (tanks, Mars Rover, etc.), fluid-solid interaction analysis, and nonlinear finite element analysis. The talk will demonstrate the multiple aspects in which GPU computing has fundamentally impacted modeling and simulation in Mechanical Engineering.

Keywords:
Computational Physics, Algorithms & Numerical Techniques, Computational Structural Mechanics, Manufacturing Technical, GTC 2013 - ID S3003
 
Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf), Guido Juckeland (Technische Universitaet Dresden)
With PIConGPU, new physics phenomena previously not accessible within laser plasma simulations can be studied, which will help us optimize laser-driven radiation sources. We present results on laser wakefield acceleration of electrons simulated on the Oak Ridge TITAN system and discuss in detail which techniques help us get the most out of these clusters. Finally, we show how to add fault-tolerance and load-balancing to a large hybrid CPU-GPU code such as PIConGPU to achieve optimum performance.

Keywords:
Computational Physics, Supercomputing, GTC 2013 - ID S3026
 
Karthik Murthy (Rice University)
This talk presents an overview of the implementation of a particle pusher that targets NVIDIA GPUs by extending a novel energy- and charge-conserving 1D electrostatic particle pushing algorithm to a 2D electromagnetic version. Energy is conserved by using a fully implicit time integration, and particles are carefully treated at cell boundaries to maintain charge conservation. The momentum in the system is controlled by an adaptive orbit integrator that compares a first- and a second-order integration scheme. The implementation is based on the CUDA 4.1 framework. It exploits the memory hierarchy on the GPU by using texture memory to access the electric and magnetic fields, and shared memory to accumulate the charge and current density before a global accumulation. We evaluate a red-black scheduling scheme for CUDA blocks to reduce contention during global accumulation, and utilize multiple GPUs to perform the computation for different species of particles. We showcase the CUDA implementation via a two-species (ion, electron) plasma physics application where the particles are in equilibrium.

Keywords:
Computational Physics, Parallel Programming Languages & Compilers, GTC 2013 - ID S3144
 
Mathias Wagner (University of Bielefeld)
Discover how data from experiments at heavy-ion colliders (the Relativistic Heavy Ion Collider at Brookhaven National Lab and the Large Hadron Collider at CERN) can immediately be compared with first-principles simulations of Quantum Chromodynamics (QCD) to quantitatively probe the fundamental properties of strongly interacting matter, i.e., quarks and gluons at high temperature. The conditions realized in the experiments governed the early evolution of the universe. The high precision needed for these comparisons is obtained by performing our calculations entirely on the GPU. In doing so we face both a low flop/byte ratio and high register pressure. See how we deal with these complications and achieve high performance on the Bielefeld GPU cluster with 400 Fermi GPUs.

Keywords:
Computational Physics, GTC 2013 - ID S3153
 
William Brouwer (Research Computing and Cyberinfrastructure Unit, The Pennsylvania State University), Sreejith G J (NORDITA - Nordic Institute for Theoretical Physics), Filippo Spiga (Quantum Espresso Foundation)
Lanczos diagonalization (LD) is an important algorithm for calculating eigenvalues and eigenvectors of large matrices, used in many applications. An example performed countless times throughout the world on a daily basis is latent semantic analysis of documents for search and retrieval. This presentation details work devoted to exploiting the massive parallelism and scalability of GPUs in order to enhance LD for key aspects of condensed matter physics. One significant application area is the diagonalization of the Hamiltonian for large, dense matrices encountered in studies of the fractional quantum Hall effect. A second application discussed in this work is the Self Consistent Field (SCF) cycle of a Density Functional Theory (DFT) code, Quantum Espresso. Initial results are promising, demonstrating an 18x speedup using a GPU over an optimized CPU implementation. Further, the use of MPI in conjunction with NVIDIA GPUDirect allows for scaling to all GPUs across the cluster used in this work.
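
For readers unfamiliar with the algorithm named in this abstract, the following is a minimal CPU sketch of Lanczos diagonalization in Python with NumPy. It is not the presenters' GPU code: the function name `lanczos`, the random start vector, and the use of full reorthogonalization are assumptions made for illustration. The matrix-vector product `A @ q` is the step a GPU implementation would typically offload.

```python
import numpy as np

def lanczos(A, k, seed=0):
    """Return the k Ritz values of a symmetric matrix A after k Lanczos steps."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = np.zeros((n, k))       # Lanczos basis vectors, one per column
    alpha = np.zeros(k)        # diagonal of the tridiagonal matrix T
    beta = np.zeros(k - 1)     # off-diagonal of T
    for j in range(k):
        Q[:, j] = q
        w = A @ q              # the dominant cost for large matrices
        alpha[j] = q @ w
        # Full reorthogonalization against all basis vectors built so far
        # (an illustrative choice that keeps the sketch numerically robust).
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)

# Small demonstration on a random symmetric matrix (hypothetical data).
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A_demo = (M + M.T) / 2.0
ritz = lanczos(A_demo, 6)
```

With as many steps as the matrix dimension, the Ritz values should reproduce the full spectrum; in practice far fewer steps are used and only the extreme eigenvalues are trusted.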

Keywords:
Computational Physics, Algorithms & Numerical Techniques, GTC 2013 - ID S3201
 
Rob Aulwes (Los Alamos National Laboratory)
Monte Carlo methods for photonic transport are challenging to accelerate on GPUs. The random nature of the methods makes it difficult to predict the branching behavior of threads and thereby reduce branch divergence within thread warps. Transport methods use continuous energy sampling, which further complicates running on the GPU because these methods rely on rejection sampling techniques. We compare performance between a baseline implementation on a single CPU core and an NVIDIA Quadro 4000 GPU card, on a quad-core Intel Xeon machine running Mac OS 10.6.8. The test problem was simplified to transport in a single material inside a sphere with an isotropic source, using random energies from 1 keV to 5 MeV. We then investigate the performance impact of branch divergence and discuss our results.

Keywords:
Computational Physics, GTC 2013 - ID S3260
 
Valerie Halyo (Princeton University)
Significant new challenges continuously confront High Energy Physics (HEP) experiments, in particular the Large Hadron Collider (LHC) at CERN, which pushes the limits of computing. We propose a new tracking algorithm to be executed on a hybrid CPU/GPU computer farm that will improve the purity and efficiency of existing triggers while allowing the inclusion of new triggers that select events with signatures of physics beyond the Standard Model (SM). Our development efforts are focused on two different approaches to computing the Hough transform. The first is a more traditional approach that uses one thread per hit and computes, for every theta value, the corresponding value of rho. The second approach bases its parallelism not on hits but on the points in the Hough transform space: each thread, processing one point in the Hough space, computes the contributions from every hit. However, it has a computational cost that increases significantly with increasing resolution of the Hough space, but it may allow for lower resolutions and better robustness to noise and to lines that are not exactly straight. Preliminary results for the first approach show that the performance on an NVIDIA Tesla C2075 is 25-35x better than single-threaded CPU performance. The second approach shows more impressive speedups, ranging from 50-100x when compared to single-threaded CPU performance, but is still almost 10x slower than the more conventional approach.
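
The two parallelization strategies described in this abstract can be illustrated with a small CPU sketch in Python with NumPy. This is not the presenters' code: the function names, the line parameterization rho = x cos(theta) + y sin(theta), the binning, and the sample hits are assumptions made for illustration. Each outer loop body marked in the comments corresponds to what a single GPU thread would do.

```python
import numpy as np

def rho_bins(rho, rho_max, n_rho):
    """Map signed distances rho in [-rho_max, rho_max) to integer bin indices."""
    return np.floor((rho + rho_max) / (2.0 * rho_max) * n_rho).astype(np.int64)

def hough_per_hit(hits, cos_t, sin_t, rho_max, n_rho):
    """Approach 1: each hit visits every theta and votes for one rho bin."""
    acc = np.zeros((len(cos_t), n_rho), dtype=np.int64)
    theta_idx = np.arange(len(cos_t))
    for x, y in hits:  # on a GPU: one thread per hit
        bins = rho_bins(x * cos_t + y * sin_t, rho_max, n_rho)
        ok = (bins >= 0) & (bins < n_rho)
        acc[theta_idx[ok], bins[ok]] += 1  # concurrent votes would need atomic adds
    return acc

def hough_per_cell(hits, cos_t, sin_t, rho_max, n_rho):
    """Approach 2: each (theta, rho) cell counts the hits that fall into it."""
    hits = np.asarray(hits, dtype=float)
    acc = np.zeros((len(cos_t), n_rho), dtype=np.int64)
    for i in range(len(cos_t)):
        # Bins are precomputed per theta only to keep the Python loop short;
        # conceptually every cell tests every hit.
        bins = rho_bins(hits[:, 0] * cos_t[i] + hits[:, 1] * sin_t[i], rho_max, n_rho)
        for j in range(n_rho):  # on a GPU: one thread per (i, j) cell, no write conflicts
            acc[i, j] = np.count_nonzero(bins == j)
    return acc

# Five hypothetical hits on the horizontal line y = 2.05.
thetas = np.linspace(0.0, np.pi, 180, endpoint=False)
cos_t, sin_t = np.cos(thetas), np.sin(thetas)
hits = [(float(x), 2.05) for x in range(5)]
acc_hit = hough_per_hit(hits, cos_t, sin_t, rho_max=10.0, n_rho=100)
acc_cell = hough_per_cell(hits, cos_t, sin_t, rho_max=10.0, n_rho=100)
```

Both functions fill the same accumulator. The per-cell variant does work proportional to the number of cells times the number of hits, which is consistent with the abstract's remark that its cost grows with the Hough-space resolution.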

Keywords:
Computational Physics, Algorithms & Numerical Techniques, GTC 2013 - ID S3263
 
Felice Pantaleo (European Organization for Nuclear Research), Vincenzo Innocente (European Organization for Nuclear Research)
In the field of high energy physics, several groups are pursuing the use of GPUs for data analysis and for Monte Carlo simulations of particle interactions. The use of GPUs presented in this seminar is different: GPUs are employed for making decisions in a trigger system, either as coprocessors in a high-level software trigger or "embedded" in a real-time, fixed-latency hardware trigger.

Keywords:
Computational Physics, Astronomy & Astrophysics, GTC 2013 - ID S3278
 
Frank Winter (Jefferson Lab)
QDP++ (Data Parallel C++ Interface for QCD) provides data-parallel operations and data types appropriate to lattice gauge theory that enable efficient writing of C++ lattice QCD programs. QDP-JIT provides a QDP++ implementation with support for GPU-enabled parallel systems. Compute kernels are generated on demand by interfacing with NVIDIA's just-in-time compute compile driver. GPU memory management is automated, freeing domain scientists from this task.

Keywords:
Computational Physics, Parallel Programming Languages & Compilers, GTC 2013 - ID S3280
 
Abhinav Sarje (Lawrence Berkeley National Laboratory)
In this session, we will report efforts and experiences in developing high-performance parallel algorithms and codes on large-scale GPU clusters for analysis of the large amounts of data generated by present high-throughput synchrotron light sources. Such analyses are used in the characterization of macromolecules and particle systems at micro/nano-scales. The codes include multi-GPU accelerated implementations of X-ray scattering pattern simulation using Distorted Wave Born Approximation theory, and of structural fitting of such patterns through inverse modeling using the Reverse Monte Carlo simulation algorithm. These codes are designed to be architecture-aware and deliver high performance through dynamic selection of the best-performing computational parameter values, such as computation decomposition parameters and block sizes, for the GPU architecture being used. We will also discuss detailed performance analyses and optimizations of these codes.

Keywords:
Computational Physics, Computational Chemistry, Supercomputing, GTC 2013 - ID S3282
 
Alessandro Lonardo (Istituto Nazionale di Fisica Nucleare)
The goal of this session is to show you how to build a low-latency, real-time GPU-based stream processing system by describing the strategies and solutions that we have adopted for the design of the GPU-based CERN NA62 experiment L0 trigger. In this context we have to deal with signal extraction from a background that is ten orders of magnitude more frequent. Combining the computing power of Fermi GPUs with the NVIDIA GPUDirect P2P capabilities of the NaNet custom FPGA-based NIC turned out to be a very effective solution, capable of reducing the event rate to sustainable levels within the strict timing requirements of the experiment. Finally, we will outline the project's future activities, enabled by the introduction of CUDA 5 and the Kepler family of devices.

Keywords:
Computational Physics, Signal Processing, GTC 2013 - ID S3286
 
Andreas Schafer (Friedrich-Alexander-University Erlangen-Nurnberg)
Unleash your simulation codes on any CUDA-capable machine with the help of LibGeoDecomp. The Library for Geometric Decomposition codes, available at http://www.libgeodecomp.org, is an auto-parallelizing framework that helps domain scientists port their custom codes to HPC systems. It takes over tedious tasks like domain decomposition via MPI, data transfer to/from the GPU, parallel IO, and live visualization. In short: it allows you to focus on your simulation model. The library provides a smooth upgrade path for existing programs, which keeps the required code changes to a minimum, often just a couple of lines of source code. And it scales: be it a light notebook or one of the fastest supercomputers on Earth, e.g. Tsubame 2.0, LibGeoDecomp runs fast, everywhere. Using an example code, see how you can make use of the library in your own projects and which performance gains are to be expected.

Keywords:
Computational Physics, Development Tools & Libraries, GTC 2013 - ID S3299
 
Nicholas Henderson (Stanford University)
Learn about a CUDA adaptation of Geant4, a large-scale Monte Carlo particle physics toolkit. Geant4 was originally designed to support the needs of high energy physics experiments at SLAC, CERN and other places around the world. Geant4 is an extensive toolkit which facilitates every aspect of the simulation process and has been successfully used in many other domains. Our current interest is radiation therapy dosimetry. For this application the geometry is simple and the model physics is limited to low-energy electromagnetics. These features allow efficient tracking of many particles in parallel on the GPU.

Keywords:
Computational Physics, Algorithms & Numerical Techniques, GTC 2013 - ID S3302
 
Stanley Seibert (University of Pennsylvania)
This talk will describe the design and development of Chroma, a Python package for fast Monte Carlo simulation of individual optical photons propagating through particle physics experiments. Chroma implements standard ray-tracing techniques with Python and PyCUDA to provide a versatile, fast, and physically accurate optical model that is more than 100x faster at photon propagation than the standard particle physics simulation package, GEANT4. Chroma was initially developed by a small academic team of only two people; we will discuss lessons learned in the development process and the impact of Python and PyCUDA on scientist-developers.

Keywords:
Computational Physics, Parallel Programming Languages & Compilers, Ray Tracing, GTC 2013 - ID S3304
 
Ye Fang (Louisiana State University), Sheng Feng (Louisiana State University)
The goal of this session is to demonstrate optimization and tuning methods for the parallel tempering Monte Carlo simulation of random frustrated spin systems, which usually have long relaxation times. The optimization efforts include new data layouts, memory-hierarchy-aware table look-up, kernel granularity, and tiling. Designed and tuned specifically for NVIDIA GPUs, our code achieves 38 picoseconds per spin flip, which makes it the fastest implementation on GPUs so far.
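
As background on the method named in this abstract, the sketch below shows the standard replica-exchange acceptance rule used in parallel tempering, together with the conversion of the quoted 38 picoseconds per spin flip into a throughput of roughly 2.6 x 10^10 spin flips per second. It is a minimal Python illustration, not the presenters' code; the function names and the example energies are hypothetical.

```python
import math

def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging two replicas in parallel tempering.

    beta_* are inverse temperatures and energy_* the current energies of the
    configurations held at those temperatures.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # min(1, exp(delta)), written to avoid overflow for large positive delta.
    return 1.0 if delta >= 0.0 else math.exp(delta)

def flips_per_second(seconds_per_flip):
    """Convert a time per spin flip into spin flips per second."""
    return 1.0 / seconds_per_flip

# Throughput implied by the figure quoted in the abstract.
rate = flips_per_second(38e-12)

# Hypothetical example: the colder replica (beta = 1.0) already has the lower energy,
# so the swap is accepted only with probability exp(-1).
p_swap = swap_probability(1.0, 0.5, -12.0, -10.0)
```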

Keywords:
Computational Physics, GTC 2013 - ID S3313
 
Wei Ge (Institute of Process Engineering, Chinese Academy of Sciences)
Crystalline silicon is a fundamental material for the IT and green energy industries. Pyrolysis of silane to silicon, which deposits onto the seeds in circulating fluidized bed reactors, may revolutionize its greener production. Unfortunately, its commercial application is limited by our poor understanding of its complicated hydrodynamics and reaction kinetics. A multiscale simulation of this process, from reactors to reactions, is carried out using petascale CPU-GPU hybrid computing. The molecular dynamics simulations, using a Tersoff-family potential, are carried out for gaseous silane molecules and interfacial silicon atoms with multiple threads on multi-core CPUs, while other silicon atoms are computed on GPUs with a fixed neighbor list, reaching sustained petaflops performance. Direct numerical simulation of the gas flow around suspended silicon powders is then carried out on GPUs, coupling the lattice Boltzmann method with an immersed moving boundary, while the collisions among the powders are processed on CPUs with the discrete element method (DEM). One million solid particles in 2D and 100 thousand particles in 3D, with about 1 billion lattices, are computed using up to 672 GPUs, which is by far the largest scale for gas-solid systems to date and enters, for the first time, the scale-independent range where intrinsic constitutive correlations can be obtained. The whole reactor is finally simulated on GPUs with coarse-grained DEM, while the Navier-Stokes equation is solved for the silane flow with coarse grids on CPUs. The simulation has revealed unprecedented details of the silicon production process, which are most valuable for its scale-up and optimization.

Keywords:
Computational Physics, Computational Chemistry, Computational Fluid Dynamics, GTC 2013 - ID S3363
 
Lorena Barba (Boston University)
Using a treecode in CUDA for a scientific application shows the power of combining a fast algorithm and GPU hardware. Our application is biomolecular electrostatics, where the boundary element method is used to solve a Poisson-Boltzmann equation. With a user-facing Python code, and a fast numerical engine with the CUDA treecode, a user can easily do simulations at scientifically relevant scales on a desktop with a GPU. We will discuss details of the code, present validation benchmarks, show demos with real proteins in physiological conditions (in a salt solution), and invite potential users to try this free and open-source simulation tool.

Keywords:
Computational Physics, GTC 2013 - ID S3394
 
Balint Joo (US DOE Thomas Jefferson National Accelerator Facility)
We will discuss advances made in lattice Quantum Chromodynamics simulations on GPUs using the Chroma software system and the QUDA library, including on large-scale GPU-based supercomputers such as the Titan system at the Oak Ridge Leadership Computing Facility (OLCF) and the Blue Waters system at the National Center for Supercomputing Applications (NCSA).

Keywords:
Computational Physics, Supercomputing, GTC 2013 - ID S3451
 
Hyung-Jin Kim (Brookhaven National Lab)
Learn how GPUs can be used to accelerate our understanding of sub-atomic physics. In this work we deploy GPUs to probe the structure of the nucleus using lattice quantum chromodynamics (LQCD). While LQCD is the only known method that provides a non-perturbative study of quarks (the particles that make up the nucleus), it requires extremely powerful computational resources to achieve high precision. We accelerate the CPS application (Columbia Physics System, developed by Columbia University, Brookhaven National Laboratory and UKQCD) using the QUDA (QCD on CUDA) library. QUDA is a low-level library built on the CUDA platform and designed to accelerate the algorithms that form the basis of LQCD applications. The conjugate gradient (CG) algorithm used to solve for quark propagators with the 5-d domain-wall Dirac operator is one of the most time-consuming parts of LQCD calculations, and it is this algorithm that this work focuses on. Running on the Kepler-enabled K20 GPU cluster at Thomas Jefferson National Lab, we demonstrate sustained teraflops-scale CG performance with fewer than 10 GPUs. Furthermore, we have developed an alternative 4-d preconditioner for the domain-wall Dirac operator to implement a more efficient version of the Dirac operator (called Mobius). Lastly, we are exploring the use of eigenvectors computed with the Lanczos algorithm to further accelerate CG convergence.
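
For reference, the conjugate gradient iteration at the center of this abstract can be sketched in a few lines of Python with NumPy. This is a generic textbook form for a symmetric positive-definite operator, not QUDA's or CPS's implementation; the function name, the `apply_A` callback, and the random test matrix are assumptions made for illustration. In an LQCD code, `apply_A` would stand in for the application of the (suitably squared) Dirac operator, which is where the GPU time is spent.

```python
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite operator given as a callback."""
    x = np.zeros_like(b)
    r = b - apply_A(x)          # initial residual
    p = r.copy()                # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break               # relative residual is small enough
        Ap = apply_A(p)         # one operator application per iteration
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Hypothetical well-conditioned SPD test system.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A_spd = M.T @ M + 20.0 * np.eye(20)
b_vec = rng.standard_normal(20)
x_sol = conjugate_gradient(lambda v: A_spd @ v, b_vec)
```

The iteration count depends on the conditioning of the operator, which is why preconditioning and eigenvector-based acceleration, both mentioned in the abstract, matter in practice.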

Keywords:
Computational Physics, Algorithms & Numerical Techniques, Development Tools & Libraries, GTC 2013 - ID S3562
Computational Structural Mechanics
Vukasin Strbac (KULeuven University, Belgium)
This talk will demonstrate an implementation of the Total Lagrangian Explicit Dynamic (TLED) finite element formulation in CUDA. As this method was originally developed for very soft tissues, it fully accommodates geometric, material and loading nonlinearities and achieves significant speedups over industry-proven solutions while retaining accuracy. Intricacies of the parallel TLED implementation and comparisons to conventional FE codes are provided, as well as a short introduction to the background of this numerical tool. Learn the details and benefits of mathematically reformulating an FE problem for parallelization and subsequently implementing and optimizing the algorithm in CUDA.

Keywords:
Computational Structural Mechanics, Algorithms & Numerical Techniques, Manufacturing Technical, GTC 2013 - ID S3093
 
Dan'l Pierce (MSC Software Corp)
This session will describe recent algorithmic and implementation advances used for real-world Noise, Vibration, and Harshness analysis. The focus will be on the capabilities enabled by the computing horsepower of GPUs. The results presented will be from the most recent commercial release of the MSC Nastran analysis tool suite.

Keywords:
Computational Structural Mechanics, Algorithms & Numerical Techniques, Manufacturing Technical, GTC 2013 - ID S3407
 
Luis Crivelli (Dassault Systems Simulia Corporation)
Learn about the latest developments in GPU acceleration of Abaqus Standard and Abaqus Explicit, in particular in the areas of Linear Equation Solvers and Element Formulation. We present results for a novel implementation of the Linear Equation Solver and discuss new ideas for accelerating code components that afford little opportunity for data reuse. We will demonstrate the need for well-thought-out data structures to harness the potential of data parallelism. Finally, we will briefly discuss how to manage the challenges and risks of adapting large, commercial, production-oriented code to emerging hardware architectures.

Keywords:
Computational Structural Mechanics, Manufacturing Technical, Supercomputing, GTC 2013 - ID S3408
 
Matt Dunbar (Dassault Systems Simulia)
Increased computing capabilities over the past two decades have dramatically changed the landscape of computer-aided engineering. As high performance computers have become more broadly available, simulation of structural part behavior, fluid flow, and other physics has become commonplace in the engineering workplace. Here we will look back on the changes that have happened in simulation and then look forward to what is anticipated in the coming years.

Keywords:
Computational Structural Mechanics, Manufacturing General, Parallel Programming Languages & Compilers, GTC 2013 - ID S3509
Computer Aided Design
Mehdi Mechaik, PhD. (NVIDIA)
This presentation will show how CUDA GPU acceleration aids in the optimization of printed circuit board (PCB) designs. In particular, it explains how the computation of three-dimensional electromagnetic fields and their visualization in PCBs are made computationally efficient and inexpensive using CUDA, as compared with conventional CPU-based desktop computers. This computational process is important for studying field coupling effects in a PCB, especially to minimize noise coupling on Wi-Fi antennas. It will also show, for the first time, visualization of the steady-state antenna field pattern at a larger distance from a PCB, as is usually done in an anechoic chamber for electromagnetic compatibility testing. (Authors: Mehdi Mechaik of NVIDIA, Davy Pissoort of KU Leuven University, Charles Jackson of NVIDIA, Charlie Shu of NVIDIA, and Henry Zeng of NVIDIA.)

Keywords:
Computer Aided Design, Computational Physics, Manufacturing Technical, GTC 2013 - ID S3295
 
Tristan Lorach (NVIDIA)
This talk introduces a new approach for compositing shaders and compute kernels together, using an API-agnostic description of effects for object materials and scene management (post-processing, management of rendering passes). This approach builds on the original concepts of NVIDIA CgFX and expands them to new levels of flexibility and extensibility. Rather than creating a new shading language, we show how to supersede existing ones (GLSL, HLSL) to avoid complex parsing and yet deploy the effect in a variety of environments. Because nvFX will be open-sourced, developers will benefit from the engineering approach presented and can leverage its runtime or extend it to serve other custom requirements.

Keywords:
Computer Aided Design, Game Development, Media & Entertainment, GTC 2013 - ID S3341
 
Fereydoon Dadkhah (Delphi Electronics & Safety)
This presentation describes Delphi Electronics & Safety's participation in the NVIDIA Maximus Lighthouse program and summarizes its findings. As part of this program, a Maximus-configured workstation was used in the normal course of activities in the Mechanical Analysis and Simulation group, which provides simulation support and expertise to product development teams. The impact of the workstation on workflow and, specifically, the impact of the Tesla GPU on accelerating FEA runs were assessed and will be presented.

Keywords:
Computer Aided Design, Computational Structural Mechanics, Manufacturing General, GTC 2013 - ID S3558
 
Nick Schoeps (Motoczysz)
Learn about the design and development process of the fastest electric motorcycles in the world. In this session, Nick talks not just about the software and hardware tools, but also about the culture and environment at Motoczysz. We'll go in depth on what makes a fast and successful engineering team, highlighting where we cut corners and get away with it (and occasionally where we don't). We'll wrap up by discussing how new GPU technology is accelerating our workflow and allowing us to do more with a small team.

Keywords:
Computer Aided Design, Digital Product Design & Styling, Manufacturing General, GTC 2013 - ID S3567
Computer Vision
Chris Slaughter (Lynx Laboratories), Jeff Mahler (Lynx Laboratories)
3D Perception encompasses a set of techniques which infer information and structure of scenes from high-dimensional observations. This talk will discuss the role of the GPU in enabling many components of 3D Perception systems from image processing to 3D modeling. It will also demonstrate several applications of 3D Perception in solving real world problems.

Keywords:
Computer Vision, Robotics & AI, GTC 2013 - ID S30010
 
Radu Rusu (Open Perception, Inc.), Alexandru Ichim (Open Perception, Inc)
The Point Cloud Library (PCL) is a large-scale, open project for 2D/3D image and point cloud processing. The PCL framework contains numerous state-of-the-art algorithms for filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation. Because many of these algorithms are massively parallel, GPU acceleration holds great potential for achieving real-time performance in numerous applications. In this session we demonstrate some of the recent advances in GPU programming for 3D point cloud processing, obtained in collaboration with our colleagues from NVIDIA, and outline plans for future development.

Keywords:
Computer Vision, Robotics & AI, GTC 2013 - ID S3005
 
Christopher Geyer (iRobot Corporation), Dr. Stefan Zickler (iRobot Corporation)
Incredible progress has been made in object recognition over the past decade. New algorithms are able to detect generic objects, such as people, cars, trucks, animals, and other common objects, with high accuracy. The best object recognition algorithms in the world, though, take tens of seconds to detect one kind of object on a general-purpose computer. They cannot run in real time, which prevents them from being applied to improve automotive safety or robot intelligence, or to build novel cellphone apps. In this talk, we describe how a combination of optimizations on CUDA-enabled GPUs and novel improvements in computer vision algorithms allowed us to accelerate object recognition by two orders of magnitude, enabling, for the first time, real-time state-of-the-art object recognition of dozens of object types.

Keywords:
Computer Vision, Video & Image Processing, GTC 2013 - ID S3102
Streaming:
Download:
 
Martin Peniak (Plymouth University)
A long-standing challenge in robotics is the development of a truly robust and general-purpose vision system suitable for object identification, navigation, and other tasks. An unconventional but promising approach for tackling this challenge re ...Read More

A long-standing challenge in robotics is the development of a truly robust and general-purpose vision system suitable for object identification, navigation, and other tasks. An unconventional but promising approach for tackling this challenge relies on the concept of active perception, inspired by the observation that biological organisms interact with the world in order to make sense of it. Presented in this session is an active vision system that was trained to recognize different objects presented during the evolutionary process in 16 different illuminations and 36 different rotation angles. Every neural network controller was able to explore each of these variations in parallel on the GPU, making the evolutionary process significantly faster than with multi-threaded CPU code. Successfully evolved controllers were able to recognize all the objects within 20 time-steps, and preliminary results suggest that this system is tolerant to variations in object rotation, position and scale.

Keywords:
Computer Vision, Robotics & AI, GTC 2013 - ID S3104

Francisco J. Hernandez-Lopez (Center of Research in Mathematics)
This session will present a method for augmenting shaking and shot videos, which is an extension of VScreen: a tool that replaces a region of any video with another image or video in real time. The technique is first introduced for the photo augmentation task, then extended to video augmentation, and finally applied to shaking and shot videos. Moving objects in the foreground (Fg) may occlude the augmented region in the background (Bg). We therefore use a procedure for Fg/Bg video segmentation, implemented on NVIDIA video cards to fulfill the real-time requirement. Finally, we will show a quantitative evaluation in which we compare the precision and running time of our binary segmentation method (QMMF) against the Graph Cut method (available in the NPP library).

Keywords:
Computer Vision, Video & Image Processing, GTC 2013 - ID S3147

Shalini Gupta (NVIDIA)
Described will be a novel algorithm for real-time detection of rigid textured objects in images, designed specifically for GPUs (~20x vs. CPU). The popular Viola-Jones cascaded algorithm is efficient on CPUs, but does not port well to GPUs (~3x vs. CPU). We employ multi-block local binary pattern features and a random forest (RF) classifier. RFs are inherently massively parallel and hence ideal for GPUs. They are also quick to train. Focusing on the classical problem of face detection, we developed an efficient and effective methodology for feature selection and training of accurate RF classifiers for object detection. Also described are algorithmic modifications for an efficient GPU (CUDA) implementation, including tabular storage of decision trees to avoid branching, pixel skipping for large images, CPU processing of small images, overlapping of kernel executions and CPU-GPU memory copies, using 16-bit integral images, and a new object confidence measure for improving accuracy.
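
As a rough illustration of what "tabular storage of decision trees to avoid branching" can look like, the sketch below stores a complete binary tree in flat arrays and walks it with index arithmetic instead of if/else. The array layout, the names and the averaging step are assumptions made for this example; the abstract does not describe the speaker's actual data structure.

```python
def predict_tree(tree, x):
    """Walk a complete binary tree stored as flat tables.

    tree = (feature, threshold, leaf_value, depth); this layout is
    hypothetical. Internal node i has children 2*i + 1 and 2*i + 2.
    """
    feature, threshold, leaf_value, depth = tree
    node = 0
    for _ in range(depth):
        # The comparison result (False/True) is used as 0/1, so the
        # next index is computed arithmetically rather than by branching.
        go_right = x[feature[node]] > threshold[node]
        node = 2 * node + 1 + go_right
    # Leaves are numbered after the internal nodes.
    return leaf_value[node - len(feature)]


def predict_forest(trees, x):
    """Average the per-tree outputs; every tree can be evaluated independently."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


# A depth-2 tree: 3 internal nodes, 4 leaves.
example_tree = ([0, 1, 1], [0.5, 0.3, 0.7], [0.1, 0.2, 0.8, 0.9], 2)
print(predict_tree(example_tree, [0.6, 0.9]))
print(predict_tree(example_tree, [0.2, 0.1]))
```

Because every sample follows the same fixed-length loop, threads evaluating different pixels or trees would execute the same instruction sequence, which is the property that makes this layout attractive on a GPU.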

Keywords:
Computer Vision, Robotics & AI, GTC 2013 - ID S3297

Andrew Adams (Google)
Halide is a new programming language designed to make it easier to write high-performance image processing code on modern machines. It does this by explicitly separating what computations define the algorithm from decisions about storage and the order of computation. We refer to these latter concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism. Halide's high-level but expressive model of scheduling maps closely to many major patterns in GPU programming, including controlling blocking and task granularity, kernel fusion, and storage in global, shared, and local memory. New schedules cannot change the meaning of the algorithm, and the compiler automatically synthesizes whole graphs of kernels and buffer allocations based on the schedule. This provides an extremely high-leverage tool for exploring different implementation strategies when programming GPUs.
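
The separation of algorithm and schedule can be illustrated in plain Python. This is not Halide code and uses none of Halide's API; the function names below are invented for the example. The algorithm says what each output pixel is, and each schedule only decides the order in which pixels are visited.

```python
def blur_at(img, x, y):
    """Algorithm: 3-point horizontal box blur, clamped at the image borders."""
    w = len(img[0])
    left, right = max(x - 1, 0), min(x + 1, w - 1)
    return (img[y][left] + img[y][x] + img[y][right]) / 3


def schedule_row_major(w, h):
    """Schedule A: visit pixels row by row."""
    for y in range(h):
        for x in range(w):
            yield x, y


def schedule_tiled(w, h, tile=2):
    """Schedule B: visit pixels tile by tile (tile size is an arbitrary choice)."""
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    yield x, y


def realize(img, schedule):
    """Evaluate the algorithm at every pixel in the order the schedule dictates."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for x, y in schedule(w, h):
        out[y][x] = blur_at(img, x, y)
    return out


image = [[float(x + 10 * y) for x in range(5)] for y in range(3)]
# Changing the schedule changes the traversal order, not the result.
print(realize(image, schedule_row_major) == realize(image, schedule_tiled))
```

In this toy version only the loop order varies; the abstract lists further scheduling choices (fusion, vectorization, memory placement) that a sketch this small does not attempt to model.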

Keywords:
Computer Vision, Computational Photography, Parallel Programming Languages & Compilers, GTC 2013 - ID S3395

Databases, Data Mining, Business Intelligence
Presentation
Media
Tobias Lauer (Jedox AG), Steffen Wittmer (Jedox AG)
Learn how CUDA can help speed up business analytics on multidimensional databases. Analytical queries often involve the computation of an extremely large area of aggregated values as input for further processing such as top-k evaluation or other filtering. Such an area is usually sparse in the sense that the vast majority (sometimes 99% or more) of its values will be zero. Computing the whole area not only degrades performance dramatically but also takes up enormous space that may not be available in GPU global memory. Participants in this session will learn about our CUDA solution to this problem, using features of the Fermi and Kepler architectures for optimization. The approach has also been integrated into the commercially available Jedox Business Intelligence Suite.
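
To make the sparsity argument concrete, here is a minimal CPU-side sketch that aggregates only the cells that actually receive data and then takes the top k, rather than materializing the whole area. The cube layout, the roll-up mapping and all names are hypothetical; this is not the Jedox implementation, which the abstract does not detail.

```python
from collections import defaultdict


def aggregate_sparse(facts, rollup):
    """Sum base cells into aggregated cells, storing only non-empty results.

    facts:  {(product, region): value} for the populated base cells
    rollup: {product: product_group}, a hypothetical one-level hierarchy
    """
    totals = defaultdict(float)
    for (product, region), value in facts.items():
        totals[(rollup[product], region)] += value
    return dict(totals)


def top_k(cells, k):
    """Return the k aggregated cells with the largest values."""
    return sorted(cells.items(), key=lambda kv: kv[1], reverse=True)[:k]


facts = {
    ("laptop", "EU"): 120.0,
    ("tablet", "EU"): 30.0,
    ("laptop", "US"): 200.0,
    ("desk", "US"): 15.0,
}
rollup = {"laptop": "hardware", "tablet": "hardware", "desk": "furniture"}

cells = aggregate_sparse(facts, rollup)
print(top_k(cells, 2))
```

Memory use here grows with the number of populated cells instead of the size of the full cross product, which is the concern the abstract raises for GPU global memory.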

Keywords:
Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3088

Tetsuya Uemura (Hitachi Solutions, Ltd.)
The goal of this session is to demystify ISO20022 processing acceleration on GPUs. In the financial sector, GPUs are mainly applied to financial engineering, such as the pricing of derivatives. We extend the range of GPU applications to data processing, focusing here on ISO20022 processing. ISO20022 is an XML-based financial message standard that requires two types of validation, (1) XML schema and (2) message rules, both of which cause high CPU usage. We have created massively parallel ISO20022 processing algorithms, including these validations, and achieve up to 30 times the performance of CPUs.

Keywords:
Databases, Data Mining, Business Intelligence, Finance, GTC 2013 - ID S3125

Brendan Wood (Salesforce.com)
Join us to discover how the Salesforce Marketing Cloud leverages GPUs to provide real-time monitoring of the social web. Real-time monitoring and analysis of social media content requires a high-throughput system capable of processing hundreds of millions of documents per day. Processing involves matching millions of Boolean keyword expressions against each incoming document. We present a GPU-accelerated solution implementing real-time full-text search, which the Salesforce Marketing Cloud has used to vastly reduce the costs of our ingest technology stack. We discuss the implementation of parallel Aho-Corasick for keyword-matching, as well as a novel technique for pruning the expression search space to significantly reduce the computational cost of expression evaluation.
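
For readers unfamiliar with Aho-Corasick, the following is a small sequential sketch of the textbook algorithm, followed by evaluation of a Boolean keyword expression over the matches. It is not the parallel GPU implementation from the talk, and the expression format (a list of AND-groups joined by OR) is an assumption made for the example.

```python
from collections import deque


def build_automaton(keywords):
    """Build the goto, failure and output tables for a keyword set."""
    goto = [{}]     # goto[state][char] -> next state
    fail = [0]      # failure link of each state
    out = [set()]   # keywords recognised on entering each state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    # Breadth-first pass: failure links of shallower states are ready first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_keywords(text, automaton):
    """Scan the text once and return every keyword that occurs in it."""
    goto, fail, out = automaton
    state, found = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


def matches(expression, found):
    """expression: list of AND-groups; the document matches if any group is fully present."""
    return any(all(k in found for k in group) for group in expression)


automaton = build_automaton(["gpu", "cuda", "pu", "tesla"])
found = find_keywords("nvidia gpus run cuda", automaton)
print(sorted(found))
print(matches([["gpu", "cuda"], ["tesla"]], found))
```

The automaton is built once for all keywords, so each document is scanned a single time no matter how many expressions reference those keywords.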

Keywords:
Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3143

Tim Kaldewey (IBM), Rene Mueller (IBM)
Starting with a conventional CPU implementation we identify the most time-consuming operations when processing SQL queries, and show how they can be efficiently offloaded to the GPU. Using queries from a variant of the TPC-H benchmark, we offer a deep dive on how to optimally map complex database operations like join to the GPU hardware, such that they achieve up to 90% hardware efficiency and a throughput of >100M records per second. Given data sets that are orders of magnitude larger than GPU memory, the focus of this talk will be on efficient data layout and movement.

Keywords:
Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3190

Srinivas Reddy (SRIS)
This session will describe a novel architecture used to accelerate the performance of real-time geospatial processing with Storm (a distributed, fault-tolerant, real-time computational system), Graphics Processing Units (GPUs) and HyperDex (a searchable distributed key-value store). The underlying Storm cluster architecture can handle a large number of transactions in the geospatial context, using CUDA algorithms running on clustered NVIDIA GPUs and accessing reference data stored in a HyperDex cluster. The user community wants to find out what is happening now, or as close to real time as possible. Our ability to collect data is overwhelming current capabilities to process it and to perform the calculations that turn it into information. Unfortunately, most of the tools in place work primarily in batch-processing mode for big data (e.g. Hadoop and other supercomputing platforms). Hence, you need the dedicated processing and computational power that GPUs provide. For the past year, we have been using a GPU cluster for its computational ability (data processing has been reduced from 8 days on a supercomputer to approximately an hour on the GPU cluster). The ability to use GPUs to process large amounts of data in real time is also becoming a necessity in the fields of finance and biometrics. To facilitate a real-time data flow, the Storm cluster environment was chosen and a decision was made to do all the geospatial processing and computation on a GPU cluster. The algorithms are written in CUDA to efficiently access the massively parallel processing provided by modern GPU clusters. The decision to use a HyperDex cluster for the storage and retrieval of reference data was based on its extremely fast searching capability. This is provided by a new sharding technique called hyperspace hashing, coupled with a new replication protocol called value-dependent chaining.
To the best of our knowledge, this is the first efficient approach to accelerate real-time geospatial computations with dynamic entities and events. This platform architecture can also be applied to other areas such as biometrics and finance.

Keywords:
Databases, Data Mining, Business Intelligence, Clusters & GPU Management, Real-Time Graphics Applications, GTC 2013 - ID S3305

Peter Bakkum (Groupon)
Described will be a SQL database research project developed at NEC Laboratories America, capable of executing SELECT queries efficiently on the CPU or the GPU. The talk will discuss the design of the database, the technical challenges encountered in this type of application, and several novel solutions applicable to other GPU projects. Database technology has yet to catch up with available GPU hardware, and we hope to improve understanding of areas where GPUs can be exploited for query processing while still retaining the SQL interface that programmers are familiar with. This talk will include a technical discussion of our architecture and show our benchmarks for certain applications.

Keywords:
Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3332

Desktop & Application Virtualization
Presentation
Media
Didier Contis (Georgia Institute of Technology)
Desktop and application virtualization can have a strong impact on the delivery of instruction in a higher-education environment. Just-in-time delivery model, elastic capacity, anywhere/anytime/any delivery model are some of the benefits the Georgia Tech College of Engineering has taken advantage of since 2008. Yet the delivery of CAD/CAM software has remained a challenge due to the lack of a virtual GPU. In this session we will share our experience in using some of the current technologies, such as RDP 8.0 combined with the NVIDIA GRID K1.

Keywords:
Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3467

Will Wade (NVIDIA)
As enterprises look to move PCs to the cloud, users are more and more demanding of a better experience and support for all of their devices. NVIDIA GRID for Enterprise enables IT managers to deliver on an experience equal to a local PC with all the promised benefits of a virtual desktop environment. We'll show how GRID is being enabled in the most common hypervisors, and talk about the technology behind GPUs in virtual environments.

Keywords:
Desktop & Application Virtualization, Cloud Visualization, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3501

Brian Madden (TechTarget)
For years we've been hearing things like, "the future of Windows enterprise computing is VDI!" Well if that's true, then why is only 2% of the world on VDI right now? Probably because VDI is more expensive than traditional desktop computing while being no more secure or easier to manage, all while delivering a user experience that is, best case, the equivalent of a ten-year-old PC. Does this mean that VDI is doomed? Hardly! Despite the advances in tablet apps and HTML5 technologies, enterprises still rely on Windows desktop applications, a reliance which is not going away anytime soon. Yet who wants to manage all the Windows "gunk" on the endpoint just to use a few Windows apps? In this energetic session, independent VDI expert Brian Madden will show you the reality of VDI, including why it's failed today and how it will fit into your overall strategy moving forward.

Keywords:
Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3518

Satinder Sethi (Cisco Systems)
Cisco UCS has fundamentally changed the paradigm of traditional computing, and helped evolve the fabric-based computing model. Key Cisco UCS innovations such as Unified Management, Unified Fabric and Unified Computing with virtual-machine-aware networking have fundamentally changed the operational dynamics of a data center. Partnering with NVIDIA, Cisco is delivering a rich media experience to address the complete spectrum of application and desktop virtualization use cases, from advanced designers, financial analysts and traders to less graphics-intensive contact center workers.

Keywords:
Desktop & Application Virtualization, Design Automation & Production Optimization, Graphics Performance Optimization, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3574

Don Clegg (Supermicro)
As GPU-enabled computing matures, selecting the best hardware platform is more essential than ever. Successful enterprises understand the importance of optimizing compute power and density, I/O bandwidth and latency, plus electrical power-efficiency and cooling to ideally match their intended application within their specified budget. Supermicro, with its industry-leading building-block solutions, delivers the most comprehensive range of GPU-optimized platforms on the market. This presentation, featuring Supermicro's FatTwin™, Twin™, SuperBlade™ and rack/tower building blocks, will highlight some of the most important architectural innovations to consider when selecting the best GPU platforms for HPC, Cloud/GRID and Workstation applications.

Keywords:
Desktop & Application Virtualization, Computer Aided Design, GTC 2013 - ID S3575

Harrison Hongseo Yun (SK Planet)
The upcoming set-top box (STB) and application virtualization technology moves the intelligence from the STB to the network, providing new services and advanced UX efficiently and quickly on any digital STB or connected device. Partnering with SK Broadband, a leading IPTV operator in Korea, SK Planet has successfully launched an STB virtualization service, delivering the advanced TV UX and bringing a TV-centric application store to legacy and ultra-low-cost STBs.

Keywords:
Desktop & Application Virtualization, Cloud Visualization, GTC 2013 - ID S3586

Development Tools & Libraries
Presentation
Media
Vyas Venkataraman (NVIDIA)
This tutorial will cover basic CUDA debugging principles and strategies. We will explore common correctness bugs encountered by developers at all levels of experience, and demonstrate how the latest CUDA development tools provide simple, yet powerful, mechanisms to easily locate problematic regions in your code. Learn how to use the features built into Nsight Eclipse Edition, cuda-gdb, and cuda-memcheck to zero in on these errors and reduce debugging time and effort. This session will include a live demonstration using Nsight Eclipse Edition.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3037

Vyas Venkataraman (NVIDIA)
This session will cover basic CUDA debugging principles and strategies. We will explore common correctness bugs encountered by developers at all levels of experience, and demonstrate how the latest CUDA development tools provide simple, yet powerful, mechanisms to easily locate problematic regions in your code. Learn how to use the features built into Nsight Eclipse Edition, cuda-gdb, and cuda-memcheck to zero in on these errors and reduce debugging time and effort.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3038

Przemyslaw Zych (NVIDIA)
NVIDIA Tesla GPUs are supported by a wide array of 3rd-party cluster management and monitoring tools. Through these tools, and through underlying interfaces provided by NVIDIA, system administrators and end users can effectively monitor and manage Tesla-based GPU clusters. This talk provides an overview of this ecosystem and will cover topics on thermal and power monitoring, as well as methods for controlling target performance levels, tracking GPU accounting metrics and ECC error management. The second half of the session will focus on GPU health management, including an overview of NVIDIA's node health checking tool, NVIDIA-Healthmon.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3044

Rolf VandeVaart (NVIDIA)
If you are interested in debugging and profiling of your CUDA applications on a cluster, then this session is for you. The presentation will cover the free tools available from NVIDIA, followed by an introduction to the APIs and frameworks for building third party tools, and an overview of cluster-class tools from NVIDIA partners. This session will also include a walk-through using the cuda-gdb and nvprof command line tools to profile multiple MPI processes within a node and across different nodes.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3045

David Goodwin (NVIDIA)
Performance optimization is an important part of CUDA application development. This session will explore strategies that you can follow to identify optimization opportunities in your application, and discuss the steps you can take to turn those opportunities into actual performance improvement. Using several real-world application examples, the session will show how CUDA profiling tools and technologies enable you to implement these strategies and unlock the full performance of your CUDA application.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3046

Jiri Kraus (NVIDIA)
Always wanted to know what NVIDIA GPUDirect is about and how your MPI+CUDA application can benefit from using it? In this session you will learn how MPI implementations take advantage of GPUDirect technologies to make your applications run faster, including peer-to-peer communication and RDMA. We will introduce several free and commercial CUDA-aware MPI implementations that are available today, and show how easy it is to use them. We will also present performance gains for real-world applications as well as micro-benchmarks. If you are working on an MPI+GPU application, then don't miss this session to learn how NVIDIA GPUDirect and CUDA-aware MPI can give you improved performance, improved usability and better maintainability of your code.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3047

Will Ramey (NVIDIA)
Get a head start on the conference with this introduction to key technologies for GPU Computing. This tutorial will cover the key features of major programming language solutions, libraries and development tools for GPU computing that are available today. You will also learn which sessions to attend to learn more about each of the topics covered.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3051

Ian Lumb (Allinea Software)
Recent advances in software development and compilers to exploit the power of GPUs are leading to increased interest in the OpenACC programming model. Development tools are the key to success - and Allinea DDT is leading the charge with efficient and easy to use debugging for this model. This talk will outline the latest advances in debugging support for CUDA - including OpenACC, Dynamic Parallelism and nested kernels, and Kepler 2 - and show how users are using this support to solve challenging software problems.

Keywords:
Development Tools & Libraries, Parallel Programming Languages & Compilers, GTC 2013 - ID S3059

Christopher Rossbach (Microsoft Research Silicon Valley), Jon Currey (Microsoft Research Silicon Valley)
This session considers PTask and Dandelion, two systems which collaborate to provide OS-level abstractions that support GPUs as first-class computing resources while supporting a managed front-end dataflow programming model. With PTask the programmer specifies where data goes, rather than how and when it should get there, allowing the system to provide fairness and isolation guarantees, streamline data movement in ways that currently require direct programmer involvement, and enable code portability across diverse GPU-based platforms. Dandelion builds upon the PTask system to allow the programmer to express algorithms in a managed language using familiar LINQ parallel constructs, automatically generating GPU code and dataflow graph management code from LINQ queries. Our experience building PTask and Dandelion shows that PTask can provide important system-wide guarantees and can enable significant performance benefits, for example improving the throughput of hand-tuned CUDA programs by up to 2x, while Dandelion can enable a performant, nearly GPU-transparent managed front-end programming environment.

Keywords:
Development Tools & Libraries, Parallel Programming Languages & Compilers, GTC 2013 - ID S3079

Mark Silberstein (University of Austin)
Explore the benefits of GPUfs - a library for efficient direct access to the host file system from running GPU kernels. GPUfs provides: (1) a standard file system API, which simplifies development and integration of GPU kernels into complex software systems; (2) a GPU buffer cache, which allows efficient execution of applications with complex data reuse; and (3) transparent paging to CPU memory and disk, which enables seamless support for computations whose memory footprint exceeds the GPU physical memory. We explain the GPUfs API with a few simple examples, and then show a number of real applications, including GPU checkpointing and string matching.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3112

Guillermo Marcus (University of Heidelberg, ZITI)
This session will present the Buffer Management Library, a collection of C++ templates to simplify and improve data transfers between a GPU and a host computer. The library provides multiple known algorithms for data transfers, including chunk, double and pooled buffers. In addition, the library allows data transformations to be performed concurrently with the transfers, simplifying the conversion of data formats between host and GPU, as is common for double- to single-precision conversions and for AOS and SOA arrangements. Using the library significantly reduces the amount of pinned memory required for transfers and removes the limitation of having huge buffers locked for use by the GPU. Overlapping data transformations with data transfers can result in more than 20% performance improvement over separate convert and copy operations.
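
The idea of converting data while moving it through a small fixed-size buffer can be sketched as follows. This is a sequential Python stand-in, not the library's C++ template API (which the abstract does not show): the "device" arrays are ordinary host arrays, the function name is hypothetical, and no real overlap of conversion and copy takes place.

```python
from array import array


def transfer_chunked(records, chunk_size, device_x, device_y):
    """Move (x, y) records through a small staging buffer.

    Converts an array-of-structures of doubles into two
    structure-of-arrays float32 buffers, one chunk at a time.
    """
    # Fixed-size staging buffers stand in for a small pinned-memory region.
    staging_x = array("f", [0.0] * chunk_size)
    staging_y = array("f", [0.0] * chunk_size)
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        for i, (x, y) in enumerate(chunk):
            staging_x[i] = x   # stored as single precision
            staging_y[i] = y
        n = len(chunk)
        # Appending stands in for the host-to-device copy of this chunk.
        device_x.extend(staging_x[:n])
        device_y.extend(staging_y[:n])


records = [(float(i), float(-i)) for i in range(10)]
dev_x, dev_y = array("f"), array("f")
transfer_chunked(records, 4, dev_x, dev_y)
print(dev_x.tolist())
print(dev_y.tolist())
```

Only `chunk_size` elements per field are staged at any time, which mirrors the abstract's point about avoiding huge locked buffers. A double-buffer variant would fill one staging buffer while the other is being copied.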

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3160

Chris Gottbrath (Rogue Wave Software Inc)
HRL's Center for Neural and Emergent Systems (CNES) focuses on creating intelligent, efficient machines that can interact with, react and adapt to, evolve, and learn from their environments. Bridging biology and electronics in robotics, intelligent machines, and neuromorphic systems, CNES is working on a project to allow translation of the brain's neuronal, synaptic, network, and system-level activities into electronic elements with similar functions. As part of this multi-year cortical simulation project, the team determined that a GPU solution for their performance requirements would provide the best balance between cost and ease of programming and between computation and communication. However, parallel debugging with CUDA presented unexpected complications, as debugging tools like printf and assert and command-line debugging were not manageable once the team scaled up to more than 20 nodes. The CNES team needed a solution for debugging CUDA in an MPI setting that offered speed, reliable performance, and scalability, which they found in Rogue Wave's TotalView. This talk will use the HRL case study to teach about specific challenges faced when porting complex applications to an NVIDIA GP-GPU accelerated cluster using CUDA, and how those challenges were resolved. The presentation of this case study aims to help developers feel more confident using CUDA and understand the basics of debugging CUDA and OpenACC in a cluster environment.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3177
Streaming:
Download:
 
Milind Chabbi (Rice University), Karthik Murthy (Rice University), John Mellor-Crummey (Rice University)
Understanding and characterizing performance problems of CPU-GPU programs, and providing insightful feedback to guide programmers in tuning their applications, is critical to improving developer productivity. HPCToolkit is a state-of-the-art performance analysis tool that employs statistical sampling of timers and hardware counters, and attributes performance metrics to the hierarchical calling context. We extend HPCToolkit to measure and attribute the performance of hybrid CPU-GPU codes. We present CPU-GPU blame shifting, a technique to identify code regions that underutilize CPU and/or GPU compute resources. We demonstrate the effectiveness of our tools on diverse scientific codes such as hydrodynamics (LULESH), molecular dynamics (LAMMPS), and epidemiology simulation (GPU-EpiSimdemics).

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3256
Streaming:
Download:
 
Wei-Fan Chiang (University of Utah)
Learn how to write high-performance but precise GPU programs by understanding the floating-point imprecision that your programs could have. Floating-point accuracy is often a neglected issue in GPU program development, but it plays a critical role in assuring reliability. In this session, we will describe how performance tuning techniques such as changing the floating-point type or changing the underlying algorithm affect floating-point precision. Furthermore, see how to estimate floating-point imprecision and how our novel affine-arithmetic-based and control-flow-sensitive analysis can help programmers make informed decisions regarding performance/precision trade-offs in their programs. These concepts will be illustrated with real examples from public GPU benchmark sets.
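
To make the point about changing the floating-point type concrete, the following is a small sketch in Python. It is not the analysis presented in the session: it only emulates single-precision accumulation on the host (the helper `to_float32` is an illustrative assumption) and compares it with double-precision accumulation of the same values.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Accumulate left to right, rounding every partial sum to single precision."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


def naive_sum_f64(values):
    """Accumulate left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1] * 100_000
reference = math.fsum(values)  # correctly rounded sum of the double inputs
err32 = abs(naive_sum_f32(values) - reference)
err64 = abs(naive_sum_f64(values) - reference)
print(f"single-precision error: {err32:.3e}")
print(f"double-precision error: {err64:.3e}")
```

Switching the accumulator type is a common performance tuning step, and the gap between the two printed errors shows why its effect on precision should be estimated rather than assumed.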

Keywords:
Development Tools & Libraries, Parallel Programming Languages & Compilers, GTC 2013 - ID S3309
Streaming:
Download:
 
Ade Miller (Microsoft Corporation)
C++ AMP is Microsoft's GPU programming technology. This presentation, by one of the authors of "C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++" (MSPress), gives an overview of C++ AMP's features. The presentation will introduce C++ AMP's algorithms-and-containers programming model and its two minor additions to the C++ language. By programming against a hardware-agnostic data-parallel accelerator model, rather than specific hardware, developers can future-proof their applications to run on a variety of data-parallel hardware. Several C++ AMP examples will be demonstrated, showing the array and array_view container types and the parallel_for_each algorithm. The examples will be extended to show how C++ AMP code can be optimized and then used with the Parallel Patterns Library on the CPU to take advantage of multiple GPUs and achieve further performance improvements with braided parallelism.

Keywords:
Development Tools & Libraries, Parallel Programming Languages & Compilers, GTC 2013 - ID S3317
Streaming:
Download:
 
Michael B. Carter (Siemens Medical Solutions), Jeremy Bennett (Siemens Medical Solutions)
The Direct Model rendering engine is a modern, professional application using a scene graph architecture that pushes the latest OpenGL features. Jeremy Bennett and Michael Carter, from the 3D visualization team at Siemens, will present features and rendering effects from their application and share the key challenges they faced while developing this state-of-the-art, high-performance rendering engine. The engine is driven by a strong design emphasis on Uniform Buffer Objects (UBOs) for handling material properties and Vertex Buffer Objects (VBOs) for the geometry. They will also share what they learned while implementing their new Adaptive Transparency feature. Finally, Jeremy and Michael will demonstrate how they were able to use the latest version of Nsight Visual Studio Edition to debug some of these complex rendering techniques and optimize their OpenGL rendering and the application as a whole.

Keywords:
Development Tools & Libraries, Graphics Performance Optimization, Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, GTC 2013 - ID S3376
Streaming:
Download:
 
Sebastien Domine (NVIDIA)
NVIDIA Nsight 3.0 Visual Studio Edition is the latest revision of the most advanced GPU-accelerated application development environment for heterogeneous platforms. Sébastien Dominé, Sr. Director, Developer Tools, will present the new architecture and feature set of the product, which brings CUDA and OpenGL graphics development together within the same development activities. From debugging GLSL graphics shaders and CUDA kernels within the same GPU debugging session, to optimizing applications that make complex use of graphics and compute on multiple GPUs, to tracing asynchronous compute and graphics memory transfers to and from the GPU, Nsight 3.0 brings GPU development to a level of integration never seen before. Sébastien will use real-world examples to highlight how such GPU-accelerated applications can be developed more efficiently with these new and future versions.

Keywords:
Development Tools & Libraries, Combined Simulation & Real-Time Visualization, Graphics Performance Optimization, Media & Entertainment, GTC 2013 - ID S3377
Streaming:
Download:
 
Rafael Campana (NVIDIA), Benjamin Keck (Siemens Healthcare)
With the help of CUDA and NVIDIA GPUs, Siemens has been able to accelerate its application solutions, and has used Nsight Visual Studio Edition to tune and optimize its CPU and GPU code for better performance. In this session, the development team at Siemens will share how they have used Nsight's features, such as CUDA kernel debugging, the NVIDIA Tools Extension SDK (NVTX), the NVTX text version (NVTXT), and Source Code Correlation, to squeeze the best performance out of the GPUs.

Keywords:
Development Tools & Libraries, Parallel Programming Languages & Compilers, GTC 2013 - ID S3381
Streaming:
Download:
 
Magnus Strengert (NVIDIA)
Discover how the analysis features of NVIDIA Nsight Visual Studio Edition can guide you in improving the performance of your CUDA application and your CUDA kernels. Using examples of various code patterns and kernel samples, we will discuss ways to spot common performance bottlenecks and implement code optimizations based on measured profiling data. Nsight's new Source-Level Experiments will be used to dive further into analyzing CUDA-C kernel source code, allowing you to evaluate performance characteristics from individual CUDA-C source code lines down to each executed assembly instruction.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3382
Streaming:
Download:
 
Pavel Bogdanov (Institute of System Research Russian Academy of Science), Anton Yefremov (Institute of System Research Russian Academy of Science)
This session will introduce a new approach to hybrid cluster programming based on the OpenCL API. The main priorities are efficient use of all devices in the system and clarity of logic. Some results for a node with 8 GPUs on board will be discussed: dense linear algebra (LinPack), SpMV, a model CFD task (sphere in supersonic flow) and a real CFD code.

Keywords:
Development Tools & Libraries, Computational Fluid Dynamics, Parallel Programming Languages & Compilers, GTC 2013 - ID S3410
Download:
 
Ronald Young (Multipath Corporation)
Want to show off the performance of your new GPUs, benchmark various hardware configurations or burn in a system for a mission-critical application? MatrixWarrior is the tool for you. Layered on FMSlib and NVIDIA's cuBLAS library, MatrixWarrior automatically builds and solves a large matrix, keeping all GPUs, CPUs, memory and disks operating at maximum capacity. Attendees will watch MatrixWarrior perform using Kepler GPUs in the latest workstations and servers. The performance benefits of cuBLAS, asynchronous streams, direct and asynchronous I/O, file striping, pinned memory and memory size will be presented. Attendees are encouraged to download MatrixWarrior, which is available free (http://www.fmslib.com/mkt/MatrixWarrior.shtml), and try it on their own systems, from laptops to multiple-GPU servers.

Keywords:
Development Tools & Libraries, Supercomputing, GTC 2013 - ID S3412
Streaming:
Download:
 
H. Carter Edwards (Sandia National Laboratories), Daniel Sunderland (Sandia National Lab), Christian Trott (Sandia National Laboratories)
Performance on manycore devices depends on data access patterns, and different devices (NVIDIA, Intel Xeon Phi, NUMA) require different data access patterns. A performance-portable programming model does not force a false choice between arrays-of-structures and structures-of-arrays; instead, it defines abstractions that transparently adapt data structures to meet device requirements. The KokkosArray library implements this strategy through simple and intuitive multidimensional array abstractions. Usability and performance portability are demonstrated with proxy applications for finite element and molecular dynamics codes. MiniMD, a proxy application for the LAMMPS molecular dynamics code, has implementations in OpenMP, OpenCL, CUDA, and now KokkosArray. A comparison of miniMD's KokkosArray implementation with the previous three versions demonstrates the relative strengths and weaknesses of KokkosArray, and shows that the portable version retains about 95% of the performance of the "native" versions. Multiphysics applications with heterogeneous finite element discretizations have complex and highly irregular data structures. A KokkosArray-based prototype unstructured heterogeneous finite element mesh library and its support for heterogeneous manycore parallel computations will be presented.
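
The layout-abstraction idea can be sketched in a few lines of Python. This is not KokkosArray's interface: the class `View2D` and the two offset functions are hypothetical names used only to show how the same indexing code can run over an array-of-structures or a structure-of-arrays layout chosen at construction time.

```python
def aos_offset(i, field, n_items, n_fields):
    """Flat offset when all fields of one item are adjacent (array of structures)."""
    return i * n_fields + field


def soa_offset(i, field, n_items, n_fields):
    """Flat offset when each field is stored contiguously (structure of arrays)."""
    return field * n_items + i


class View2D:
    """Hypothetical 2-D view whose memory layout is fixed once, at construction."""

    def __init__(self, n_items, n_fields, layout):
        self.n_items = n_items
        self.n_fields = n_fields
        self._offset = layout
        self._data = [0.0] * (n_items * n_fields)

    def __getitem__(self, idx):
        i, f = idx
        return self._data[self._offset(i, f, self.n_items, self.n_fields)]

    def __setitem__(self, idx, value):
        i, f = idx
        self._data[self._offset(i, f, self.n_items, self.n_fields)] = value


def fill(view):
    """Layout-agnostic kernel: the same code runs for either layout."""
    for i in range(view.n_items):
        for f in range(view.n_fields):
            view[i, f] = 10 * i + f


aos_view = View2D(4, 3, aos_offset)
soa_view = View2D(4, 3, soa_offset)
fill(aos_view)
fill(soa_view)
```

Both views return the same value for every `(i, f)` pair, while the underlying storage order differs, so the layout can be picked per device without touching the kernel code.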

Keywords:
Development Tools & Libraries, Algorithms & Numerical Techniques, Parallel Programming Languages & Compilers, GTC 2013 - ID S3426
Streaming:
Download:
 
Rafael Campana (NVIDIA)
Overview and live demo of the latest debugging features available in NVIDIA Nsight Visual Studio Edition. Using user scenarios, we will discuss ways to use features like CUDA info, CUDA warp watch, assembly debugging, the memory checker and others to find issues that would otherwise be difficult to spot without Nsight.

Keywords:
Development Tools & Libraries, Parallel Programming Languages & Compilers, GTC 2013 - ID S3478
Streaming:
Download:
 
Donald Becker (NVIDIA), Bastiaan Aarts (NVIDIA)
This session will cover the development progress, performance characteristics and future evolution of NVIDIA's CUDA on ARM Architecture. Much progress has been made since the preliminary demonstration of Seco's CARMA DevKit at the 2012 GTC. We'll cover the existing software environment, show performance and power efficiency results, discuss the software evolution and hint at future hardware platforms.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3493
Streaming:
Download:
 
Drew Robbins (Microsoft), Vladimir Kolesnikov (Microsoft), Nikola Metulev (Microsoft)
Windows 8 is Windows re-imagined, representing today's single biggest opportunity for developers. Join us for a Windows 8 app workshop to learn how you can take full advantage. You'll learn from experts in a low-key, interactive way and then get hands-on time to apply what you've learned. Part 1 of this series will focus on the platform for building Windows Store apps and on designing a Windows Store app. You will gain an understanding of the platform design tenets, the programming language choices, and the integration points with the operating system and across Windows Store apps. You will learn the design principles behind Windows Store apps and get insights into how to apply these principles in your own apps. To get the most out of this hands-on lab, be sure to bring your Windows 8 laptop with Visual Studio 2012 already installed to follow along and build your first Windows 8 app. A free Windows 8 90-day evaluation download is available at MSDN Evaluation Center. Visual Studio Express 2012 for Windows 8 is available for free at Visual Studio.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3563
Streaming:
Download:
 
Drew Robbins (Microsoft), Vladimir Kolesnikov (Microsoft), Nikola Metulev (Microsoft)
In Part 2 of this four-part series, attendees will receive hands-on training to learn how they can create a Windows Store app (ListView and Data Binding) and optimize views (Orientation, Snapping and Semantic Zoom). Windows 8 provides new ready-to-use user interface controls, and this session will guide you on how to use them when implementing common patterns that deliver great Windows 8 Store apps. Learn how to design adaptive layouts that ensure that apps look great across different screen sizes, resolutions and aspect ratios. To get the most out of this hands-on lab, be sure to bring your Windows 8 laptop with Visual Studio 2012 already installed to follow along and build your first Windows 8 app. A free Windows 8 90-day evaluation download is available at MSDN Evaluation Center. Visual Studio Express 2012 for Windows 8 is available for free at Visual Studio.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3564
Streaming:
Download:
 
Drew Robbins (Microsoft), Vladimir Kolesnikov (Microsoft), Nikola Metulev (Microsoft)
The Building Windows 8 Store Apps hands-on lab continues in Part 3 of this series with instruction on the Search and Share Contracts, Settings and Preferences features. Windows 8 Store apps use contracts to declare the interactions they support with other apps. We will show you how to design the apps so they implement contracts to attract new users or delight existing users by providing them seamless integration. We will also cover and explore how to use Process Lifetime Management. To get the most out of this hands-on lab be sure to bring your Windows 8 laptop with Visual Studio 2012 already installed to follow along and build your first Windows 8 app. A free Windows 8 90-day evaluation download is available at MSDN Evaluation Center. Visual Studio Express 2012 for Windows 8 is available for free at Visual Studio.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3565
Streaming:
Download:
 
Drew Robbins (Microsoft), Vladimir Kolesnikov (Microsoft), Nikola Metulev (Microsoft)
The final hands-on lab of the Building Windows 8 Store Apps series will focus on Live Tiles and Notifications, Windows Azure Mobile Services and DirectX. The app's Tiles and Notifications keep it front and center, increasing the connection with the user. In this session we'll cover how to include primary and secondary Tiles that are "live", encouraging the user to personalize their Start screen by pinning information from your app. Finally, we'll show how to write apps that take advantage of the power of DirectX, and how to create DirectX apps that make use of the unique features of Windows 8. To get the most out of this hands-on lab, be sure to bring your Windows 8 laptop with Visual Studio 2012 already installed to follow along and build your first Windows 8 app. A free Windows 8 90-day evaluation download is available at MSDN Evaluation Center. Visual Studio Express 2012 for Windows 8 is available for free at Visual Studio.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3566
Streaming:
Download:
 
David Goodwin (NVIDIA), Guido Juckeland (TU Dresden-ZIH), Allen Malony (NVIDIA), Milind Chabbi (Rice University), Stan Tomov (University of Tennessee)
Application profiling allows developers to assess the opportunity for improving application performance using GPUs. Attend this session if you are interested in understanding CUPTI (the CUDA Profiling Tools Interface) and how several popular tools (NVIDIA Nsight, TAU, Vampir, PAPI, and HPCToolkit) make use of this profiling library. This will be run as a panel session with ample opportunity for audience interaction.

Keywords:
Development Tools & Libraries, GTC 2013 - ID S3584
Streaming:
Download:
Digital Product Design & Styling
Paul Silver (Dassault Systemes), Dre Clemons (Dassault Systemes)
The global automotive industry constantly strives for higher, more sophisticated levels of vehicle innovation. Not only are leading OEMs and suppliers tasked with integrating high-tech entertainment and performance systems, they must also stay closely in sync with their discriminating consumers, who demand more aerodynamic, contemporary vehicle styling as well. More complex virtual vehicle modeling and graphics-intensive rendering require progressively more advanced graphics processors with onboard GPU computational power. This session will explore the very latest global industry trends and challenges, then show how Dassault Systemes' 3DEXPERIENCE Platform and NVIDIA have accelerated global vehicle development with unrivalled capabilities for world-class styling, realistic rendering and advanced virtual simulation.

Keywords:
Digital Product Design & Styling, Automotive, Manufacturing General, GTC 2013 - ID S3392
Streaming:
Download:
Electronic Design Automation
Xue-Xin Liu (University of California, Riverside)
This work applies CUDA GPUs to the linear equation solver used in thermal analysis of liquid-cooled 3D ICs. Our new thermal model generates finite difference equations, which are solved by an iterative solver for better efficiency. The proposed GPU-accelerated GMRES solver with a preconditioner has proved effective on publicly available and approved test benches.

Keywords:
Electronic Design Automation, Algorithms & Numerical Techniques, Computational Structural Mechanics, Manufacturing Technical, GTC 2013 - ID S3150
Streaming:
Download:
 
Uri Tal (Rocketick Inc.)
HDL (Hardware Description Language) simulators, being event-driven, manage a single queue of events and handle events one at a time, serially. To be able to utilize a massive multi-core architecture, it is not sufficient even to completely rewrite the software; the algorithms must be re-thought with parallelism at the heart of the process. The performance bottleneck of electronic design automation (EDA) applications comes from two directions. First, most of these applications are single-threaded while CPU and GPU architectures have tens to thousands of parallel cores. Second, these applications are bottlenecked by memory latency. CPUs are designed for the 90%-100% cache hit working point. Unfortunately, in EDA applications, the dataset is too large to fit in the cache and, in the absence of data-access locality, the "cache-hit" assumption fails. GPUs are perfectly suited for data-parallel algorithms with huge datasets. All that is required is that you launch several million short-lived independent threads that need not communicate with each other. Sounds easy? Well, if your algorithm naturally breaks into parallel threads, where each thread works on its own different subset of the data, then it is. The bad news is that in most EDA applications not all parts of the flow can be broken into independent parallel threads. From the in-depth experience we gained when developing RocketSim, we came to the conclusion that you must start from a "blank sheet" and rethink the algorithm for running logic simulations. This is the only way we could break the problem into parallel threads.

Keywords:
Electronic Design Automation, Manufacturing Technical, Parallel Programming Languages & Compilers, GTC 2013 - ID S3168
Streaming:
Download:
 
Maxim Naumov (NVIDIA)
In this talk we will introduce the basic concepts behind the Simulation Program with Integrated Circuit Emphasis (SPICE) and discuss in detail the two most time-consuming parts of circuit simulation: the device model evaluation and the solution of large sparse linear systems. In particular, we focus on the evaluation on the GPU of the basic models, such as resistor, capacitor and inductor, as well as the more complex transistor (BSIM4v7) model. We also discuss the solution of the sets of linear systems that arise throughout the simulation. We take advantage of the fact that the coefficient matrices in these linear systems have the same sparsity pattern (and often end up with the same pivoting strategy) and show how to obtain their solution using a direct method on the GPU. Finally, we present numerical experiments and discuss future work. Co-authors: Francesco Lannutti, Sharanyan Chetlur, Lung Sheng Chien, Philippe Vandermersch.

Keywords:
Electronic Design Automation, Algorithms & Numerical Techniques, GTC 2013 - ID S3364
Streaming:
Download:
Energy Exploration
Jens Schneider (King Abdullah University of Science and Technology)
Learn how to improve your visualization application with state-of-the-art compression algorithms fit for high-performance decoding on the GPU. See how such GPU-friendly compression algorithms allow more of your data to fit into both host and device memory and how compression leads to better bandwidth utilization and balance between the host and device. Develop a sound understanding of what types of compression algorithms avoid sequential execution on the decoder side and benefit optimally from the GPU's parallelism. See successful visualization applications ranging from medical data through particle/point clouds to terrain rendering, and learn why GPU-friendly data compression has been an enabling ingredient in achieving interactive frame rates for terabytes of scientific data.

Keywords:
Energy Exploration, Large Scale Data Visualization & In-Situ Graphics, Real-Time Graphics Applications, Scientific Visualization, GTC 2013 - ID S3098
Streaming:
Download:
 
Jonathan Marbach (TerraSpark Geosciences)
Follow the evolving story - the highs and lows - of our efforts in porting computationally intensive seismic attributes to the GPU in a commercial seismic interpretation application, and learn how "porting makes you stronger." This session will discuss our strategies for developing and supporting GPU-based versions of algorithms such as structurally oriented noise filtering, curvature, and fault enhancement while also ensuring a seamless and robust user experience. We will also detail the performance benefits and time savings that GPU acceleration brings to an interpreter's workflow on a typical workstation. The talk will also include a look at our experiences on Fermi- and Kepler-based hardware, ranging from high-end to low-end. This session is both for geoscientists who want to learn how to speed up their interpretation workflows and for developers or technical managers looking to learn from our experiences in taking GPU acceleration from proof-of-concept to a commercial application.

Keywords:
Energy Exploration, GTC 2013 - ID S3105
Streaming:
Download:
 
Max Grossman (Repsol USA)
This presentation will introduce a port to GPUs of a well-known seismic imaging algorithm, Kirchhoff Migration. Previously published work in this area has focused on partial ports. In this implementation, all computational sections of the application are executed on the GPUs in multiple nodes. Also, the implementation automatically adapts to the hardware resources available in the allocated compute nodes, thus taking advantage of all GPUs and CPUs. Finally, where appropriate, intelligent scheduling of work onto the most computationally efficient processor is used to maximize hardware utilization and application performance.

Keywords:
Energy Exploration, GTC 2013 - ID S3136
Streaming:
Download:
 
Yifeng Cui (University of California at San Diego)
Petascale supercomputers have been used to model earthquake dynamics, one of the most challenging computational problems in science. We have developed a highly scalable application, AWP-ODC, that achieved "M8": a full dynamical simulation of a magnitude-8 earthquake on the southern San Andreas fault up to 2 Hz, the largest earthquake simulation conducted to date and a Gordon Bell finalist in 2010. This finite difference AWP-ODC code has recently been ported to CUDA and MPI. We present the implementation of this GPU version of the code, with efficient algorithm-level data locality and a novel design for overlapping data communications between GPUs. We also report on actual wave propagation simulations whose accuracy has been validated against the original CPU production code. Further enhancement of this application is under development for hybrid multicore architectures, towards petascale earthquake modeling.

Keywords:
Energy Exploration, Supercomputing, GTC 2013 - ID S3162
Streaming:
Download:
 
Igor Podladtchikov (Spectraseis Inc)
The proposed paradigm: memory-bound algorithms are good. Floating-point operation costs become irrelevant when the time to complete a task is only a function of information transfer. Then the only performance limiters, and therefore the only targets for optimization, are reading initial states and constant properties, and writing updated states. The performance ceiling is therefore dominated by memory copy speed, and we present a formulation for calculating true performance expectations for memory-bound algorithms. Finally, we present working implementations of acoustic and elastic wave propagation, rather than idealized and simplified conceptual exercises, that perform close to the hardware limit. The explicit finite difference acoustic solver achieves 100 GB/s on Fermi M2070 and 180 GB/s on Kepler K10 GPUs, which is 85% and 75% of memory copy throughput, respectively. For the explicit staggered grid elastic solver, which involves 5 times more read-writes, we achieve 60 GB/s on Fermi M2070.
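
A back-of-the-envelope version of this reasoning can be sketched in Python. This is not the formulation presented in the talk: the function names, the grid size, and the per-cell read and write counts below are illustrative assumptions. Only the 100 GB/s, 180 GB/s, 85% and 75% figures come from the abstract.

```python
def memory_bound_time(n_cells, reads_per_cell, writes_per_cell,
                      bytes_per_value, bandwidth_gb_s):
    """Lower bound on the time per step if only memory traffic matters."""
    bytes_moved = n_cells * (reads_per_cell + writes_per_cell) * bytes_per_value
    return bytes_moved / (bandwidth_gb_s * 1e9)


def implied_copy_throughput(achieved_gb_s, fraction_of_copy):
    """Memory copy throughput implied by an achieved rate and its quoted fraction."""
    return achieved_gb_s / fraction_of_copy


# Figures quoted in the abstract: achieved rate and fraction of copy throughput.
m2070_copy = implied_copy_throughput(100.0, 0.85)
k10_copy = implied_copy_throughput(180.0, 0.75)

# Hypothetical workload: 512^3 single-precision cells, 3 reads and 1 write each.
step_time = memory_bound_time(512 ** 3, reads_per_cell=3, writes_per_cell=1,
                              bytes_per_value=4, bandwidth_gb_s=100.0)

print(f"implied copy throughput, M2070: {m2070_copy:.1f} GB/s")
print(f"implied copy throughput, K10:   {k10_copy:.1f} GB/s")
print(f"estimated time per step:        {step_time * 1e3:.2f} ms")
```

Under this model the floating-point work never appears in the estimate; only the number of values read and written per cell and the sustained bandwidth do.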

Keywords:
Energy Exploration, Algorithms & Numerical Techniques, GTC 2013 - ID S3176
 
Muhammed Kabir Hassan (Department of Electrical Engineering, The Pennsylvania State University)
This presentation considers a system containing specially engineered particles called smart proppants, which have the potential to serve as sensors for estimating the effective length of fractures in gas shale formations, a significant factor in predicting the yield of a reservoir. The proppants are suspended and randomly distributed in a background medium, the fracturing fluid. A Monte Carlo method is used to generate the properties and positions of smart proppants within a volume that approximates the fracture zone. Modeling the particles as dipoles, one can construct a large and unwieldy matrix equation, which is simplified by applying characteristic basis functions (CBF). The CBF method applies LU decomposition and SVD-based methods to matrix sub-blocks in order to produce subsequent solutions. This presentation will discuss the overall application and solution method, as well as the results of using GPU implementations of the key algorithms, which provide a 30-40x performance improvement over a single CPU.

Keywords:
Energy Exploration, Algorithms & Numerical Techniques, GTC 2013 - ID S3200
 
Abdulrahman Manea (Stanford University), Hamdi Tchelepi (Stanford University)
We designed and implemented a massively parallel version of the Semicoarsening Black Box Multigrid Solver [1], which is capable of handling highly heterogeneous and anisotropic 3D reservoirs, on a parallel architecture with multiple GPUs. For comparison, the same algorithm was also implemented on a shared-memory multi-core parallel architecture using OpenMP. The multi-GPU implementation was consistently faster than the OpenMP implementation running on 12 Intel(R) Xeon(R) X5650 2.66GHz cores for various highly heterogeneous models derived from the SPE10 problem.

Keywords:
Energy Exploration, Algorithms & Numerical Techniques, GTC 2013 - ID S3301
 
Paul Hursky (Heat, Light and Sound Research Inc.)
Learn how GPU computing is being used to address challenges in underwater sound propagation modeling. Modeling sound propagation in the ocean includes calculating complex interference patterns caused by multiple acoustic paths that are refracted by a depth-dependent sound speed profile, reflected many times from the ocean boundaries, and extended over long ranges, which makes conventional finite-element techniques impractical. As a result, a repertoire of models has emerged that are optimized for predominantly shallow-angle propagation in an Nx2D configuration. These models are being expanded to full 3D configurations to capture phenomena such as propagation through internal waves, bathymetric canyons, platform motion and moving ocean waves. This talk describes how CUDA was used to accelerate a split-step Fourier Parabolic Equation model, and how this model is being used to create short-range "local" and long-range "global" ambient noise "soundscapes" for assessing the impact of man-made noise on the marine mammal environment.

Keywords:
Energy Exploration, Ray Tracing, Signal Processing, GTC 2013 - ID S3339
 
Cyril Banino-Rokkones (EMGS)
This talk presents the parallelization, on CPU and GPU respectively, of a 3D Finite-Difference Time-Domain (FDTD) method for the Maxwell equations, a commonly used method in computational electromagnetics. EMGS inversion workflows for 3D marine controlled-source electromagnetic (CSEM) data generate a large volume of FDTD modeling jobs (more than 20,000 per hour), which makes FDTD the most compute-intensive application running on EMGS HPC facilities. EMGS aims to continuously assess the potential of new technology for (i) reducing application run times, (ii) increasing global job throughput, and (iii) reducing the TCO of its HPC facilities. This talk reports the first step in assessing GPGPU technology for the EMGS business. In collaboration with NVIDIA, a comparison study of directive-based approaches (OpenMP on CPU and OpenACC on GPU) will be carried out for two applications: Yee-bench, a benchmark developed at the Center for Parallel Computers (PDC) in Stockholm, Sweden, which implements the core of the method, and an industrial CSEM application developed at EMGS. Profiling and timings on recent architectures will be reported.

Keywords:
Energy Exploration, Computational Physics, GTC 2013 - ID S3340
 
Joseph Winston (Halliburton/Landmark Graphics)
IndeX from NVIDIA ARC provides a CUDA-enabled toolkit for volume rendering. IndeX distributes the volumetric data across the GPUs available on a system, renders the subsets in parallel using a raycasting technique, and composites the results into the final image. With proper setup, the software-raycasted result can be blended with the rendering of a traditional OpenGL-based application. This session will present the steps necessary to use IndeX with a scene-graph-based OpenGL application.

Keywords:
Energy Exploration, Large Scale Data Visualization & In-Situ Graphics, Scientific Visualization, GTC 2013 - ID S3347
 
Marc Nienhaus (NVIDIA), Stefan Radig (NVIDIA), Joerg Mensmann (NVIDIA)
NVIDIA indeX is a GPU cluster-based software solution that enables scalable real-time visualization of large-scale data and is used in the oil and gas industry for seismic data interpretation. Here, the visualized large-scale stacked seismic data result from pre-processing multi-dimensional or multi-valued raw data, which is many times larger. For instance, pre-processing approximately 54 terabytes of pre-stacked data typically generates the seismic attributes of just a 90 gigabyte stacked dataset for visualization. Parallel and distributed computing algorithms are commonly used to process the multi-dimensional or multi-valued data on compute clusters. The NVIDIA indeX visualization framework enables the seamless integration of user-defined parallel and distributed compute algorithms to generate seismic attributes. NVIDIA indeX triggers the external compute algorithms and populates the scalable large-scale data rendering algorithm with the resulting seismic attributes for immediate real-time display.

Keywords:
Energy Exploration, Manufacturing Technical, GTC 2013 - ID S3415
 
Stefan Radig (NVIDIA)
We present a software solution for simplifying the development of cluster-based software. It is the underlying technology for scalable software such as NVIDIA indeX, a GPU-based solution for large volume visualization, and NVIDIA Iray, a GPU-based path tracer. Our solution combines a distributed, in-memory NoSQL data store for arbitrary data, a job distribution and scheduling system, and a high-speed networking layer to provide a high-level abstraction for writing software that scales efficiently to hundreds of machines. Compared to other solutions, it allows application writers to concentrate on their field of expertise without having to focus on low-level networking and parallelization details. Our solution offers a unique combination of features, such as fail-safety, dynamic clustering, and special support for CUDA-based applications. It is particularly suitable for developing interactive GPU-based applications.

Keywords:
Energy Exploration, GTC 2013 - ID S3576
Finance
Presentation
Media
Cris Doloc (Chicago Trading Company)
Get insight into how a leading market maker in listed derivatives is leveraging GPU technology to drive innovation and increase performance and efficiency while decreasing operational costs. The goal of this session is to offer a view into the world of options market makers as it relates to the technology infrastructure used to provide real-time pricing of firm-wide risk. The presentation will also convey comprehensive implementation details for two customized numerical techniques used in real-time risk pricing: the Leisen-Reimer method with Richardson interpolation and the Adaptive Mesh methodology. Thanks to the massive parallelization potential that GPUs offer, the accuracy of the real-time pricing methodology has improved by orders of magnitude while the calculation speed has increased tenfold.

Keywords:
Finance, GTC 2013 - ID S3173
 
Partha Sen (Fuzzy Logix LLC)
Fixed income instruments such as bonds, interest rate options, and credit derivatives are associated with a rich set of analytics. Traditionally, portfolio managers and quantitative traders have relied heavily on fixed income analytics as a means of generating alpha, or excess returns. The challenge has always been performing complex fixed income analytics on a large volume of instruments fast enough for traders to take advantage of arbitrage opportunities that may exist from time to time. Learn how GPUs, with their compute capabilities, can open up new frontiers in the fixed income markets.

Keywords:
Finance, GTC 2013 - ID S3315
 
Andrew Sheppard (Fountainhead)
Credit Valuation Adjustment (CVA) is hard. Indeed, running CVA is a challenge regardless of the computer hardware available. It is both computationally demanding and also a data challenge; and in the case of enterprise-wide CVA a "Big Data" challenge. GPUs can help by accelerating certain aspects of CVA. If you are implementing CVA, this talk shows where and how GPUs can help.

Keywords:
Finance, Algorithms & Numerical Techniques, Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3368
 
Matthew Leslie (Bank Of America Merrill Lynch)
Learn how domain-specific languages provide a convenient way for domain experts to describe financial payoffs. These descriptions can then be used to provide valuation and risk management information. The talk describes the challenges involved in executing such languages on GPUs and outlines techniques that can be used to overcome them.

Keywords:
Finance, Parallel Programming Languages & Compilers, GTC 2013 - ID S3369
 
Gerald Hanweck (Hanweck Associates, LLC)
The computational demands of quantitative finance grow insatiably each year, driven by increasingly complex products, greater regulatory pressure and more competitive markets. Dr. Hanweck will discuss Hanweck Associates' experience using GPUs to accelerate, by orders of magnitude, the pricing and risk calculations for listed options, interest-rate swaps, exotic derivatives and other asset classes. The session will examine applications of GPUs to the quantitative financial methods often employed in derivatives pricing and risk, including Monte Carlo simulations, trees and lattices, fast Fourier transforms and matrix algebra.

Keywords:
Finance, Algorithms & Numerical Techniques, GTC 2013 - ID S3373
 
Dominique Delarue (BNP Paribas), Azim Siddiqi (BNP Paribas)
Since the high-profile defaults of 2008, counterparty credit risk, and in particular the Credit Valuation Adjustment (CVA), has moved sharply into focus, and risk management systems have been asked to perform many more calculations. Compounded by new requirements from regulators, this means banks need to find orders-of-magnitude improvements in their computational horsepower. In this work, we present an innovative system designed to support these massive calculations in a matter of minutes rather than hours on a hybrid GPU/CPU-based platform. It combines NVIDIA GPUs with managed languages and frameworks (Java, .NET) and with automated deployment, testing and delivery tools to allow fast, confident and robust delivery into a live environment. The highly scalable solution supports running hundreds of sensitivity calculations and achieves near real-time user simulations while preserving the total cost of ownership.

Keywords:
Finance, GTC 2013 - ID S3374
 
Pierre Spatz (www.murex.com)
With a focus on programming for financial derivatives, this talk will first present the latest extensions of GPU-accelerated analytics and updated benchmarks using the latest K20X. The second part of the talk will cover the preparation of a library to take advantage of future generations of GPUs, based on practical experience with the CARMA Dev Kit, which features both an ARM CPU and an NVIDIA GPU.

Keywords:
Finance, GTC 2013 - ID S3387
 
Ettikan Kandasamy Karuppiah (MIMOS BHD), Saw Meng Soo (MIMOS BHD)
In financial mathematics and financial risk management, Value at Risk (VaR) is a widely used measure of the risk of loss on a specific portfolio of financial assets. In this session, we will share our experience in computing VaR for an interest rate portfolio on GPUs. The previous implementation of this system used SQL. We ported the application to the GPU to increase computation speed, produce time-sensitive results, and handle complex configurations consisting of huge amounts of data that require hours or even days to compute using SQL.
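
As a reminder of what a VaR figure is, the following minimal sketch computes it from a list of simulated or historical profit-and-loss values. The abstract does not say which VaR methodology the speakers used; the empirical-quantile (historical simulation) approach, the function name, and the sample numbers here are all assumptions made for illustration.

```python
import math


def historical_var(pnl, confidence=0.99):
    """Value at Risk as a positive loss, using the nearest-rank quantile.

    pnl holds one profit-and-loss value per scenario (losses are negative).
    The result is the loss that is not exceeded in `confidence` of scenarios.
    """
    if not pnl:
        raise ValueError("pnl must contain at least one scenario")
    losses = sorted(-value for value in pnl)  # ascending losses
    n = len(losses)
    # Small epsilon guards against floating point noise in confidence * n.
    rank = min(n, max(1, math.ceil(confidence * n - 1e-9)))
    return losses[rank - 1]


# Hypothetical scenario P&L values for a small portfolio.
scenario_pnl = [-5.0, -3.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
var_90 = historical_var(scenario_pnl, confidence=0.9)
```

In a production setting the expensive part is generating the scenario P&L values (repricing every instrument under every scenario), which is the step that benefits from GPU parallelism; the quantile itself is cheap.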

Keywords:
Finance, Algorithms & Numerical Techniques, GTC 2013 - ID S3389
 
Aamir Mohammad (Aon Benfield Securities)
The simulation of hedging strategies for life insurance products with embedded financial guarantees plays an increasingly important role in the financial reporting and hedging activities of life insurance companies. Such simulations require market data as well as sophisticated, market-consistent scenario generators, combined with complex re-balancing algorithms and hedging instruments. In practice, these important simulations are done using heuristics or avoided entirely because of their high computational demands and programming complexity. We present an overview of such simulations and technical methods for addressing the key challenges of this important risk management and financial reporting activity. We will examine hedging Variable Annuity risk with futures and interest rate swaps as an example and describe the general computing methods used for hedging simulations. We will describe the stochastic-on-stochastic (double nested) simulation pattern used for hedging simulations, and the implementation of a Domain Specific Language (DSL) and elastic GPU cloud solutions for improving the performance, productivity and cost efficiency of such activities.
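
The stochastic-on-stochastic pattern can be sketched as an outer loop of real-world scenarios, each of which launches its own inner risk-neutral simulation. The sketch below is only an illustration of that loop structure: the geometric Brownian motion model, the put-style guarantee payoff, all parameter values, and the function names are assumptions, not the speaker's method.

```python
import math
import random


def simulate_terminal(s0, growth, vol, horizon, rng):
    """One lognormal draw of the fund value after `horizon` years."""
    z = rng.gauss(0.0, 1.0)
    exponent = (growth - 0.5 * vol ** 2) * horizon + vol * math.sqrt(horizon) * z
    return s0 * math.exp(exponent)


def inner_value(s, guarantee, rate, vol, remaining, n_inner, rng):
    """Inner risk-neutral Monte Carlo value of max(guarantee - S_T, 0)."""
    total = 0.0
    for _ in range(n_inner):
        s_t = simulate_terminal(s, rate, vol, remaining, rng)
        total += max(guarantee - s_t, 0.0)
    return math.exp(-rate * remaining) * total / n_inner


def nested_simulation(s0=100.0, guarantee=100.0, drift=0.06, rate=0.02,
                      vol=0.2, outer_horizon=1.0, maturity=10.0,
                      n_outer=50, n_inner=200, seed=0):
    """Guarantee value at the outer horizon, one entry per outer scenario."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_outer):
        # Outer step: real-world evolution of the fund to the horizon.
        s1 = simulate_terminal(s0, drift, vol, outer_horizon, rng)
        # Inner step: revalue the guarantee from that state.
        values.append(inner_value(s1, guarantee, rate, vol,
                                  maturity - outer_horizon, n_inner, rng))
    return values


guarantee_values = nested_simulation()
```

The total work is `n_outer * n_inner` paths, and the inner valuations are independent of one another, which is what makes the pattern both expensive on a CPU and a natural fit for GPUs.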

Keywords:
Finance, Parallel Programming Languages & Compilers, GTC 2013 - ID S3482
Game Development
Presentation
Media
Richard Tonge (NVIDIA)
The algorithms and implementation of the NVIDIA PhysX GPU Rigid Body simulator will be presented. Rigid body dynamics is widely used in applications ranging from movies to engineering to video games. Some of the most visually interesting simulations involve destruction, such as projectile impacts and explosions, and these can generate large piles of debris. Piles require stable simulation of static friction and resting contact, which presents many challenges. In real time simulations, such as video games, the computation budget can be small compared to the number of rigid body contacts. In these cases we must use iterative methods, terminating the iteration before convergence. By stopping early we introduce residual energy into the system, which can cause objects near rest to jitter. Parallelism is essential for good GPU performance, and we describe our algorithm which eliminates jitter at low iteration counts and maximizes parallelism.
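
The effect of stopping an iterative contact solver early can be illustrated with a small sketch. The abstract does not name the solver; projected Gauss-Seidel, a common choice in real-time physics, is assumed here, and the 2x2 system and function names are hypothetical. This is not the PhysX algorithm.

```python
def projected_gauss_seidel(a, b, iterations):
    """Approximate x >= 0 with w = a x + b >= 0 and x_i * w_i = 0.

    Runs a fixed number of sweeps and returns whatever has been reached,
    as a real-time solver with a fixed budget would.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            off_diag = sum(a[i][j] * x[j] for j in range(n) if j != i)
            # Solve row i for x_i, then clamp so the impulse stays non-negative.
            x[i] = max(0.0, -(b[i] + off_diag) / a[i][i])
    return x


def lcp_residual(a, b, x):
    """Largest violation of the complementarity conditions."""
    n = len(b)
    worst = 0.0
    for i in range(n):
        w = b[i] + sum(a[i][j] * x[j] for j in range(n))
        worst = max(worst, abs(min(x[i], w)))
    return worst


# Hypothetical two-contact system.
a = [[2.0, 1.0], [1.0, 2.0]]
b = [-1.0, -1.0]
early = lcp_residual(a, b, projected_gauss_seidel(a, b, 1))
late = lcp_residual(a, b, projected_gauss_seidel(a, b, 50))
```

Here `early` is clearly larger than `late`: the leftover residual after a single sweep is the kind of error that shows up as jitter in objects that should be at rest. Note also that each Gauss-Seidel update depends on the previous one within a sweep, which is the serial dependency a GPU implementation has to work around.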

Keywords:
Game Development, Visual Effects & Simulation, GTC 2013 - ID S3388
 
Dane Johnston (NVIDIA), Jim Sanders (Gearbox Software)
Borderlands 2 was released to critical acclaim on September 18th, 2012, featuring a wide variety of CUDA-accelerated GPU effects. This session will dive into the exact effects that were featured and their benefits to the visual fidelity of the gamer experience. Side-by-side examples of how GPU-accelerated effects change the landscape of the game environment will be used to examine how these effects can change the intensity and quality of a gaming session. The production process and effects creation will also be explored.

Keywords:
Game Development, GTC 2013 - ID S3399
 
John McDonald (NVIDIA), Rich Geldreich (Valve Software), Mike Sartain (Valve Software)
Obtain an in-depth guide to porting games to Linux. A primarily technical discussion, Rich and John will discuss tool alternatives on Linux, OS issues and pitfalls, and porting from Direct3D to OpenGL.

Keywords:
Game Development, GTC 2013 - ID S3418
 
Evan Hart (NVIDIA)
OpenGL has changed rapidly with five releases in less than three years. This talk will discuss how the new improvements such as debug support, tessellation, and enhanced object-oriented support can improve your application. Additionally, this talk will cover what NVIDIA's latest features of path rendering and bindless graphics can provide.

Keywords:
Game Development, Media & Entertainment, Real-Time Graphics Applications, GTC 2013 - ID S3420
 
James Dolan (NVIDIA)
Mobile devices are becoming increasingly powerful and are quickly catching up to what the current generation of PCs and video game consoles is capable of. As a result, it is becoming possible to bring content that until recently was only possible on those platforms to true mobile devices. In this session we will use a next-generation Tegra device running Android, with real-world applications as examples, and discuss everything from taking advantage of existing build systems to porting to optimizing CPU and GPU code.

Keywords:
Game Development, Graphics Performance Optimization, Mobile Applications & Interfaces, Real-Time Graphics Applications, GTC 2013 - ID S3422
 
Jon Jansen (NVIDIA)
NVIDIA's Devtech engineers are constantly developing new technologies to improve the visuals of current computer games. These cutting-edge algorithms are provided to developers in the form of libraries in order to facilitate integration and tuning in any given game engine, as well as to ensure optimal performance on a variety of hardware configurations. In this session we will provide an overview of some of these libraries, the roadmap, and some case studies on how they were successfully used in shipping games.

Keywords:
Game Development, GTC 2013 - ID S3480
General Interest
Presentation
Media
Jen-Hsun Huang (NVIDIA)
Don't miss the opening keynote featuring Jen-Hsun Huang, Co-Founder, President, and CEO of NVIDIA. Hear about what's next in computing and graphics, and preview disruptive technologies and exciting demonstrations across industries.

Keywords:
General Interest, GTC 2013 - ID S3900
 
Erez Lieberman Aiden (Department of Genetics at Baylor College of Medicine; Department of Computer Science of Computational and Applied Mathematics at Rice University)
The human genome is a sequence of 3 billion chemical letters inscribed in a molecule called DNA. Famously, short stretches (~10 letters, or base pairs) of DNA fold into a double helix. But what about longer pieces? How does a 2 meter long macromolecule, the genome, fold up inside a 6 micrometer wide nucleus? And, once packed, how does the information contained in this ultra-dense structure remain accessible to the cell? This talk will discuss how the human genome folds in three dimensions, a folding that enables the cell to access and process massive quantities of information in parallel. To probe how genomes fold, we developed Hi-C, together with collaborators at the Broad Institute and UMass Medical School. Hi-C couples proximity-dependent DNA ligation and massively parallel sequencing. To analyze our data and reconstruct the underlying folds, we, too, must engage in massively parallel computation. I will describe how we use NVIDIA's CUDA technology to analyze our results and simulate the physical processes of genome folding and unfolding.

Keywords:
General Interest, Algorithms & Numerical Techniques, Bioinformatics & Genomics, GTC 2013 - ID S3901
 
Ralph Gilles (Chrysler Group LLC)
Ralph Gilles, senior vice president of Product Design and president and CEO of SRT (Street and Racing Technology) Brand and Motorsports at Chrysler Group LLC, and the mind behind some of the company's most innovative products, will provide a behind-the-scenes look at the auto industry. Gilles will review how GPUs are used to advance every step of the automobile development process, from the initial conceptual designs and engineering phases through product assembly and marketing. He will also discuss how Chrysler Group utilizes GPUs and the latest technologies to build better, safer cars and reduce time to market.

Keywords:
General Interest, Automotive, GTC 2013 - ID S3902
Graphics Performance Optimization
Presentation
Media
Markus Tavenrath (NVIDIA), Christoph Kubisch (NVIDIA)
The goal of this session is to demonstrate techniques that improve GPU scalability when rendering complex scenes. This is achieved through a modular design that separates the scene graph representation from the rendering backend. We will explain how the modules in this pipeline are designed and give insights into implementation details that leverage the GPU's compute capabilities for scene graph processing. Our modules cover topics such as shader generation for improved parameter management, synchronizing updates between the scene graph and the rendering backend, and efficient data structures inside the renderer.

Keywords:
Graphics Performance Optimization, Computer Aided Design, GTC 2013 - ID S3032
 
Veenus Vasu (NeST/SFO Technologies)
This session is intended to showcase the power of graphics programming (HLSL, GLSL) and GPU computing (mainly CUDA) to a programming audience that wishes to develop graphics software for near-future CAE. NeST's experience with GPU-based programming in CAE will be shared.

Keywords:
Graphics Performance Optimization, Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, GTC 2013 - ID S3271
 
Louis Gaiot (HP), Ray Gilmartin (HP), Sean Young (HP)
HP Workstations and NVIDIA GPUs are critical elements in helping Manufacturing customers push the limits of product development and helping Media & Entertainment customers create compelling content in less time. HP will share first-hand customer stories and benchmarking results, ideal configurations, and performance comparisons that highlight the benefits of GPU computing on HP Workstations.

Keywords:
Graphics Performance Optimization, GTC 2013 - ID S3515
Instrument Clusters & Heads-Up Display (HUD)
Presentation
Media
Justin Ebert (NVIDIA)
Developed by NVIDIA, the UI Composer Studio is the ground-breaking HMI design tool used for instrument clusters and infotainment systems. The UI Composer Studio is used by automakers and Tier 1 automotive suppliers to rapidly develop proofs of concept for evaluation, market research, usability testing and ultimately final production. This tutorial covers Studio's user interface, project structure, animation techniques, Lua scripting and previewing. In addition, the basics of constructing an instrument cluster and IVI using Studio's advanced authoring environment will be discussed.

Keywords:
Instrument Clusters & Heads-Up Display (HUD), Automotive, In-Vehicle Infotainment (IVI), GTC 2013 - ID S3203
Large Scale Data Visualization & In-Situ Graphics
Presentation
Media
Greg Scantlen (Creative Consultants LLC)
This session suggests the coming of a "Virtual Age", the successor to the current Information Age. Through the use of GPU technology in an interactive compute and visualization environment, science and engineering expand the capability to analyze information in a personal virtual workspace. Recent advances in GPU technology have made possible interactive analysis of large data sets in a high-fidelity stereoscopic visual environment. A specialized communications layer links the compute and visualization subsystems for greater scalability. This Virtual Age workspace complements the HPC Supercomputing Center and the Visualization Laboratory for accessible, real-time "what if" analysis of large data sets using Massively Parallel Computing and Stereoscopic Immersive Multi-Display surfaces.

  Back
 
Keywords:
Large Scale Data Visualization & In-Situ Graphics, Scientific Visualization, GTC 2013 - ID S3110
Streaming:
Download:
 
Kenji Kato (NASA Ames, Dell Federal)
Have you ever thought about the visual sharpness, or acuity, of your display system? That question is the focus of the Operational Based Vision Assessment (OBVA) simulator, designed and built by NASA Ames and the United States Air Force for the Air Force School of Aerospace Medicine (USAFSAM). Our goal was to create a scientific testing laboratory to study eye limiting, human vision, and testing standards in an operationally relevant environment. This talk will cover the general design objectives and implementation characteristics of the simulator's visual system, the challenges associated with delivering a real-time, man-in-the-loop, eye-limiting visual simulator, and why more than just eye-limiting resolution matters.

  Back
 
Keywords:
Large Scale Data Visualization & In-Situ Graphics, Collaborative & Large Resolution Displays, Combined Simulation & Real-Time Visualization, Manufacturing General, GTC 2013 - ID S3128
Streaming:
Download:
 
Vijay Kalivarapu (Virtual Reality Applications Center, Iowa State University)
Experts describe the setup and maintenance of large-scale CGI clusters and how they can run in harmony with high-resolution immersive stereo display projection systems. Accommodating developers and users while working behind the scenes with IT management presents unique challenges, particularly with a system like a 48-node, 96-GPU cluster powering 24 4K-resolution projectors. This talk will cover some of the challenges encountered on a daily basis and the good practices learned over the years to improve the experience for developers building visualization applications and for stakeholders using the system.

  Back
 
Keywords:
Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, Real-Time Graphics Applications, GTC 2013 - ID S3165
Streaming:
Download:
 
Yong Cao (Virginia Tech)
This session provides an in-depth analysis of massive model rendering techniques on many-core parallel architectures and offers several solutions to the major challenges in this research direction. We will focus on two issues: 1) parallel level of detail (LOD) processing; 2) efficient data streaming and management between host memory and device memory. We will introduce all stages of massive model rendering and describe the parallel processing strategies for each stage of the pipeline. A performance analysis of the proposed solutions will be provided for discussion.

  Back
 
Keywords:
Large Scale Data Visualization & In-Situ Graphics, Scientific Visualization, GTC 2013 - ID S3253
Streaming:
Download:
 
Danny Holten (SynerScope B.V.)
Explore the emerging Big Data landscape and how GPUs can unlock this data through fast, end-user-accessible visual analysis technologies. Never before has there been such demand for detailed insight into, and non-aggregated analysis of, large data from the financial, cybersecurity, and logistics domains. SynerScope is leveraging GPUs to provide Big Data insights to end users and domain experts and to bridge the gap between large-scale analytical data analysis and visual data interpretation. This presentation will show you how, using highly scalable, real-time interactive visualizations to provide unprecedented end-user insights.

  Back
 
Keywords:
Large Scale Data Visualization & In-Situ Graphics, Databases, Data Mining, Business Intelligence, Finance, GTC 2013 - ID S3261
Streaming:
Download:
 
Yong Cao (Virginia Tech)
Learn to manage load-balancing problems in a multi-GPU computing environment. The session uses an in-situ simulation/visualization application to demonstrate an adaptive load-balancing approach that can efficiently utilize multi-GPU resources. We will describe two methods: 1) a data-driven approach that learns from the performance analysis data of past executions; 2) a dynamic load-balancing method based on buffer fullness that can adjust to workload changes at runtime. We will also introduce a software framework for load balancing that accounts for the differing characteristics of in-situ visualization applications. The implementation of a multi-GPU data structure allows these load-balancing methods to be used in the framework.
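
As a rough illustration of the buffer-fullness idea in the abstract above, the Python sketch below assumes that a bounded buffer sits between simulation GPUs (producers) and visualization GPUs (consumers), and that its fill level signals which stage is falling behind. The function name `rebalance`, the thresholds, and the policy of moving one GPU at a time are hypothetical and are not taken from the talk.

```python
def rebalance(sim_gpus, vis_gpus, fullness, low=0.25, high=0.75):
    """Return a new (sim_gpus, vis_gpus) split for a buffer fill level in [0, 1].

    Hypothetical policy: shift one GPU toward whichever stage is the bottleneck,
    always keeping at least one GPU on each side.
    """
    if not 0.0 <= fullness <= 1.0:
        raise ValueError("fullness must be between 0 and 1")
    if fullness > high and sim_gpus > 1:
        # Buffer is filling up: simulation outpaces visualization.
        return sim_gpus - 1, vis_gpus + 1
    if fullness < low and vis_gpus > 1:
        # Buffer is draining: visualization is waiting for data.
        return sim_gpus + 1, vis_gpus - 1
    # Fill level is inside the comfortable band, or no GPU can be spared.
    return sim_gpus, vis_gpus
```

Calling such a function once per frame (or once per fixed time window) would let the split follow workload changes at runtime; the gap between the two thresholds keeps it from oscillating on small fluctuations.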

  Back
 
Keywords:
Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, Supercomputing, GTC 2013 - ID S3268
Streaming:
Download:
Manufacturing General
Presentation
Media
Fernando Toledo (Gulfstream Aerospace)
The goal of this session is to show how NVIDIA and commercial off-the-shelf real-time visual simulation software are changing the approach to simulating the cabin interior and exterior of corporate aircraft in an interactive, accurate, and photorealistic way. Using the latest developments in OpenGL and in real-time ray-tracing technology, interactive floor plan layouts can be simulated on several types of systems, from desktops and laptops to virtual reality rooms, from local computers to remote clusters, and from tablets to smartphones through web access. We will present how we are extracting the useful geometry data from CAD/CAID (DS CATIA V5/AliasStudio) and developing compelling, behavior-based simulation in RTT DeltaGen Suite. We will challenge the NVIDIA community by sharing where technical improvement could be highly beneficial for the aviation sector. At the end, we will show a live simulation of one of our flagship aircraft.

  Back
 
Keywords:
Manufacturing General, Collaborative & Large Resolution Displays, Digital Product Design & Styling, Large Scale Data Visualization & In-Situ Graphics, GTC 2013 - ID S3348
Streaming:
Download:
 
David Kasik (Boeing), Christopher J Senesac (Boeing)
As a company involved in computer graphics since 1960, Boeing has had substantial influence on the fields of computer graphics, interactive techniques, and computational geometry. Examples include non-uniform rational B-spline geometry, user interface management systems, and augmented reality. Like many companies, Boeing makes extensive use of visualization and computer graphics as an integral part of the business throughout design, engineering, manufacturing, and support. The session will cover a number of current use cases. What sets Boeing apart as a case study is the scale of the problems addressed. Scale includes the number of people working on a program, the amount of data that must be visualized, the global reach of projects, and the length of time product data must be retained. These scale problems are driving continued innovation in fundamental visualization and computer graphics technology and in the ways the technology is applied. Future technology advances are needed to let Boeing continue to improve our products and processes. This session will examine the past, present, and future of computer graphics and visualization through the lens of a large aerospace company.

  Back
 
Keywords:
Manufacturing General, GTC 2013 - ID S3440
Streaming:
Download:
 
Peter Fassbender (Fiat Latin America)
Get to know how the Fiat Design Center in Brazil uses high-resolution images to communicate its creations to internal and external clients. From design approval to communication and marketing, virtual images and animations are fundamental in the design process to sell the creation. Peter Fassbender will show how the designers use the virtual world to develop future cars and components. The Fiat Design Center Latam owns one of the most advanced virtual reality rooms in South America. Peter will use the Fiat Mio project in particular as an example. The Fiat Mio was the world's first car created 100% open source by an OEM together with an internet community, and it was shown at the International Car Show 2010 in São Paulo.

  Back
 
Keywords:
Manufacturing General, Automotive, Digital Product Design & Styling, GTC 2013 - ID S3444
Streaming:
Download:
 
Erik Beaumont (Ventuz)
Today's view of what the future should look like is in large part driven by Hollywood. Our visual expectations have grown with the movie industry. Yet most presentations, product launches, and showrooms still use methods and technology that are essentially the same as before the movie industry existed. For almost 10 years, Ventuz has been pushing the boundaries of how we can use technology to create presentations, events, or immersive environments. Immersive interactive rooms, giant touch screens, interactive presentations on glass surfaces, gesture recognition, holographic projections, augmented reality: these are not just technology buzzwords or harebrained sci-fi ideas. They have all been put into practice and used by manufacturers to launch products, build showrooms, or drive trade show displays. Showing a product is really about telling a story, whether in a showroom, at a trade show, or at a launch event. This talk looks at how technology can help you tell that story, examines some of the distractions, misconceptions, and pitfalls, showcases some interesting concepts and examples you may not be aware of yet, and perhaps ventures bravely into what the near future might bring.

  Back
 
Keywords:
Manufacturing General, Architectural Mapping & Event Visualization, Real-Time Graphics Applications, GTC 2013 - ID S3459
Streaming:
 
Andrew Guldman (Fluid, Inc.)
Fluid Configure is the leading SaaS solution for online product configuration. Customers like Wild Things Gear and Serena & Lily have been successfully using Fluid Configure to sell configurable products such as apparel, footwear, bedding, and more. Using the RealityServer platform from migenius and the iray 3D rendering engine running on NVIDIA GPUs, Fluid Configure programmatically pre-renders thousands of product component images from 3D models. These images are dynamically manipulated at runtime to present interactive, photorealistic product imagery to the consumer during the configuration process. NVIDIA GPUs and iray make it possible to render these high-quality images at the required volume. Prior to the Fluid Configure solution backed by NVIDIA GPUs, creating imagery for online product configurators required more manual effort and expense and afforded business users less direct control over the process. The Fluid Configure solution features admin tools that allow business users to control the configurability of products; a scalable, cost-effective, high-quality image production pipeline; and smooth integration with ecommerce and manufacturing systems. In this session, we will look at Fluid Configure in action and share some customer experiences and lessons learned.

  Back
 
Keywords:
Manufacturing General, Graphics Performance Optimization, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3460
Streaming:
Download:
 
Dave Farnia (JSOL Corp.)
This presentation introduces our GPU acceleration of electromagnetic field analysis. In electromagnetic field FEA, the numerical method generally used is iterative, and by adapting this iterative method we have achieved satisfactory performance with many-core calculation on a GPU. The presentation includes a comparison of calculation speed between a CPU and the Tesla GPU when our proposed method is applied to electromagnetic field analysis of electrical equipment such as motors and transformers.

  Back
 
Keywords:
Manufacturing General, Development Tools & Libraries, GTC 2013 - ID S3464
Streaming:
Download:
 
Brendan Hickey (Applied Materials)
A case study of implementing CAD in the cloud for 3D mechanical design using multiple CAD tools and a global PLM infrastructure at Applied Materials. Applied Materials has 2000+ engineers spread across 93 locations in 22 countries. Each engineer had to have a powerful CAD workstation connected locally to terabytes of engineering data, which limited the engineer's mobility and also posed a significant maintenance and support issue for IT. Applied Materials has addressed these issues by implementing a Desktop Cloud Solution in which engineers use laptops connected to three synchronized and consolidated continental data centers to run their CAD applications, giving them high performance on a mobile platform. Audience members will learn the details of building out this cloud infrastructure, the challenges involved, and the benefits that Applied Materials is currently seeing. From this real-world example, audience members will gain key insights for determining whether this type of solution is right for their company.

  Back
 
Keywords:
Manufacturing General, Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3472
Streaming:
Download:
 
Matthew Gueller (Harley-Davidson Motor Company)
Harley-Davidson has been designing and manufacturing motorcycles for over 110 years. While the motorcycle designs remain true to the heritage, the conceptual design process has evolved to incorporate many new tools that reduce the time required to develop new products, improve styling intent, and allow for greater conceptual exploration. By leveraging tools from Bunkspeed, Keyshot, Autodesk, Dassault, and others, we have added flexibility to our process for delivering high-quality designs earlier. This presentation will go through some of the conceptual design workflows and show how Harley-Davidson uses visualization tools to bring it all together. We will also share feedback on GPU vs. CPU performance benchmarking done at Harley-Davidson and on how these tools are leveraged.

  Back
 
Keywords:
Manufacturing General, GTC 2013 - ID S3508
Streaming:
Download:
 
Galen Faidley (Caterpillar Inc.)
Immersive Visualization has become an integral part of Caterpillar's global product development process. The technology allows for human-machine interaction to be evaluated prior to physical prototypes being built and provides engineers with greater insight into the products they are designing. This talk will provide an overview of immersive visualization, will discuss how immersive visualization can aid product development, and will cover some lessons learned from Caterpillar's global deployment of the technology.

  Back
 
Keywords:
Manufacturing General, GTC 2013 - ID S3512
Streaming:
Download:
 
Kimon Onuma (Onuma, Inc.)
This session will demonstrate the first real-time connection between live sensor data, web-based Building Information Models, and Geographic Information Systems. BIMStorm is a cloud computing collaborative process that: (i) provides better tools to visualize information and connected data; (ii) provides the ability to visualize systems in the building; (iii) serves downstream people who want to visualize and understand it; (iv) connects teams of participants in real-time decision making; and (v) connects to other technologies such as CAD, DCC (Max, Maya, etc.), and Unity3D. It also provides real-to-design comparisons and point cloud capture of existing conditions, as well as real-time design decisions and collaboration. Join us to learn how Onuma's dedication to improving productivity and reducing waste in the building and energy industries has resulted in a radical shift in how architects, planners, and facility owners shape the environment using new processes and technology.

  Back
 
Keywords:
Manufacturing General, Cloud Visualization, Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3513
Streaming:
Download:
 
Benoit Deschamps (PSA Peugeot Citroen)
The PSA PEUGEOT CITROËN IT Department conducts research and development on imaging solutions for automotive use cases. The main objective is to reduce the gap between a physical mockup and a virtual mockup. To achieve this, PSA employs new imaging technology enabled by the latest GPUs to combine simulation and photorealistic rendering in interactive sessions. In addition, PSA leverages GPU cloud computing to reduce the time required for off-line rendering. Lastly, PSA conducts life-size product reviews with 3D visual solutions that deliver the correct look and feel. This talk will discuss how PSA leverages various tools to achieve these goals and its vision for the future of computer-generated imagery to drive automotive design.

  Back
 
Keywords:
Manufacturing General, Automotive, GTC 2013 - ID S3514
Streaming:
Download:
 
Ken Sanders (Gensler)
Detailed digital simulation and analysis of buildings prior to construction - on the desktop, across the WAN and in the cloud - is an essential part of modern design practice. The value created for building owners and users, in terms of aesthetic beauty, construction efficiency and sustainable performance, is extremely compelling. Ken Sanders, FAIA, will share multiple collaborative case studies from Gensler's global design portfolio, including new headquarters buildings for PNC and NVIDIA, as well as Shanghai Tower, which will be China's tallest building when completed in 2015.

  Back
 
Keywords:
Manufacturing General, Cloud Visualization, Computer Aided Design, Desktop & Application Virtualization, GTC 2013 - ID S3517
Streaming:
Download:
 
Alain Gonzalez (PSA Peugeot Citroen)
PSA discusses the benefits and challenges it faced in building a virtualized platform for the CAD designers involved in its automotive design process. Over 500 users, including employees, partners, and subcontractors across numerous international locations, had to be connected and supported through a single remote solution. Audience members will learn about the steps taken to build out the solution, the specific hardware and software used, and the key metrics monitored to evaluate ROI.

  Back
 
Keywords:
Manufacturing General, Automotive, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3537
Streaming:
Download:
 
Ray Browell (ANSYS)
High Performance Computing has been a mainstay of increased productivity for years now. Recently, however, GPUs have enabled another level of performance without the significant purchase cost and power consumption required by additional nodes. ANSYS, Inc. continues to develop customer-focused HPC solutions incorporating the latest hardware technologies, including NVIDIA GPUs.

  Back
 
Keywords:
Manufacturing General, GTC 2013 - ID S3546
Streaming:
Download:
 
Trak Lord (Metaio, Inc.)
Metaio is a leading provider of Augmented Reality technologies and solutions. This talk will provide a quick intro to mobile AR and the value of using Tegra-enabled devices to drive the experience. In addition, the talk will focus on specific applications that are being used by the manufacturing industry to improve sales and customer loyalty. Examples include apps that enhance auto maintenance (Audi example), drive furniture sales (IKEA catalogue app), increase air conditioning sales (Mitsubishi AC placement app), and place kitchen/bathroom fixtures (Kohler app) in the context of one's home.

  Back
 
Keywords:
Manufacturing General, Mobile Summit, GTC 2013 - ID S3569
Streaming:
Download:
Media & Entertainment
Presentation
Media
Isaac Rudomin (Barcelona Supercomputing Center)
We will discuss several steps in the process of simulating and visualizing large and varied crowds in real time on consumer-level computers and graphics cards (GPUs). Animating varied crowds using a diversity of models and animations (assets) is complex and costly. One needs models that are expensive if bought, take a long time to build, and consume too much memory and computing resources. We have developed methods for generating, simulating, and animating crowds of varied appearance and with a diversity of behaviors. Efficient simulations run on low-cost systems because we use the power of modern programmable GPUs. Similar technology can be applied using GPU clusters and HPC for large-scale problems. Such systems scale up almost linearly with multiple GPUs.

  Back
 
Keywords:
Media & Entertainment, Real-Time Graphics Applications, Rendering & Animation, Visual Effects & Simulation, GTC 2013 - ID S30020
Streaming:
Download:
 
Andrew Page (NVIDIA), Gerhard Lang (Vizrt), Thomas True (NVIDIA), Shailendra Mathur (Avid)
Have questions, concerns, or thoughts about the direction of GPU-based video and image processing? Join NVIDIA experts and partners for a lively discussion of such topics as application design, multi-GPU architecture, data movement, threading, APIs, and color management as they apply to video and image processing applications.

  Back
 
Keywords:
Media & Entertainment, Real-Time Graphics Applications, GTC 2013 - ID S3048
Streaming:
Download:
 
Atul Ravindran (Digimetrics)
Learn how CUDA makes automated QC of television and film archives faster and more accurate. This talk addresses the methodologies and implementations of pattern-based and perceptual visual artifact detection algorithms that achieve fast runtimes, high hit rates, and a low incidence of false positives. Learn how algorithms designed and proven on CPU architectures port to CUDA, and discover how only CUDA acceleration makes these solutions viable.

  Back
 
Keywords:
Media & Entertainment, Algorithms & Numerical Techniques, Video & Image Processing, GTC 2013 - ID S3099
Streaming:
Download:
 
Dirk Van Gelder (Pixar Animation Studios)
This session will introduce the OpenSubdiv API, a GPU accelerated framework for real-time computation and display of subdivision surfaces. It will illustrate key features such as semi-sharp creases, hierarchical edits and PTex texturing and how they can be leveraged through a custom shading pipeline in order to draw and animate a variety of extremely detailed surfaces. High-level code examples will be given in C++ and GLSL. It is assumed that registrants are familiar with compute (CUDA, CL) and rendering (GL / DX) pipelines. This is the same code that Pixar uses internally for animated film production. http://graphics.pixar.com/opensubdiv

  Back
 
Keywords:
Media & Entertainment, Real-Time Graphics Applications, Rendering & Animation, GTC 2013 - ID S3109
Streaming:
Download:
 
Francesco Giordana (Double Negative VFX)
Learn how you can use GPU acceleration to achieve interactive previewing of complex systems, such as hair and fur, in a VFX production environment. This talk presents a fully procedural and node based tool, in which each node of the graph can have a GPU accelerated implementation. The framework is capable of falling back automatically to CPU computation in case of error or missing GPU implementation at any point in the graph. Fur is an excellent application domain, given the very high number of primitives that can be evaluated independently at each step of the computation.
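
The per-node fallback described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, `run_graph` function, and the linear chain of nodes are hypothetical simplifications (the tool in the talk is a full node graph), and the "GPU" implementations here are plain Python callables standing in for real GPU kernels.

```python
class Node:
    """One processing step with a required CPU path and an optional GPU path."""

    def __init__(self, name, cpu_impl, gpu_impl=None):
        self.name = name
        self.cpu_impl = cpu_impl
        self.gpu_impl = gpu_impl  # None means no GPU implementation exists

    def evaluate(self, data):
        """Try the GPU path first; fall back to the CPU path if it is missing or fails."""
        if self.gpu_impl is not None:
            try:
                return self.gpu_impl(data), "gpu"
            except Exception:
                pass  # GPU path failed: fall through to the CPU path below
        return self.cpu_impl(data), "cpu"


def run_graph(nodes, data):
    """Evaluate nodes in order and record which backend each one used."""
    used = []
    for node in nodes:
        data, backend = node.evaluate(data)
        used.append((node.name, backend))
    return data, used
```

Because the fallback decision is made per node, one missing or failing GPU implementation does not force the rest of the chain onto the CPU.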

  Back
 
Keywords:
Media & Entertainment, Real-Time Broadcast Graphics, Visual Effects & Simulation, GTC 2013 - ID S3179
Streaming:
Download:
 
Dan Bailey (Double Negative)
Explore how to design a domain-specific language to abstract away the complexities of simulating huge numbers of particles on CPU and GPU architectures without compromising on performance. This talk will present a particle-based FLIP fluid solver using Jet, a domain-specific language and compiler built on the LLVM compiler framework to target the CPU using LLVM's X86 backend and the GPU using the NVIDIA PTX backend. We discuss the design of the particle data model and demonstrate how a high-level representation of the algorithms can be successfully interpreted to maximise features of the differing architectures.

  Back
 
Keywords:
Media & Entertainment, Parallel Programming Languages & Compilers, Visual Effects & Simulation, GTC 2013 - ID S3224
Streaming:
Download:
 
Edwin Braun (cebas Visual Technology Inc.)
Every CUDA-based renderer seems to be the same. How does this affect image quality and workflow in a movie production? This talk will explain why we think that a novel approach to GPU rendering inside 3ds Max delivers better-quality renderings and keeps full creativity intact. Integrating the latest variant of physically based microfacet rendering in CUDA allows us to satisfy even the most demanding artists working in Hollywood. Movie-quality CUDA renderers have to conquer demanding tasks like outputting Render Elements after each rendering pass.

  Back
 
Keywords:
Media & Entertainment, Manufacturing Technical, Ray Tracing, Visual Effects & Simulation, GTC 2013 - ID S3354
Streaming:
Download:
 
Mark Davey (The Foundry)
The Foundry develops software for the visual effects industry. Our customers often have GPUs with impressive compute capabilities that historically have not been fully utilised by all of our products. However, GPUs are not always available, so we require a CPU fall-back for our algorithms. For this reason, a GPU-centric language is not appropriate. Our solution has been to create a rapid image processing (RIP) framework in which developers specify algorithms in a C++-like language. These are then automatically translated to optimised code for both GPU and CPU devices. This "write-once" approach enables us to target both existing and new GPU hardware with minimal extra development effort. The algorithms typically run 10 to 100 times faster when translated for NVIDIA GPUs than when translated for CPUs. Performance measurements will be presented for a number of image processing algorithms across a range of hardware platforms.

Keywords:
Media & Entertainment, Digital Post-Production, Video & Image Processing, GTC 2013 - ID S3360
Streaming:
Download:
 
Amit Gulati (Dolby Laboratories)
For glasses-free 3D to manifest comfortably, devices will need to drive multi-view displays. Comfort is the holy grail of 3D, and Dolby 3D sets a high bar for a glasses-free experience that is free of viewing restrictions and annoying artifacts. To achieve this, SoCs will need to leverage GPU throughput for the complex feat of automatically converting stereo sources into multi-view renderings. Join this session for an in-depth look at the processing requirements and the potential architectures that will evolve to furnish the requisite 3D experience and hasten 3D into the consumer mainstream.

Keywords:
Media & Entertainment, Development Tools & Libraries, Video & Image Processing, GTC 2013 - ID S3372
Streaming:
Download:
 
Sam Blackman (Elemental Technologies)
Over the last 2 1/2 years, Elemental harnessed the power of graphics processors to become the dominant video processing platform behind the live streaming of the largest events in the sports industry. With early deployments in regional events culminating in live streaming of the 2012 Olympics in more than 70 countries, Elemental technology, built on the power of NVIDIA GPUs, is quickly becoming the de facto standard for multiscreen video processing. This session will discuss how massive parallelism applies in the video processing market, with specific case analysis devoted to sports streaming to mobile phones, tablets, PCs, and other smart devices.

Keywords:
Media & Entertainment, Real-Time Broadcast Graphics, Video & Image Processing, GTC 2013 - ID S3378
Streaming:
Download:
 
Keith Slavin (isovideo LLC)
A "future proof" approach to film production: separate the creative product from the distribution! (1) Film at the highest affordable frame rate (e.g. 120 frames/sec) and with short apertures. This "original" is not intended for direct viewing. (2) As part of production, convert the original to multiple targets (e.g. 24, 48, 60 frames/sec). Conversion uses digital frame interpolation, followed by motion blur. Higher frame rates reduce motion per frame, thereby reducing occlusion and motion aliasing artifacts. Short apertures give sharper images, improving motion tracking. Until now this approach was impractical because the cameras did not exist, and conversion introduced artifacts and was too slow for experimentation. New algorithms have been developed for motion-compensated frame interpolation on GPUs. The Legato frame rate converter achieves about 100 interpolated frames/second (without blur) using two GTX 580 cards. Blur is an oversampling/averaging process: frames are oversampled to around 300 frames/sec and then averaged, and the averaging is achieved without significantly impacting parallelism.
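
The oversampling/averaging idea in this abstract can be illustrated with a minimal Python sketch. It is not the Legato implementation: the function name `blur_by_averaging`, the `shutter` parameter (fraction of each output frame interval over which frames are averaged) and the frame representation (a flat list of pixel values) are all assumptions made for illustration, and the motion-compensated interpolation that would produce the oversampled frames is taken as given.

```python
from fractions import Fraction


def blur_by_averaging(frames, oversampled_fps, target_fps, shutter=Fraction(1, 2)):
    """Synthesize motion blur by averaging runs of oversampled frames.

    frames: list of frames at oversampled_fps, each a list of pixel values.
    shutter: hypothetical parameter, the fraction of each output frame
             interval that is averaged (a simulated shutter-open time).
    """
    # Number of oversampled frames that span one output frame interval.
    step = Fraction(oversampled_fps, target_fps)
    # Number of oversampled frames that fall inside the open-shutter window.
    window = max(1, int(step * shutter))
    out = []
    k = 0
    while int(k * step) + window <= len(frames):
        start = int(k * step)
        group = frames[start:start + window]
        # Each output pixel is an independent mean over the window,
        # which is why the averaging parallelizes well.
        out.append([sum(px) / window for px in zip(*group)])
        k += 1
    return out


# 24 one-pixel frames at 300 frames/sec, converted to 25 frames/sec.
source = [[float(i)] for i in range(24)]
blurred = blur_by_averaging(source, 300, 25)
```

Every output pixel depends only on the same pixel position in a short run of input frames, so the averaging step adds no dependencies between pixels.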

Keywords:
Media & Entertainment, GTC 2013 - ID S3416
Streaming:
Download:
 
Timothy Huber (Theory Consulting LLC)
Join Timothy Huber, systems architect and consultant for Trumbull Ventures, as he describes Douglas Trumbull's HFR near-time review and editorial pipeline. Learn how NVIDIA GPUs enable Douglas Trumbull's near-time review system to play back footage captured on a virtual set at full resolution. The workflow entails quickly extracting a matte from the green screen RAW image sequence, using the camera motion data to composite the foreground, motion-locked, onto the background plate, then rendering and playing back on a reference-grade stereoscopic projector within 5-10 minutes of capture.

Keywords:
Media & Entertainment, GTC 2013 - ID S3423
Download:
 
David McClure (MTI Film)
Join David McClure, MTI Film's VP of Product Development, as he discusses the unique challenges presented in feature film and television production and the solutions MTI Film offers, built on NVIDIA GPUs with CUDA. MTI Film has decades of experience developing and supporting products for post production, image processing and dailies. Today's file-based workflows for film, television and commercial productions require an incredible amount of image processing and transcoding to get media delivered in the necessary formats to the various parties involved in every step of the process. The amount of data being captured on-set grows each year as productions shoot more, frame resolutions increase, and high frame rate recording experiments continue. Every captured frame must be color balanced, logged, synchronized with audio and delivered as soon as possible to producers, studio executives and the editors, who begin cutting scenes together as soon as they receive the material. As the name may suggest, these 'dailies' are processed the same day they are shot. Dave will share the experiences of his team during the development of the CUDA-based image-processing core in their new Cortex product line, designed to address these new challenges.

Keywords:
Media & Entertainment, GTC 2013 - ID S3430
Streaming:
Download:
 
Eliot Mack (Lightcraft Technology LLC)
Join Lightcraft Technology co-founder Eliot Mack as he explains the development of real-time visual effects from its origin as a data recording tool to its current use for high-volume in-camera finishing with the Previzion virtual studio system. Understand the original problems found in post-production-based keying and tracking tasks, how Lightcraft fused multiple sensing and processing technologies together to solve them in real time, and how the resulting VFX acceleration changed the production world behind the scenes of such shows as 'Alice in Wonderland', 'Once Upon a Time', 'Pan Am', and 'V'. Learn the surprises encountered as they unraveled the secrets of greenscreen extraction, lens calibration, and worldwide VFX production.

Keywords:
Media & Entertainment, GTC 2013 - ID S3436
Streaming:
Download:
 
Douglas Trumbull (Douglas Trumbull)
Douglas Trumbull, visionary film maker and visual effects pioneer, presents his compelling vision of how high-resolution, high-frame-rate stereo production will help us reach what he sees as the "holy grail": movies that offer a profound and overwhelming personal experience, not one of empathizing with actors via third-person voyeurism, but a direct first-person experience in which each audience member feels that they are inside the movie, participating in it rather than just looking at it. He explores the impact that high frame rates and stereo have on the entire movie system, from production camera capture, through post-production processes and distribution, and up onto the big screen.

Keywords:
Media & Entertainment, GTC 2013 - ID S3437
Streaming:
Download:
 
Paul Lacombe (Unreel)
Paul Lacombe of UNREEL presents the work that he and his team have done on the sets of feature motion picture productions, using immersive 3D graphics to give actors, directors and DPs a vision of how a final film will look while all they can see is a green screen and a few props. He illustrates his talk with examples from his work on "OZ: the Great and Powerful" (still in production), the award-winning "Hugo", "Alice in Wonderland" and "Speed Racer". In addition, Lacombe will present his work with broadcast virtual sets and augmented reality using examples from ESPN SportsCenter, NBA and NASCAR. He also describes how they are combining motion analysis and real-time graphics at CNBC so that on-screen talent can drive interactive graphics as in "Minority Report".

Keywords:
Media & Entertainment, GTC 2013 - ID S3438
Streaming:
Download:
 
Gerhard Lang (Vizrt)
Learn how GPUs not only changed the look of broadcast graphics, but also enabled real-time compositing of graphics and media. Join us for an outlook on how GPUs can play an even more important role as content delivery shifts from traditional distribution to IP-based delivery. Review how data-driven graphics make images understandable and why the time from receiving data until it is visible is crucial for sports productions. Understand the challenges of reducing the latency of transporting media through a GPU-powered system while freeing GPU resources to generate compelling graphics. Glimpse how Vizrt tackles the challenges of changing parameters in stereoscopic productions. See chroma and luminance keying on tracked elements providing keying zones for non-homogeneous environments like horse race and motor sport tracks. Review case studies from fast sports and events such as the record-setting Red Bull Stratos parachute jump.

Keywords:
Media & Entertainment, GTC 2013 - ID S3439
Streaming:
Download:
 
Tim Heidmann (Serious Intent LLC)
The America's Cup sailing races, contested for the oldest trophy in international sport, will take place in San Francisco in the summer of 2013. While 72-foot catamarans sailing at over 40 knots are a spectacular sight, it is often difficult for broadcasters to explain to the international television audience exactly what is going on. To address this challenge, the hosts of this event have developed the AC Liveline system, which tracks the locations of the boats and the cameras, then radios that information back to shore, where it is used to create live overlay graphics that perfectly match the video from the camera. The graphics label the boats, show course boundaries, highlight the turning marks, measure the leads, and show current and wind variations. The creators were awarded the 2011 Emmy for Technical Achievement. This session describes the system and how GPUs are used to perform several key tasks in the pipeline, from image tracking to video processing to numeric simulation to compositing and display of the broadcast signal.

Keywords:
Media & Entertainment, GTC 2013 - ID S3441
Streaming:
Download:
 
Jules Urbach (OTOY Inc. and LightStage)
OTOY has built a production pipeline from high-quality content creation to delivery of photorealistic graphics in real time. In this session, attendees will get a peek at the future of real-time visual effects and game graphics production delivered by OTOY's cloud service using NVIDIA GPUs. The process starts by creating lifelike 3D assets with LightStage, a state-of-the-art capture environment used in blockbuster movies such as The Curious Case of Benjamin Button, Avatar and The Social Network, which produces 100% photorealistic 3D representations of actors. The second stage is turning these massive datasets into renderable 3D assets for use in film VFX and animation. OTOY's physically based, unbiased Octane Render uses GPUs to deliver real-time, final-quality previews to artists and TDs, allowing them to set cameras, lights and materials without time-consuming iterations. The final pipeline stage provides real-time photorealistic games in the cloud with Brigade, a path tracing renderer targeted at games, with numerous optimizations to run at real-time frame rates. It bridges the gap between offline and real-time rendering by using the same high-quality assets and rendering algorithms (path tracing), offering photorealistic global illumination and physically based materials in games.

Keywords:
Media & Entertainment, GTC 2013 - ID S3442
Streaming:
Download:
 
Vladimir "Vlado" Koylazov (Chaos Software Ltd.)
Join Chaos Software cofounder and Chief Technology Officer Vladimir Koylazov as he provides an in-depth look at the architecture and software framework used to distribute interactive GPU rendering in V-Ray RT. Understand how Chaos leverages the power of GPU computing within a number of 3D applications, including Autodesk 3ds Max and Maya, and discover how the speed and interactivity of GPU raytracing is enhancing 3D artists' workflows, resulting in faster iterations and high-quality output.

Keywords:
Media & Entertainment, GTC 2013 - ID S3450
Streaming:
Download:
 
Marc Leidy (Lightdog Films), Jascha Wetzel (Jawset Visual Computing)
We present a GPU-based pipeline for gaseous fluid simulation and volumetric rendering from both an engineer's and an artist's perspective. Using real-world visual effects examples, we discuss how this pipeline affects the day-to-day workflow for film and television visual effects production. Through insights gleaned from years of artist/engineer collaboration on film and television projects, we review how the GPU-based pipeline shortens the learning curve to master the many simulation parameters, how it allows artists to experience fluid dynamics in an artistically intuitive way, and how the iterative workflow becomes more efficient by performing all computation from geometry inputs to final quality rendering on the GPU.

Keywords:
Media & Entertainment, GTC 2013 - ID S3458
Streaming:
Download:
 
John Pallett (Telestream, Inc.)
What happens when you try to accelerate the best of the best? Seeking to provide its clients with the highest-quality, lowest-bit-rate video compression, in 2012 Telestream released a commercial implementation of the x264 codec, which experts generally regard as the best H.264 compression algorithm in the world. Catering to the streaming media industry's requirement for rapid creation of multi-bit-rate, multi-format content packages, Telestream designed a computing platform that includes NVIDIA GPUs as the cornerstone of accelerating this process. Join video industry expert John Pallett as he shares Telestream's experiences with this effort. John will discuss the challenges involved in accelerating H.264 video compression, describe how it was achieved, and share key insights and performance benchmarks for the overall process.

Keywords:
Media & Entertainment, GTC 2013 - ID S3463
Streaming:
Download:
 
Jan Tomanek (Art And Animation Studio ltd. (AAA studio))
This session describes the production pipeline used to create an 85-minute CGI film, Goat Story 2, with every final frame rendered on NVIDIA GPUs using FurryBall, a GPU renderer developed in house. FurryBall 3 now includes most features required by high-end movie and VFX productions, including indirect lighting, displacement, fur, SSS and many other advanced features. Initially an in-house tool, FurryBall has been commercially licensed since 2010 and is now used by many movie and game studios around the world. On regular GPUs, FurryBall is about 30 - 100 times faster than CPU ray-tracers and about 3 - 20 times faster than RenderMan-compliant CPU renderers. More information at: http://furryball.aaa-studio.eu/

Keywords:
Media & Entertainment, Rendering & Animation, GTC 2013 - ID S3469
Streaming:
Download:
 
Nathan Cournia (Rhythm and Hues Studios)
Crom, Core Rhythm Operating Machine, is Rhythm and Hues Studios' software platform for efficiently developing and deploying tools and workflows tailored towards R+H's production needs. In this session, we cover its CPU/GPU hybrid compositor: a fast, flexible, and efficient image processing framework. At its top level, the compositor provides users with node-based controls for generating and manipulating images, which the underlying image processing engine analyzes and translates into GLSL or multi-core CPU code. The compositor is capable of processing arbitrarily sized compositing networks that require input data greatly exceeding the memory limitations of modern GPUs. Crom provides both a high-level API and an intuitive user interface that allow both programmers and users to define novel image processing nodes without concern for the details of programming the GPU or CPU. In addition to giving an overview of the internals of this process, we provide insights into the system's design decisions and present problems and solutions encountered during development.

Keywords:
Media & Entertainment, GTC 2013 - ID S3476
Streaming:
Download:
 
Laurence Emms (Pixar Animation Studios)
Join Pixar Global Technology programmer Laurence Emms as he discusses how Pixar employs General Purpose GPU software in its production pipeline. This session will start with an overview of how Pixar has used GPUs to speed up lighting, set dressing, and character grooming pipelines in films including Cars, Brave and Monsters University. This will be followed by a discussion of how features of the modern OpenGL pipeline can be used to generate accurate real-time previews of film-quality assets, such as vegetation, hair and fur proxies, in Maya and Pixar's in-house animation tool Presto. Finally, there will be a more technical explanation of methods for translating existing CPU code to run in CUDA, including an open-source mass-spring simulation example which runs entirely on the GPU.
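
For readers unfamiliar with mass-spring simulation, the following is a minimal CPU-side Python sketch of one time step for a 1D chain of particles. It is not the open-source example mentioned in the abstract: the function names, the 1D chain layout and the semi-implicit Euler integrator are all assumptions chosen for brevity. The point it illustrates is that each particle's force can be computed by reading only its neighbours' positions, which is the property that lets such a loop be mapped to one GPU thread per particle.

```python
def spring_forces(pos, rest_len, k):
    """Net Hooke's-law force on each particle of a 1D chain (hypothetical layout)."""
    n = len(pos)
    forces = [0.0] * n
    for i in range(n):
        # Each iteration writes only forces[i], so iterations are independent.
        if i > 0:
            # Spring to the left neighbour pulls left when stretched.
            forces[i] -= k * ((pos[i] - pos[i - 1]) - rest_len)
        if i < n - 1:
            # Spring to the right neighbour pulls right when stretched.
            forces[i] += k * ((pos[i + 1] - pos[i]) - rest_len)
    return forces


def step(pos, vel, rest_len, k, mass, dt):
    """Advance the chain by one semi-implicit Euler step."""
    forces = spring_forces(pos, rest_len, k)
    new_vel = [v + dt * f / mass for v, f in zip(vel, forces)]
    new_pos = [p + dt * v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


# Two particles joined by a spring stretched to twice its rest length.
positions, velocities = step([0.0, 2.0], [0.0, 0.0], rest_len=1.0, k=1.0, mass=1.0, dt=0.5)
```

Writing the force computation as a per-particle gather, rather than a per-spring scatter, avoids two springs updating the same particle at once, which is one common consideration when porting such code to a GPU.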

Keywords:
Media & Entertainment, GTC 2013 - ID S3477
Streaming:
Download:
 
Lance Maurer (Cinnafilm Inc.)
CEO and founder Lance Maurer will discuss Pixel Strings, the scalable, real-time, GPU-accelerated motion vector engine that drives Cinnafilm's two flagship products: Dark Energy and Tachyon. Dark Energy is used to create high-quality theatrical and HD content by removing noise and film grain without harming image detail and by simulating new film grain structure. Tachyon, a faster-than-real-time plugin for many third-party transcoding systems, is used to automatically correct broken cadence and deliver optimum format and frame rate conversions, ensuring higher-quality output than with standard transcode solutions alone. Tachyon received the HPA Engineering Excellence Award for 2012. Lance will also discuss some of his industry-trend predictions for the ever-changing market of multimedia delivery in the years to come.

Keywords:
Media & Entertainment, GTC 2013 - ID S3479
Streaming:
Download:
 
Nicholas Mark Recagno (Nicholas Mark Recagno)
Join us as we review HFR production issues ranging from how camera shutter speeds affect down-conversion quality to the real-world challenges faced by post houses wanting to play media reliably. We will journey through the post-production pipeline, discussing editorial standards, dailies processes, image and sound sync, QC, stereo fixing, conform, and final playout to normal 3DS TVs, monitors and DCI-compliant projectors. Business realities often require HFR productions to be down-converted to standard frame rates to broaden distribution. Getting your acquisition strategy right and understanding the tools that enable frame rate conversion via motion interpolation (to simulate the motion blur necessary to remove unwanted strobing effects) can be key to the technical success of a production. We will discuss how GPU-accelerated motion estimation technology in SGO Mistika software has been used in the post-production pipelines of several blockbuster movies, providing up to an 83% reduction in render times compared to CPU-based implementations.

Keywords:
Media & Entertainment, GTC 2013 - ID S3483
Streaming:
Download:
 
Mark Muench (ESPN), Christopher Pond (ESPN)
ESPN's Emerging Technology team will present on using NVIDIA's GPU architecture to develop High Resolution Imagery to enhance the fans' viewing experience. During the session, attendees will learn how ESPN has developed a GPU-based architecture that allows use of 4K video in telecasts. The team will address High Resolution Imagery challenges when applying multiple-GPU processing: theoretical vs. practical transfer rates, GPUDirect, bandwidth, bus speed, and playback, together with coding for OpenGL, NPP, CUDA, and OpenCL processing.

Keywords:
Media & Entertainment, GTC 2013 - ID S3487
Streaming:
Download:
 
Shailendra Mathur (Avid)
Recently Avid introduced an innovative Stereo 3D toolset that allows editors to smoothly transition their monoscopic storytelling skills to the stereoscopic format while preserving the familiar editing and data management functions, operational speed, and image quality that users expect. While preservation of the editing experience was an important goal, the underlying architecture was built to treat this new format as a step towards a world with higher resolutions, frame rates, bit depths and color ranges, as well as multi-view environments. In this presentation you will learn how this architecture was inspired by a past that is storied in data management innovations such as multi-cam, and a future that will see light-field captures becoming the norm for storytelling. You will also learn how the Avid Intelligent Compute architecture was used to accelerate codecs and effects processing in the S3D workflows using a combination of CPUs, NVIDIA GPUs and FPGA-based hardware.

Keywords:
Media & Entertainment, GTC 2013 - ID S3499
Streaming:
Download:
 
Enzo Guerrera (Adobe Systems Inc.)
This session will describe how Adobe's Anywhere platform utilizes GPUs to optimize the editing experience of creative professionals connected across IP networks. Learn how Adobe's Mercury Streaming Engine, running on servers with GPUs, can render video frames from any sequence or video source on behalf of clients, and how the resulting frames are compressed into a format that optimizes the experience for each connected client based on that client's specific quality, latency and network bandwidth requirements.

Keywords:
Media & Entertainment, GTC 2013 - ID S3547
Streaming:
Download:
Medical Imaging & Visualization
Presentation
Media
Ismayil Guracar (Siemens Medical Solutions, Ultrasound Business Unit)
See how real-time CUDA-based signal processing has been applied to high-data-rate signal processing pipeline stages of the ACUSON SC2000(TM) diagnostic medical ultrasound imaging system. In addition to replacing FPGA- and CPU-based processing functions, we have been able to rapidly extend the capabilities of this system to achieve improved speckle reduction, image information content and lesion conspicuity. These processing stages co-exist seamlessly with real-time OpenGL-based 2D and 3D reconstruction in a Windows-based system. The continuing advancement of GPU computing performance will enable further exciting developments in medical imaging.

Keywords:
Medical Imaging & Visualization, Signal Processing, GTC 2013 - ID S3172
Streaming:
Download:
 
Sebastian Schaetz (Biomedizinische NMR Forschungs GmbH am Max-Planck-Institut fur biophysikalische Chemie), Martin Uecker (Electrical Engineering and Computer Sciences, University of California, Berkeley)
Learn from our experience with implementing complex iterative medical image reconstruction algorithms for real-time MRI on multi-GPU systems. The latest findings in MRI image reconstruction techniques allow radiologists to move from still images to videos with up to 50 images per second. Filming dynamic processes at such frame rates creates vast new possibilities for better and more efficient diagnoses. Algorithm complexity, high data rates and low latency requirements pose extreme challenges for image reconstruction implementations. In this talk we will demonstrate the steps we undertook to optimize a non-linear inversion-based algorithm for online operation on a single-node multi-GPU system. We will further introduce a C++ programming library for implementing real-time algorithms on multi-GPU systems that features MPI-like communication routines and distributed memory container abstractions.

Keywords:
Medical Imaging & Visualization, Signal Processing, GTC 2013 - ID S3236
Streaming:
Download:
 
Won-Ki Jeong (Ulsan National Institute of Science and Technology), Tran Minh Quan (Ulsan National Institute of Science and Technology)
Magnetic resonance imaging (MRI) is a widely used in-vivo imaging technique essential for the diagnosis of disease, but its slow acquisition speed prevents wider application of the technique. Recent advances in signal processing research, namely compressive sensing (CS), make it possible to computationally reconstruct the full-resolution signal from only a fraction of the original signal. Compressive sensing is a promising technique for shortening the MRI acquisition process, but the challenge is to generate the image quickly by accelerating the compressive sensing reconstruction. In this talk, we will show how multi-GPU systems can be used to accelerate the CS MRI reconstruction problem. We will discuss the implementation details using NVIDIA CUDA and OpenMP, and show some preliminary results on a GPU cluster system with 16 NVIDIA Tesla C2090 GPUs.

Keywords:
Medical Imaging & Visualization, Video & Image Processing, GTC 2013 - ID S3308
Streaming:
Download:
Mobile Summit
Presentation
Media
Chris Pedersen (NVIDIA)
This session will analyze various third-party application-tracking sources as well as NVIDIA proprietary research to highlight key trends in the mobile app market. Relative category sizes and growth rates will be described with an emphasis on highlighting emerging market opportunities that offer greater business opportunity to application developers. Differences in the kinds of customers drawn to different applications will also be summarized so app developers can gain insights that will help guide the design and marketing of mobile applications.

Keywords:
Mobile Summit, GTC 2013 - ID S3366
Streaming:
Download:
 
Michael Vakulenko (Vision Mobile Ltd)
The session will walk participants through the hidden economics of the asymmetric business models of Apple, Google and Amazon, and how they have engineered their ecosystem empires. The workshop will explain the mechanics behind app ecosystems, including single- and multi-sided network effects, communication vs. application networks, acceleration, friction and dilution of network effects, external subsidies and lock-in properties. The talk will conclude with an open discussion of the best practices for engineering an ecosystem and what Apple, Google and Amazon need to improve.

Keywords:
Mobile Summit, GTC 2013 - ID S3390

David Thompson (Kitware Inc.)
Recent improvements in hardware and operating systems have enabled feature-rich games and interactive visualizations on the mobile platform. However, the lack of sustainable, general, and cross-platform frameworks prevents programmers and application developers from taking full advantage of these advances. The presentation begins with the requirements for a good mobile visualization framework and what motivated us to create a new one, followed by an in-depth introduction to the VES and Kiwi architecture and a description of important implementation details. The presentation includes a live demonstration of streaming Xbox Kinect-generated, Point Cloud Library (PCL)-processed point clouds to a mobile device. The presentation also describes how VES and Kiwi interface with other open source scientific computing toolkits such as VTK, ParaViewWeb, and PCL; offers guidance on developing real-time visualization applications on mobile devices; and covers approaches to the challenge of developing on different operating systems such as Android and iOS. Since open source is an important aspect of the VES and Kiwi framework, this talk also describes open source tools to access, manage, build, and test VES/Kiwi-based software in a comprehensive software development process.

Keywords:
Mobile Summit, Real-Time Graphics Applications, Scientific Visualization, GTC 2013 - ID S3402

Shalini Gupta (NVIDIA)
Do you want to write your own blazing fast, interactive mobile apps using computer vision technology? Apps that can make your camera smarter, find people's faces, understand their gestures, interpret scenes and augment them with graphics? The Tegra super chip and the OpenCV for Tegra library can help you do just that! OpenCV for Tegra is a highly optimized port of the OpenCV library for NVIDIA's Tegra chip. It runs on Android and has ~2500 image processing and computer vision functions. In this session, we will introduce you to OpenCV for Tegra, its functionality, and some things you can do with it. We'll show you how to easily start developing with it by downloading the Tegra Android Development Pack. We'll walk you through how to use OpenCV for Tegra in your Android apps. Finally, for published apps we'll describe how the OpenCV Manager service can automatically manage OpenCV for Tegra libraries on end users' devices.

Keywords:
Mobile Summit, Computational Photography, Computer Vision, Development Tools & Libraries, GTC 2013 - ID S3411

Alejandro Troccoli (NVIDIA)
This session will talk about FCam, a camera control application programming interface (API) that is well suited for Mobile Computational Photography. FCam allows for simple and precise control of the camera system and enables the application developer to replace basic camera routines such as metering, which are typically hidden inside black boxes in traditional camera programming models. The session will show some applications we have developed using FCam on our Tegra platform.

Keywords:
Mobile Summit, Computational Photography, Computer Vision, GTC 2013 - ID S3445

Neil Trevett (Khronos)
Discover how over 100 companies cooperate at the Khronos Group to create open, royalty free standards that help define the future of mobile silicon. This session explores the role of industry standards in maximizing mobile market opportunities and provides an overview of the state-of-the-art in acceleration APIs on Android and ARM-based systems including: accelerating time to productive ecosystems rather than minimizing time to proprietary specifications; balancing and reconciling the opposing benefits of "differentiation" and "fragmentation"; designing open standards that drive innovation while allowing room for a healthy competition; overview of Khronos ecosystem APIs for graphics, computing, media, sensor and vision processing; accelerating advanced applications such as Augmented Reality.

Keywords:
Mobile Summit, GTC 2013 - ID S3484

Jim Steele (Sensor Platforms)
Learn how sensors and intelligent system software can make smart devices even smarter. Examples include using sensor data to improve camera performance under different lighting conditions; save battery life by judiciously refreshing GPS and WiFi locations; eliminate inadvertent pocket dialing; enable smart devices to help users find their cars; and much more.

Keywords:
Mobile Summit, GTC 2013 - ID S3485

Gideon Shmuel (eyeSight Mobile Technologies Ltd.)
During this session eyeSight's CEO will discuss gesture recognition technology: where it is being implemented today, and what to expect next. The session will also discuss integration of the touch-free technology on NVIDIA's Tegra and the advantages of such integration at the GPU level.

Keywords:
Mobile Summit, GTC 2013 - ID S3486

Ira Dvir (Human-Monitoring)
The session will unveil the way hipix solves the issues of sharing full-resolution, high-quality images as well as transferring and/or keeping gigantic files on tablets by enabling the shrinking of photo files on smart phones to less than 1/3 their mobile JPG size; cutting magazine size to less than 1/2 the current size; enhancing user experience by allowing user-friendly addition of audio, text, contact and similar tags, which are embedded in the photo file; and improving PDFs by allowing the embedding of high-resolution imaging in PDF documents. Hipix relies on the hardware-accelerated video codec (H.264 today, HEVC in the future) and is a "no brainer" since the "trick" is in the container. The presentation will include live demonstrations on a Tegra-based tablet.

Keywords:
Mobile Summit, Computational Photography, Video & Image Processing, GTC 2013 - ID S3488

Stephen Jones (NVIDIA)
NVIDIA provides tools that help you get the most out of your Android application. Come learn how to minimize your time to market while maximizing stability and performance. This session will cover native Android GPU debugging and profiling tools as well as CPU debugging and profiling tools, including Nsight Tegra, the premier Android development environment for Microsoft Visual Studio.

Keywords:
Mobile Summit, GTC 2013 - ID S3489

Brian Cabral (NVIDIA)
This session will introduce the imaging architecture of the Tegra 4 processor that enables simultaneous use of the ISP, CPU and GPU for computational photography applications. The Always-on HDR implementation will be described as an example of how these resources can be combined to deliver breakthrough imaging performance in mobile devices. Programming principles, APIs and reference documentation will also be provided to enable application developers who are interested in leveraging the architecture to create new computational photography applications.

Keywords:
Mobile Summit, Computational Photography, GTC 2013 - ID S3491

Chris Pedersen (NVIDIA), Don MacAskill (SmugMug)
Cell phone and tablet cameras are getting better and better as computation is being used to compensate for the limitations their small size places on mechanics and optics. The technology doesn't matter, though, until it's included in an application that delivers a great experience. In this 25-minute session, we'll highlight how one application has tapped the power of Tegra 4 to deliver breakthrough mobile camera experiences. Come learn how simultaneous use of the ISP, CPU and GPU can enable new computational photography features, delivered simply, in ways that will delight consumers and lead to better pictures.

Keywords:
Mobile Summit, Computational Photography, GTC 2013 - ID S3492

Neil Trevett (NVIDIA)
Discover how NVIDIA desktop and mobile technologies will begin to bring the power of GPU computing to a wide range of devices and gain insights into the new classes of compute intensive mobile applications this will enable.

Keywords:
Mobile Summit, GTC 2013 - ID S3494

Graeme Finlayson (metaio, Inc.)
This session will talk about the Augmented Reality development platform from metaio. It will provide an overview of the development process and deployment to a mobile platform. It will cover an example application porting to a Tegra platform and consider some of the new features of the Tegra 4 platform and look forward to some of the developments in open APIs from the perspective of an AR application developer.

Keywords:
Mobile Summit, Video & Image Processing, GTC 2013 - ID S3496

Chris Dalton (NVIDIA)
Discover how NVIDIA has optimized the Web experience on Android, including GPU acceleration of the HTML5 browser stack and leading-edge technologies such as WebGL. This session will explore the advantages of deploying Web applications and how Tegra is creating a capable platform for the next generation of portable mobile experiences.

Keywords:
Mobile Summit, GTC 2013 - ID S3549

Andrew Edelsten (NVIDIA), Richard Seis (NVIDIA)
The NVIDIA Tegra processor has been at the cutting edge of mobile processor technology since its inception. During this session, discover the new CPU, GPU, and multimedia features the Tegra 4 offers and learn how NVIDIA provides technical support for the development of cutting-edge mobile applications and games. The speakers will also introduce Project SHIELD, NVIDIA's new open gaming device. Project SHIELD integrates Tegra 4 into a highly innovative mobile game controller design to bring Android and PC games to the big screen. Learn some background about Project SHIELD's creation, its unique Android customizations, and hot tips and tricks for targeting Project SHIELD with your games.

Keywords:
Mobile Summit, Game Development, GTC 2013 - ID S3550

Paul Hodgson (NVIDIA)
NVIDIA's Developer Technologies team helps software developers ship games and applications to market more quickly, with better performance and optimized user experiences. As releases approach, NVIDIA sees many developers hit similar problems. Attend the session to learn about common 3D game performance issues and how to arrive at an optimal solution in the Unity 3D engine.

Keywords:
Mobile Summit, Game Development, GTC 2013 - ID S3554

Bill Rehbock (NVIDIA)
Learn how NVIDIA's TegraZone and NVISION online magazine are driving discoverability for Tegra-optimized applications in the Android Play Store, and routinely delivering 4x the conversion rate and 2x the ASP of standard Android games. How can you leverage TegraZone to drive sales for your application?

Keywords:
Mobile Summit, GTC 2013 - ID S3570

Julie Uhrman (OUYA)
How can you plug in to the OUYA ecosystem? With passionate customers, cool technology and living room connectivity, the opportunity to create compelling experiences is yours. Come learn about the OUYA product and how to optimize applications to take full advantage of the platform.

Keywords:
Mobile Summit, Game Development, GTC 2013 - ID S3572

Neil Trevett (NVIDIA), Graeme Finlayson (metaio, Inc.), Jim Steele, Ph.D. (Sensor Platforms), Tim Droz (SoftKinetic), Don MacAskill (SmugMug)
A lively, interactive panel session that brings together leading experts from NVIDIA and the industry to discuss how the mobile revolution is changing the world we live in and the key challenges that are yet to be solved. This is your chance to get your gnarliest mobile questions answered!

Keywords:
Mobile Summit, GTC 2013 - ID S3573

Parallel Programming Languages & Compilers
Presentation
Media
Cliff Woolley (NVIDIA)
Starting from the fundamentals of parallel programming in CUDA C/C++, learn how to maximize your development productivity. We present a design cycle we call APOD: Assess, Parallelize, Optimize, and Deploy, which helps application developers to rapidly identify the portions of their code that would most readily benefit from GPU acceleration, rapidly realize that benefit, and begin leveraging the resulting speedups in production as early as possible.
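
As a rough illustration of the Assess step, one common way to estimate whether a hotspot is worth parallelizing is Amdahl's law. The abstract does not say which estimate the session uses, so treat this as an assumption. The function name `amdahl_speedup` and its parameters are hypothetical.

```python
def amdahl_speedup(parallel_fraction, accelerated_speedup):
    """Overall speedup when only part of the runtime is accelerated.

    parallel_fraction: share of the original runtime that can be offloaded (0..1).
    accelerated_speedup: how much faster that share runs after offloading.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accelerated_speedup <= 0.0:
        raise ValueError("accelerated_speedup must be positive")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / accelerated_speedup)


# A hotspot covering 90% of the runtime, made 10x faster on the GPU.
overall = amdahl_speedup(0.9, 10.0)
```

For these inputs the estimate is 1 / 0.19, a little over 5x. The remaining serial 10% caps the overall gain at 10x no matter how fast the accelerated portion becomes, which is why the assessment comes before any parallelization work.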

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3008

Rob Farber (BlackDog Endeavors, LLC)
We will study several examples taken from real applications, showing how to analyze and optimize the codes. The examples will cover the three main performance bottlenecks: memory bandwidth, computational throughput and latency. During the session we will use Nsight Visual Studio Edition to perform the analysis and identify limiters, and for each problem, we will gradually increase the technicality of the optimizations from basic to advanced.

Keywords:
Parallel Programming Languages & Compilers, Development Tools & Libraries, GTC 2013 - ID S3012

Martin Burtscher (Texas State University-San Marcos), Rupesh Nasre (The University of Texas at Austin)
This session presents general techniques for optimizing irregular GPU applications, i.e., programs whose control flow and memory accesses are data dependent and change dynamically. Such codes are difficult to accelerate because they tend to suffer from thread divergence, load imbalance, little coalescing, and varying parallelism. Based on seven well-performing CUDA implementations, all of which operate on irregular graph-based data structures, we extract a number of common principles for improving synchronization, scheduling, memory access patterns and the algorithms. Results show these approaches often increase performance by one to two orders of magnitude, making them essential for accelerating irregular applications.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3016

Peter Messmer (NVIDIA)
OpenACC has quickly become the standard for accelerating large code bases with GPUs. Using directives, the programmer provides hints about data locality, data dependency and control flow that allow the compiler to automatically generate efficient GPU code. While the OpenACC model is well suited for a broad range of commonly encountered software patterns, it is sometimes necessary to fine-tune an application with advanced OpenACC directives or to interface with external CUDA code to take advantage of the latest hardware features. The goal of this tutorial is to present the different strategies for tuning OpenACC code and to introduce mechanisms for interfacing OpenACC with other GPU code. Based on examples, we will first present different strategies to assess and optimize the performance of OpenACC code, and will then focus on interfacing OpenACC code with CUDA and graphics libraries.

Keywords:
Parallel Programming Languages & Compilers, Supercomputing, GTC 2013 - ID S3019

David Luebke (NVIDIA)
We invite you to a special presentation detailing our Academic Programs and all the ways NVIDIA supports teaching and research in higher education. You will find out what programs are available, what benefits they have, what our expectations are, who the key players are, best practices and how you can participate as an academic or researcher. The highlight of the session will be the CUDA Achievement Awards showcasing work at the CUDA Centers of Excellence. The CUDA Centers of Excellence (CCOEs) are institutions at the forefront of GPU computing teaching and research. If you are an academic researcher you won't want to miss this session!

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3021

Mark Ebersole (NVIDIA)
Starting with a background in C or C++, learn everything you need to know to accelerate your applications using CUDA C/C++. Beginning with a "Hello, World" CUDA C program, explore parallel programming with CUDA through a number of easy to follow code examples. Examine more deeply the various APIs available to CUDA applications and learn the best ways in which to employ them in your applications.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3049

Greg Ruetsch (NVIDIA), Massimiliano Fatica (NVIDIA)
This tutorial covers various aspects of writing code in CUDA Fortran, the leading solution for GPU-accelerated Fortran applications. Topics covered include a basic introduction to parallel programming concepts using CUDA, performance measurements and metrics, optimization, and multi-GPU programming via MPI and peer-to-peer communication between GPUs. Several case studies will be presented as well.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3050

Daniel Egloff (QuantAlea GmbH), Xiang Zhang (QuantAlea GmbH)
CUDA and F# are two trailblazing yet unrelated technologies. F# is a uniquely productive language for solving complex problems in a clear and concise way. CUDA, on the other hand, is a platform for parallel high performance computing on GPUs. Our presentation shows how to wed F# and GPUs with the help of Alea.CUDA, a new framework and compiler infrastructure for developing GPU-accelerated applications with F# on .NET. Alea.cuBase builds upon the LLVM compiler toolkit and the new NVIDIA PTX backend of CUDA 5. With Alea.cuBase you effectively write F# code, which generates CUDA programs dynamically at runtime, fully integrated into .NET. This approach opens up new dimensions for GPU programming, for example developing non-trivial domain specific languages. Other interesting applications are server-side compilation of GPU kernels and GPU cloud computing. To give you a feeling for the new capabilities, we complement our presentation with several live coding examples.

Keywords:
Parallel Programming Languages & Compilers, Development Tools & Libraries, Finance, GTC 2013 - ID S3055

Phil Pratt-Szeliga (Syracuse University)
Rootbeer is a compiler/translator that allows the GPU to execute stylized Java programs. Unlike Java language bindings for OpenCL or CUDA, which require manual programmer intervention, Rootbeer automatically serializes arbitrary graphs of composite objects into a representation suited to the GPU. Rootbeer is well-tested, handles a broad range of the Java programming language, and is open sourced under a permissive MIT-style license. This talk will provide programmers with an introduction to the Rootbeer programming environment, as well as offer insight into the unique challenges encountered during Rootbeer's development. Finally, performance comparisons between native Java, Rootbeer, and native CUDA will be discussed for a range of applications. This work was supported by National Science Foundation grant number MCB-0746066 to R.D.W.

Keywords:
Parallel Programming Languages & Compilers, Development Tools & Libraries, GTC 2013 - ID S3058

Jeff Larkin (NVIDIA)
OpenACC is an open programming standard for parallel computing on accelerators (including GPUs) using directives. It is designed to make the transformative power of heterogeneous computing systems available to the developer quickly and easily. In this tutorial you will learn how to add simple directives to your code to expose parallelism to the compiler, allowing it to efficiently map computation onto an accelerator automatically. OpenACC allows developers to make simple and portable code changes, enabling an easier migration to accelerated computing.

Keywords:
Parallel Programming Languages & Compilers, Development Tools & Libraries, GTC 2013 - ID S3076

James Beyer (Cray Inc.)
This session will discuss the OpenACC specification and its implementation in the Cray Compilation Environment. Details of how the compiler maps OpenACC onto the NVIDIA hardware will be presented. Special extensions provided by CCE will be presented with usage cases to illustrate the motive behind the extensions.

Keywords:
Parallel Programming Languages & Compilers, Supercomputing, GTC 2013 - ID S3084

Lars Nyland (NVIDIA), Stephen Jones (NVIDIA)
Atomic memory operations provide powerful communication and coordination capabilities for parallel programs, including the well-known operations compare-and-swap and fetch-and-add. The atomic operations enable the creation of parallel algorithms and data structures that would otherwise be very difficult (or impossible) to express without them - for example: shared parallel data structures, parallel data aggregation, and control primitives such as semaphores and mutexes. In this talk we will use examples to describe atomic operations, explain how they work, and discuss performance considerations and pitfalls when using them.
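
The two operations named above can be sketched on the CPU. This is a toy model and not GPU code: a lock stands in for the hardware atomicity guarantee, and the class `AtomicInt` and function `count_in_parallel` are hypothetical names. It shows compare-and-swap returning the previous value, and fetch-and-add built from a compare-and-swap retry loop.

```python
import threading


class AtomicInt:
    """Toy atomic integer; the lock emulates what hardware does in one step."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the current value equals `expected`.

        Always returns the value seen before the attempt, so the caller
        can tell whether the swap happened.
        """
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old

    def fetch_and_add(self, delta):
        """Add `delta` and return the previous value, via a CAS retry loop."""
        while True:
            old = self.load()
            if self.compare_and_swap(old, old + delta) == old:
                return old  # nobody changed the value between load and swap


def count_in_parallel(num_threads=4, increments=1000):
    """Have several threads increment one shared counter without lost updates."""
    counter = AtomicInt()

    def worker():
        for _ in range(increments):
            counter.fetch_and_add(1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.load()


total = count_in_parallel()
```

The retry loop is also where the performance pitfall shows up: under heavy contention many attempts fail and repeat, so throughput drops as more threads target the same address.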

Keywords:
Parallel Programming Languages & Compilers, Algorithms & Numerical Techniques, GTC 2013 - ID S3101

Dan Rosanova (West Monroe Partners)
Learn how to program GPUs from .NET languages like C# using a free open source framework: CUDAFY.NET. This session will teach the basics of GPU programming in the .NET environment, including tools, techniques, and strategies to bring the power of the GPU to your .NET applications. We will walk through two existing .NET applications and see how they can be ported to leverage the power of the GPU. One is a financial application, the other a genetic search algorithm (similar to some game AIs). We will highlight the implementation and performance impacts of using the GPU for each application and provide performance metrics for each.

Keywords:
Parallel Programming Languages & Compilers, Finance, GTC 2013 - ID S3145

Julien Demouth (NVIDIA)
The new Kepler GPU architecture introduces a new instruction: SHFL. This instruction allows threads in a warp to exchange values without using shared memory. In some cases, using the SHFL ("shuffle") instruction can significantly improve the performance. In this session we will present code patterns where SHFL helps improve the performance of your applications.
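
One code pattern commonly associated with shuffle is a warp-level sum reduction. The abstract does not list which patterns the session covers, so this choice is an assumption. The sketch below only simulates the data movement on the CPU: the list index plays the role of the lane ID, and `shfl_down` and `warp_reduce_sum` are hypothetical helpers that mimic a shuffle-down exchange rather than calling any CUDA API.

```python
WARP_SIZE = 32


def shfl_down(values, delta):
    """Every lane reads the register of lane (i + delta) in one step.

    Lanes whose source would fall outside the warp keep their own value.
    """
    return [
        values[i + delta] if i + delta < WARP_SIZE else values[i]
        for i in range(WARP_SIZE)
    ]


def warp_reduce_sum(values):
    """Sum 32 per-lane values in log2(32) = 5 exchange steps."""
    if len(values) != WARP_SIZE:
        raise ValueError("expected one value per lane")
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        # Each lane adds the value it received to its own register.
        vals = [own + received for own, received in zip(vals, shifted)]
        offset //= 2
    return vals[0]  # lane 0 ends up holding the full sum


total = warp_reduce_sum(list(range(WARP_SIZE)))
```

Only lane 0 is guaranteed to hold the full sum at the end; the upper lanes accumulate partial results that are never read. The same reduction written with shared memory needs a store, a synchronization and a load per step, which is the traffic the register-to-register exchange avoids.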

Keywords:
Parallel Programming Languages & Compilers, Algorithms & Numerical Techniques, GTC 2013 - ID S3174

Yuan Lin (NVIDIA)
Learn how to add GPU acceleration to your domain specific language (DSL) or target GPUs in your own compiler using libNVVM, an LLVM-based solution that makes targeting GPUs from high level languages easy. This talk gives an overview of libNVVM, including new features recently added to libNVVM and the NVIDIA Compiler SDK. We will also walk through several examples showing how to translate generic LLVM IR code to target NVIDIA GPUs, including support for separate compilation, linking, CUDA dynamic parallelism, and other features of the CUDA platform.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3185
 
Stephane Bihan (CAPS entreprise)
The announcement last year of the new OpenACC directive-based programming model, supported by CAPS, Cray and PGI compilers, has opened the door to more scientific applications that can be ported to many-core systems. Following a porting methodology, this talk will first review the principles of programming with OpenACC and then the advanced features available in the CAPS compilers to further optimize OpenACC applications: library integration and tuning directives, with mechanisms to auto-tune accelerated applications and make them adaptive to the GPU characteristics. CAPS compilers use hardware vendors' backends such as NVIDIA CUDA and OpenCL, making them the only OpenACC compilers supporting various many-core architectures.

Keywords:
Parallel Programming Languages & Compilers, Development Tools & Libraries, GTC 2013 - ID S3215
 
Scot Halverson (Los Alamos National Laboratory)
We explore solutions for utilizing CUDA-based GPU computing in Java for cross-platform use. A number of methods exist for interfacing between Java and CUDA, including Java Native Interface (JNI), JCUDA, and several additional third-party wrapper libraries. This talk explores these options and covers complications and solutions for implementing CUDA code within BeSSy, a cross-platform Java application for collector siting developed at Los Alamos National Laboratory as part of a biological surveillance program. BeSSy utilizes a number of computationally intense algorithms appropriate for GPU computing. The process of implementing BeSSy's algorithms using CUDA and NVIDIA GPUs is explored.

Keywords:
Parallel Programming Languages & Compilers, Computational Physics, Scientific Visualization, GTC 2013 - ID S3218
 
Xavier Martorell (Technical University of Catalonia / Barcelona Supercomputing Center)
OmpSs is a data-flow programming model based on code annotations to identify independent tasks. These annotations are interpreted by the Mercurium source-to-source compiler, which emits calls to the runtime system Nanos++. Nanos++ uses the information provided by these user annotations to dynamically build a task dependency graph, which is used to schedule tasks in a data-flow way. This extended programming model directly supports tasks written in CUDA C or OpenCL C, freeing end users from writing all the boilerplate code required to explicitly schedule kernels and manage data transfers, especially on multi-accelerator and distributed systems.
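To make the data-flow idea concrete, here is a minimal Python sketch of how input/output annotations can yield a task dependency graph and a schedule. It is not the OmpSs, Mercurium, or Nanos++ API: the function names, the `(name, inputs, outputs)` task tuples, and the example tasks are all hypothetical, and the sketch tracks only read-after-write dependencies (a real runtime would also handle write-after-read and write-after-write hazards).

```python
def build_task_graph(tasks):
    """Build a dependency map from annotated tasks given in program order.

    tasks: list of (name, inputs, outputs) tuples, where inputs and outputs
    are names of data items. Returns {task: set of tasks it must wait for}.
    Only read-after-write dependencies are tracked in this sketch.
    """
    last_writer = {}
    deps = {}
    for name, inputs, outputs in tasks:
        deps[name] = {last_writer[d] for d in inputs if d in last_writer}
        for d in outputs:
            last_writer[d] = name
    return deps


def schedule_waves(deps):
    """Group tasks into waves; tasks in one wave have no pending dependencies."""
    remaining = {task: set(d) for task, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(task for task, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


# Hypothetical annotated tasks: (name, data read, data written)
tasks = [
    ("load_a", [], ["a"]),
    ("load_b", [], ["b"]),
    ("scale_a", ["a"], ["a2"]),
    ("add", ["a2", "b"], ["c"]),
]
waves = schedule_waves(build_task_graph(tasks))
```

Tasks in the same wave are independent of one another, so a runtime would be free to dispatch them concurrently, for example to different accelerators.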

Keywords:
Parallel Programming Languages & Compilers, Development Tools & Libraries, GTC 2013 - ID S3232
 
Luiz DeRose (Cray Inc.)
The current trend in the supercomputing industry is to provide hybrid systems with GPUs attached to multi-core processors. Some of the critical hurdles for the widespread adoption of accelerated computing in high performance computing are portability and programmability. To be an effective HPC platform, hybrid systems need a high-level programming environment to facilitate the porting and development of applications to run efficiently on either GPUs or CPUs. In this talk I will present Cray's high-level parallel programming environment for hybrid systems, which consists of OpenACC compilers and tools that can hide the complexity of the system. Ease of use comes from compilers that let users write applications in Fortran, C, or C++ with OpenACC directives, and from tools that help users port, debug, and optimize for GPUs as well as conventional multi-core CPUs. In addition, this programming model supports experienced CUDA developers by providing CUDA interoperability.

Keywords:
Parallel Programming Languages & Compilers, Development Tools & Libraries, GTC 2013 - ID S3259
 
Matei Ripeanu (University of British Columbia), Abdullah Gharaibeh (University of British Columbia)
Large, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph processing algorithms entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory access ratio. Additionally, most real-world graphs have a low diameter and a highly heterogeneous node degree distribution. Partitioning these graphs while simultaneously achieving access locality and load balancing is difficult if not impossible. This session will demonstrate the advantages of graph processing on hybrid systems, that is, systems that host both traditional CPUs and discrete GPUs. To this end, this session will: (i) present a performance model that highlights the achievable performance gains enabled by hybrid systems; (ii) introduce TOTEM, a processing engine based on the Bulk Synchronous Parallel (BSP) model that offers a convenient environment to implement graph algorithms on these systems; (iii) present strategies for graph partitioning that lead to non-linear performance gains by matching the resulting workload to the processing element that best matches its characteristics; and, finally, (iv) highlight TOTEM's efficiency by presenting performance numbers for a set of graph algorithms that present a diverse set of challenges.
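The sketch below illustrates, under stated assumptions, what a BSP-style graph traversal over a degree-partitioned graph might look like. It is not TOTEM's API or implementation: the function names, the degree threshold, and the toy graph are hypothetical, and both partitions run in ordinary Python rather than on a CPU and a GPU. Each superstep has a compute phase (every partition processes the messages addressed to its own vertices) followed by a barrier at which outgoing messages are delivered.

```python
def partition_by_degree(graph, threshold):
    """Split vertices into [high-degree set, low-degree set] (hypothetical rule)."""
    high = {v for v, nbrs in graph.items() if len(nbrs) >= threshold}
    low = set(graph) - high
    return [high, low]


def bsp_bfs(graph, partitions, source):
    """Breadth-first search expressed as BSP supersteps.

    graph: {vertex: list of neighbours}; every vertex must appear as a key.
    Each partition keeps BFS levels only for the vertices it owns and talks
    to other partitions solely through per-partition message inboxes.
    """
    owner = {v: p for p, part in enumerate(partitions) for v in part}
    levels = [dict() for _ in partitions]  # per-partition local state
    inbox = [set() for _ in partitions]
    inbox[owner[source]].add(source)
    step = 0
    while any(inbox):
        outbox = [set() for _ in partitions]
        # Compute phase: each partition handles the messages it received.
        for p, messages in enumerate(inbox):
            for v in messages:
                if v in levels[p]:
                    continue  # already visited in an earlier superstep
                levels[p][v] = step
                for n in graph[v]:
                    outbox[owner[n]].add(n)
        # Barrier: messages become visible at the start of the next superstep.
        inbox = outbox
        step += 1
    merged = {}
    for local in levels:
        merged.update(local)
    return merged


# Toy undirected graph: vertex 0 is a hub, vertex 4 hangs off vertex 2.
graph = {0: [1, 2, 3], 1: [0], 2: [0, 4], 3: [0], 4: [2]}
partitions = partition_by_degree(graph, threshold=2)
bfs_levels = bsp_bfs(graph, partitions, source=0)
```

Which partition is assigned to which processor is left open here; the abstract says only that the workload should be matched to the processing element that best suits it.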

Keywords:
Parallel Programming Languages & Compilers, Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3262
 
Cyrus Omar (Carnegie Mellon University)
The purpose of this session is to introduce the Ace compilation environment and demonstrate its practicality for both researchers and practitioners in high-performance computing, particularly those using GPUs. Ace is an extensible, statically typed programming language embedded within Python. Python functions as a multi-purpose metalanguage where users can define new first-class language primitives, generate and transform code programmatically, and execute code directly when appropriate. Using the Ace extension mechanism, we have implemented the entirety of the OpenCL programming language, simplified it by eliminating the need for nearly all annotations, and extended it with new features like higher-order functions. We will show how to set up Ace, use it to write simple GPU programs, demonstrate a scientific simulator framework written using Ace, and describe how more advanced users can implement their own language extensions.

Keywords:
Parallel Programming Languages & Compilers, Development Tools & Libraries, GTC 2013 - ID S3293
 
Dmitry Mikushin (University of Lugano), Nikolay Likhogrud (Applied Parallel Computing LLC), Sergey Kovylov (NVIDIA)
The KernelGen project (http://kernelgen.org/) aims to develop Fortran and C compilers based on state-of-the-art open-source technologies for automatic GPU kernel generation from unmodified CPU source code, significantly improving the code porting experience. Parallelism detection is based on LLVM/Polly and CLooG, extended with mapping of loops onto the GPU compute grid and assisted by runtime alias analysis. PTX assembly code is generated with the NVPTX backend. Thanks to integration with the GCC frontend by means of the DragonEgg plugin, and a customized linker, KernelGen is able to compile complex applications into GPU-enabled binaries. The session will consist of three parts: the KernelGen programming model and its motivation, an overview of the compiler design based on the frontend-LLVM-NVPTX chain, and an end-user view of performance results both for small tests (in comparison to OpenACC compilers) and for complex applications with a large portion of parallelizable PDE stencil codes: the WRF and COSMO models.

Keywords:
Parallel Programming Languages & Compilers, Climate, Weather, Ocean Modeling, GTC 2013 - ID S3298
 
Dhabaleswar K (DK) Panda (The Ohio State University)
Learn how the MVAPICH2 library simplifies the task of porting your Message Passing Interface (MPI) applications to supercomputing clusters with NVIDIA GPUs, while exploiting performance. MVAPICH2 optimizes GPU-GPU/GPU-Host MPI communication using various features offered by the CUDA toolkit, providing optimized performance on different GPU node configurations. These optimizations are integrated transparently under the standard MPI API, for better programmability. In this session, we outline the various optimizations MVAPICH2 offers and how it takes advantage of the features provided by CUDA 5. We use the popular OSU micro-benchmark suite and some example applications to demonstrate how developers can effectively take advantage of features offered by MVAPICH2. We also demonstrate use of MVAPICH2 in conjunction with OpenACC directives.

Keywords:
Parallel Programming Languages & Compilers, Clusters & GPU Management, GTC 2013 - ID S3316
 
Rob Farber (BlackDog Endeavors, LLC)
Sometimes using a GPU as a supercomputer solves only part of the computational problem. In particular, big data can turn data management and preprocessing workflows into computational problems as challenging as, or more challenging than, the original problem. Working dynamic load-link examples show how to use CUDA/OpenCL in an efficient and scalable generic "click together" framework for HPC and commercial applications. Script in CUDA! Scalability across collaborators plus multi-system and OS portability will be demonstrated, along with decade-long data persistence. Google Protobufs are used in the demos to provide production-hardened support for C++, Java, Python, R, and many other languages.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3443
 
Kelly Goss (Acceleware)
Join us for an informative introduction to GPU Programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of two simple CUDA kernels will be provided.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3452
 
Kelly Goss (Acceleware)
Explore the memory model of the GPU and the memory enhancements available in the new Kepler architecture and how these affect performance. The tutorial will begin with an essential overview of GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. The shuffle instruction, new shared memory configurations and Read-Only Cache of the Kepler architecture are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered.  The demonstration code will then be re-written using the shuffle instruction for the Kepler architecture.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3453
 
Kelly Goss (Acceleware)
Learn how to optimize your algorithms for the Fermi and Kepler architectures. This informative tutorial will cover the key optimization strategies for compute-bound and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size and using dynamic parallelism on the Kepler architecture. For compute-bound algorithms, we will discuss how to improve branching efficiency, use intrinsic functions, and unroll loops. For memory-bound algorithms, optimal access patterns for global and shared memory will be presented, highlighting the differences between the Fermi and Kepler architectures. This session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern, which is applicable to all GPU architectures.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3454
 
Kelly Goss (Acceleware)
Get the lowdown on debugging and profiling your GPU program! This tutorial dives deep into profiling techniques and the tools available to help you optimize your code. We will demonstrate NVIDIA's Visual Profiler, nvcc flags and cuobjdump, and highlight the various methods available for understanding the performance of your CUDA program. The second part of the session will focus on debugging techniques and available tools to help you identify issues in your kernels. The latest debugging tools provided in CUDA 5.0, including Nsight and cuda-memcheck, will be presented. A programming demo of the Visual Profiler and Nsight will be included.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3455
 
Maxim Naumov (NVIDIA)
The libraries distributed in the CUDA SDK and offered by third parties provide a wealth of functions commonly encountered in a GPU acceleration project. Using these libraries can often significantly shorten the development time of a GPU project while leading to high-performance, high-quality software. In the CUDA 5.0 release, NVIDIA introduced enhancements across many libraries to improve performance and take advantage of new features available in the Kepler-series GPUs. In this tutorial, we will provide an overview of the libraries in the CUDA SDK, including cuBLAS, cuRAND, cuSPARSE, cuFFT, NPP and Thrust, as well as libraries provided by third parties. The audience will learn not only about the strengths of the individual libraries, but also about the decision-making process for selecting the library best suited to their project.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3461A
 
Travis Oliphant (Continuum Analytics, Inc.)
NumbaPro, which is part of the Anaconda Python distribution from Continuum Analytics, provides support for programming the GPU from the high-level language Python. There are two APIs. The first provides a high-level functional approach wherein NumPy array expressions can be compiled automatically to execute in parallel on the GPU. This API can also vectorize scalar functions to operate on arrays stored on the GPU. The second, low-level API provides CUDA support in Python. This "CUDA-Python" dialect makes it easier to access shared memory and synchronization primitives directly using a simplified Python syntax. Together, the two APIs give NumbaPro an easier interface for unleashing the power of GPUs using Python with NumPy arrays. (Coauthored by Siu Kwan Lam.)

Keywords:
Parallel Programming Languages & Compilers, Finance, GTC 2013 - ID S3462
 
Dr. Levi Barnes (NVIDIA)
Both CUDA and OpenACC now include a number of features that facilitate multi-GPU programming and computing. In this session we will review the features useful for programming multiple GPUs, both within a single node and across a network. We will cover peer-to-peer GPU communication, communication patterns for various GPU topologies, and streams in the context of multiple GPUs. Concepts will be illustrated with case studies.

Keywords:
Parallel Programming Languages & Compilers, Supercomputing, GTC 2013 - ID S3465
 
Paulius Micikevicius (NVIDIA)
The goal of this presentation is to describe GPU operation details underlying various performance optimization suggestions. Topics will include parallelism required to achieve high utilization of GPUs, instruction issue, warp execution and how it relates to CUDA cores, various memories and how their accesses are processed, concurrent execution, and others. Emphasis will be on the Kepler architecture, but most concepts apply to previous GPU generations as well. Experimental results will be presented where appropriate.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3466
 
Mark Ebersole (NVIDIA), John Owens (University of California, Davis)
The rapid expansion of massively parallel computing, from smart phones to super computers, means we must improve and expand pedagogy in this field. CUDA is quickly becoming the go-to platform for teaching parallel programming at over 600 universities worldwide. Come join us at this session to hear from university faculty and industry professionals actively teaching CUDA across a wide spectrum of audiences. Learn what methods and materials work best for them. An "open-mic" Q&A session will follow brief presentations from each speaker, so come share your thoughts on the trends and needs of education for massively parallel computing.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3471
 
Mark Harris (NVIDIA)
The future of computing is parallelism, and NVIDIA's goal for CUDA is to create an accessible and pervasive platform for diverse, high performance parallel computing. In this talk I will share our vision for the future of the CUDA platform and programming model, and present specific features of current and future CUDA releases that are important steps toward that future. CUDA provides a programming model that makes it easy for programmers to expose large amounts of parallelism in their applications, but I'll talk about ways that we are making heterogeneous computing software easier to write, optimize and maintain. I'll demonstrate how we are enabling the CUDA platform to support a broader range of programming languages and libraries. And, I will talk about technologies aimed at making CUDA applications more efficiently scale to large parallel systems.

Keywords:
Parallel Programming Languages & Compilers, GTC 2013 - ID S3500
 
James Malcolm (AccelerEyes)
Image processing has consistently proven to benefit greatly from GPU acceleration. A number of libraries available from NVIDIA and AccelerEyes make image processing development efficient and lead to big speedups. Using these libraries can often significantly shorten the development time of a GPU project while leading to high-performance, high-quality software. In conjunction with the CUDA 5.0 release, NVIDIA and AccelerEyes introduced enhancements to their libraries to improve performance and take advantage of new features available in the Kepler-series GPUs. In this tutorial, we will provide an overview of the image processing libraries in the CUDA SDK as well as those provided with ArrayFire. The audience will not only learn about the strengths of the individual libraries, but also learn about the decision making process to select the best suited library for their project.

Keywords:
Parallel Programming Languages & Compilers, Computer Vision, Media & Entertainment, Video & Image Processing, GTC 2013 - ID S3559
Ray Tracing
Christiaan Gribble (SURVICE Engineering), Lee Butler
Explore recent advances in high-performance GPU ray tracing for applications other than optical rendering. In this session, we dive into the details of Rayforce, a CUDA ray tracing engine that leverages a new graph-based spatial indexing structure to achieve performance in excess of one billion rays per second in some non-trivial scenarios. We then explore several example applications that leverage Rayforce in a framework for cognition-driven simulation (CDS) that enables analysts to experience an immersive physics-based simulation environment. Compared to traditional means of analysis, CDS represents a next-generation approach to simulation and analysis across a broad range of application domains.

Keywords:
Ray Tracing, Combined Simulation & Real-Time Visualization, Manufacturing Technical, GTC 2013 - ID S3157
 
David McAllister (NVIDIA)
OptiX is the foremost platform for GPU ray tracing. It exposes the extreme ray tracing performance of the GPU to typical developers, while hiding most of the complexity usually associated with ray tracing. This tutorial will cover everything developers need to get started with ray tracing in OptiX, including at least the OptiX C and C++ APIs, the execution model, acceleration structures, programmable entry points, and best practices.

Keywords:
Ray Tracing, Media & Entertainment, GTC 2013 - ID S3474
 
David McAllister (NVIDIA)
We will cover advanced topics in OptiX. Examples include implementing advanced rendering and sampling algorithms, dealing with large datasets and OptiX's virtual memory system, new API features like callable programs, and CUDA interoperability. We will also cover performance analysis and optimization in OptiX, and plan to leave plenty of time for questions.

Keywords:
Ray Tracing, Media & Entertainment, GTC 2013 - ID S3475
Real-Time Graphics Applications
Jon Hjelmervik (Sintef Applied Mathematics)
Learn how to achieve real-time, pixel-accurate volume visualization of volumetric spline objects using OpenGL 4.3. See unprecedentedly fast ray-casting through volumetric spline models where both the geometric shape and attached simulation results are represented as volumetric splines. The presented rendering method consists of two passes: the first pass uses the tessellator and atomic counters to find all ray-surface intersection points, while a second compute pass performs the ray traversal. Simulations using volumetric spline models are called isogeometric analysis, because they allow the same representation to be shared throughout the modeling and simulation stages. We provide fast, reliable visualization, achieving the ultimate goal in isogeometric analysis.

Keywords:
Real-Time Graphics Applications, Scientific Visualization, GTC 2013 - ID S3134
 
Pascal Guehl (INRIA)
The "Gigavoxels" technology was developed to allow real-time and visually realistic exploration of large scenes and detailed objects, possibly created on the fly. This technique has wide application, from video games to special effects à la Digital Domain (avalanches, smoke, clouds) to enriched visualization of astrophysical objects (ANR project: galaxy, nebulae). Based on CUDA and C++, the API allows you to interface volume rendering with classical OpenGL rendering and masks the engine details, including the underlying data structure, the dynamic loading of data and the GPU cache system. It also gives access to the parts on which researchers experiment (voxel rendering equation, production of bricks of voxels, storage and interpolation of user variables added to voxels).

Keywords:
Real-Time Graphics Applications, Game Development, Media & Entertainment, Scientific Visualization, GTC 2013 - ID S3335
 
Phil Borchard (Dassault Systemes), Sam Itskovich (Dassault Systemes)
New product development is the lifeblood of any manufacturing organization: without innovative new products, companies are left to compete mostly on price. One of the keys to stimulating creativity in the new product development process is to keep the Industrial Design community (i.e. Styling Studio) free to express new ideas quickly and easily. An integrated tool set for the creative community, including sketching, 3D shape creation, readily available realistic rendering, and quick physical prototyping, all on a collaborative platform, provides an excellent environment for creativity and product excellence. This session will explore how to establish and support this creative environment.

Keywords:
Real-Time Graphics Applications, Ray Tracing, GTC 2013 - ID S3557
Remote Graphics & Cloud-Based Graphics
Matt Hughes (Calgary Scientific Inc.)
Learn how GPUs and image streaming technology have been combined to provide FDA-cleared medical imaging to anyone with an iOS, Android, or Adobe Flash-enabled device. This talk will discuss how ResolutionMD and PureWeb enable secure access to medical information and advanced visualization services with applications that are perfectly suited to the target device. The architecture that enables scalable image computation and delivery of the rendered image to the client will be outlined, and the scalability challenges encountered during the development of ResolutionMD, along with their solutions, will be discussed.

Keywords:
Remote Graphics & Cloud-Based Graphics, Medical Imaging & Visualization, GTC 2013 - ID S3322
 
Jason K. Lee (NVIDIA), Milan Diebel (NVIDIA), Steve Harpster (NVIDIA)
In this hands-on tutorial session, attendees will learn how to install, configure and use GRID-based servers in both VMware ESX and Citrix XenServer environments.

Keywords:
Remote Graphics & Cloud-Based Graphics, Desktop & Application Virtualization, GTC 2013 - ID S3355
 
Derek Thorslund (Citrix)
Recent technological advances have made it practical to deliver 3D professional graphics applications from the Cloud (private or public) with a high quality user experience and at an attractive cost. Organizations can keep their intellectual property safe in the data center since only fully-rendered screen images are sent over the network. Users in remote locations no longer have to wait for large file transfers. And they can access 3D models from a wide variety of devices, including iPads, Android tablets and thin clients. Learn how Citrix XenDesktop, XenServer and Receiver technologies leverage NVIDIA Quadro and VGX™ to make all of this a reality for many organizations today.

Keywords:
Remote Graphics & Cloud-Based Graphics, Cloud Visualization, GTC 2013 - ID S3446
 
Thomas Poppelgaard (Commaxx)
Discover how 3D applications can be centralized and how NVIDIA and Citrix together deliver a remote workplace from which companies can benefit. Audience members will learn about building out these cloud infrastructures and the benefits that customers gained from choosing the solution that matched their needs. Hear real-world examples and gain key insights for determining whether these types of solutions are right for your company.

Keywords:
Remote Graphics & Cloud-Based Graphics, Desktop & Application Virtualization, GTC 2013 - ID S3540
 
Franck Diard (NVIDIA)
Franck Diard, Chief Software Architect at NVIDIA, will talk about the technologies behind GRID and how you can integrate them into your cloud products. The audience will learn about the key components that allow optimal capture, efficient compression, fast streaming, and low-latency display of high-performance games from the cloud. Franck will demonstrate how these components fit together and how to optimize their usage to deliver an ultimate cloud gaming experience.

Keywords:
Remote Graphics & Cloud-Based Graphics, Media & Entertainment, GTC 2013 - ID S3543
 
Pat Lee (VMware)
VMware Horizon View combined with NVIDIA GRID and Quadro GPUs can now deliver the most flexible and scalable virtual 3D graphics architecture to power users, designers, and demanding 3D workstation-class applications to support the modern flexible workforce. More and more organizations are looking to enable an increasingly remote workforce while maintaining security and data compliance. Until recently, the remote workforce was limited to knowledge workers and office applications, but with the latest advances in virtualized graphics and virtual desktop integration from NVIDIA and VMware, organizations can give workers in local high-speed offices or in home offices access to high-performance 3D applications, with all data stored securely in your private cloud.

Keywords:
Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3544
 
Peter Relan (Agawi)
This talk will explore the implications of NVIDIA's GPU and Grid Server technology for Cloud Infrastructure and Services, especially as it pertains to gaming. Gaming is the last form of entertainment that is not delivered from the cloud: music, movies, and TV are all streamed to tablets and mobile devices globally. NVIDIA's new Grid Server brings a fundamental change in the way cloud-based gaming and other applications will be enabled. Peter Relan is one of the most influential voices in the gaming industry today as the founder of YouWeb Incubator as well as the Executive Chairman of Agawi.

Keywords:
Remote Graphics & Cloud-Based Graphics, Cloud Visualization, Game Development, GTC 2013 - ID S3581
 
Ankit Patel (NVIDIA)
Learn more about NVIDIA's new product to accelerate media workflows by delivering high-end GPU performance to any Windows, Linux or Mac device.

Keywords:
Remote Graphics & Cloud-Based Graphics, Cloud Visualization, GTC 2013 - ID S3582
 
Ron Haberman (CiiNOW, Inc)
Ron will lead a visionary discussion encompassing the critical, and often overlooked, elements of delivering a pure-value Cloud Gaming offering. The implications of hardware vs. software encoding, maximizing concurrency, leveraging social media, and the impact of latency will all be explored.

Keywords:
Remote Graphics & Cloud-Based Graphics, Cloud Visualization, Media & Entertainment, GTC 2013 - ID S3585
Rendering & Animation
Presentation
Media
Phillip Miller (NVIDIA), Peter de Lappe (NVIDIA)
Explore how global illumination can be used in commercial applications (like Autodesk 3ds Max) to quickly achieve results that rival photographs. This course will explain the basics of the NVIDIA Iray renderer and how to use it efficiently for a variety of lighting conditions. Newly developed features (not yet within shipping products) will also be demonstrated that may be of interest to developers looking to integrate Iray in their applications. Configuring systems for optimal GPU performance and scene capacity will also be discussed.

Keywords:
Rendering & Animation, Media & Entertainment, Ray Tracing, GTC 2013 - ID S3473
 
Lutz Kettner (NVIDIA), Matthias Raab (NVIDIA), Jan Jordan (NVIDIA)
Learn how NVIDIA's Material Definition Language (MDL) provides a consistent means to define material appearances for use in physically-based rendering with a highly flexible, layered approach. This presentation introduces the key ideas of MDL and how they address the challenge of matching looks between different renderers with a common definition, be they path tracers, basic ray tracers, or conventional rasterizers. Developers will learn how to transition from conventional shading languages to MDL while Material Authors and Artists will see through high quality examples how they can easily create and assign custom looks. The Iray 2013 rendering platform will be used to interactively demonstrate the use of MDL between multiple rendering modes and rendering techniques. The future of MDL, including support for Measured Materials from measurement devices, will also be discussed.

Keywords:
Rendering & Animation, Ray Tracing, GTC 2013 - ID S3560
Robotics & AI
Presentation
Media
Kamil Rocki (University of Tokyo)
In this session we will present a high-performance GPU-accelerated implementation of the Iterated Local Search algorithm using 2-opt local search to solve the Traveling Salesman Problem (TSP). GPU usage greatly decreases the time needed to optimize the route, but it requires a well-tuned implementation. Results will show that at least 90% of the time spent during Iterated Local Search is on the local search itself, so the GPU is used to accelerate this part of the algorithm. The main contribution of this work is a problem division scheme that allows arbitrarily large problem instances to be solved on the GPU. Tested on different TSPLIB instances on a GeForce GTX 680 GPU, the algorithm achieved over 700 GFLOPS during calculation of the distances. Compared to the CPU implementation, the GPU performs local optimization approximately 150 times faster, allowing very large problem instances to be solved on a single machine.
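
For readers unfamiliar with the method named in this abstract, the following is a minimal CPU-only sketch of 2-opt local search wrapped in an Iterated Local Search loop. It is not the speaker's GPU implementation: the function names, the random segment-reversal perturbation, and the restart count are all assumptions made for illustration, and the abstract's problem division scheme is not shown.

```python
import math
import random


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, tour):
    """Reverse tour segments for as long as a reversal shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


def iterated_local_search(points, restarts=20, seed=0):
    """Perturb the best tour, re-run 2-opt, and keep the result if shorter."""
    rng = random.Random(seed)
    best = two_opt(points, list(range(len(points))))
    best_len = tour_length(points, best)
    for _ in range(restarts):
        candidate = list(best)
        # Assumed perturbation: reverse one randomly chosen segment.
        i, j = sorted(rng.sample(range(len(candidate)), 2))
        candidate[i:j + 1] = reversed(candidate[i:j + 1])
        candidate = two_opt(points, candidate)
        cand_len = tour_length(points, candidate)
        if cand_len < best_len:
            best, best_len = candidate, cand_len
    return best, best_len


# A crossed tour over the corners of a unit square untangles to the perimeter.
square = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
print(round(tour_length(square, two_opt(square, [0, 1, 2, 3])), 6))  # 4.0
```

The inner double loop evaluates every candidate edge pair independently, which is the part of the algorithm the abstract reports as dominating run time and therefore worth offloading.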

Keywords:
Robotics & AI, Algorithms & Numerical Techniques, GTC 2013 - ID S3222
 
Lu Zheng (Carnegie Mellon University), Ole Mengshoel (Carnegie Mellon University)
In this session you will learn about probabilistic graphical models, specifically Bayesian networks and junction trees, which are popular in artificial intelligence, machine learning, and statistics. Compiling Bayesian networks (BNs) to junction trees and performing belief propagation over them is among the most prominent approaches to computing posterior distributions in BNs. However, belief propagation over junction trees can be computationally intensive. In this talk we discuss GPU-friendly data structures and algorithms that extend and speed up existing junction tree techniques, in particular algorithms that compute each belief propagation message in parallel. We have implemented this approach on an NVIDIA GPU and tested it using BNs from several applications. Experimentally, we studied how junction tree parameters affect parallelization opportunities and hence the performance of our algorithms. So far, speedups over 10x have been observed for BNs from applications.

Keywords:
Robotics & AI, Databases, Data Mining, Business Intelligence, GTC 2013 - ID S3277
 
Frank Sehnke (Zentrum für Sonnenenergie- und Wasserstoff-Forschung Baden-Württemberg)
The key features of the Machine Learning (ML) suite Learn-O-Matic are that it provides a completely automated framework for supervised learning with an easy-to-use web frontend. Computing on GPUs allows us to automatically explore the space of meta parameters (e.g. network architecture and regularization term) by state-of-the-art Reinforcement Learning. An automated feature selection has been implemented to decide which inputs are relevant for the learning task at hand. A typical user can therefore apply Learn-O-Matic to his or her problems without any expert knowledge of ML. Learn-O-Matic supports different Neural Network architectures, Gaussian Processes and Support Vector Regression. We present two real-world example applications developed with Learn-O-Matic. The first one is on atmospheric ozone profile retrieval from satellite data, the second one is a wind power forecast system that predicts wind-power production.

Keywords:
Robotics & AI, Algorithms & Numerical Techniques, GTC 2013 - ID S3356
Scientific Visualization
Presentation
Media
David Beilloin (Visualization Sciences Group)
The goal of this talk is simply to render very large volume data sets with very high image quality and highly interactive performance using GPU-accelerated personal machines, without "throwing hardware at the problem". Volume size, image quality, and performance are in conflict with each other, and the goal is not yet completely attainable, but we have a new idea. The GPU ray-casting algorithm can produce very high image quality, but is computationally expensive. After implementing the usual optimizations for ray length and step size, the only option is to reduce the number of rays. We present a new algorithm that iteratively refines the image by increasing the number of rays in areas of high detail. This allows the rendering process to stop based on either desired quality or time allowed.
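
The idea of spending more samples where detail is high, and stopping on either a quality target or a budget, can be illustrated with a one-dimensional analogue. This is only a sketch under stated assumptions: the speaker's ray-casting algorithm is not described in detail here, so the `refine` function, its interval-splitting rule, and the `budget` and `tolerance` parameters are hypothetical stand-ins, with each sample playing the role of one ray.

```python
import heapq


def refine(sample, lo, hi, budget, tolerance):
    """Adaptively sample `sample` on [lo, hi].

    The interval whose endpoint values differ most is split first.
    Refinement stops when the sample budget is spent (time allowed)
    or the largest remaining difference is within `tolerance` (quality).
    """
    samples = {lo: sample(lo), hi: sample(hi)}
    # heapq is a min-heap, so store negated differences to pop the largest.
    heap = [(-abs(samples[hi] - samples[lo]), lo, hi)]
    while heap and len(samples) < budget:
        neg_err, a, b = heapq.heappop(heap)
        if -neg_err <= tolerance:
            break  # quality target reached everywhere
        mid = (a + b) / 2
        samples[mid] = sample(mid)
        for left, right in ((a, mid), (mid, b)):
            err = abs(samples[right] - samples[left])
            heapq.heappush(heap, (-err, left, right))
    return sorted(samples.items())


# A sharp edge at x = 0.3 attracts most of the ten available samples.
edge = lambda x: 0.0 if x < 0.3 else 1.0
points = refine(edge, 0.0, 1.0, budget=10, tolerance=0.01)
print(len(points))  # 10
```

One limitation of this toy version is that it starts from only two samples, so detail hidden between equal endpoint values would be missed; a practical renderer would begin from a coarse grid of rays.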

Keywords:
Scientific Visualization, Graphics Performance Optimization, GTC 2013 - ID S3062
 
Matthäus Chajdas (Technische Universität München)
In this session, we showcase two sample-based approaches for rendering extremely complex surfaces or iso-surfaces in high-resolution volume data. In each case, we convert the data into a compact representation on the GPU to reduce the required memory and allow for efficient rendering. First, we present a novel hybrid pipeline capable of rendering extremely large polygonal meshes and complex iso-surfaces using both the polygon rasterizer and a surface raycaster. It uses a new, very compact sample-based surface representation while still enabling efficient ray-casting. We also show a different take on iso-surface rendering, this time purely based on rasterization. The second algorithm allows for direct level-of-detail generation on the surface representation as well as cheap anti-aliased rendering. Finally, we take a quick look at how adaptive trees can be used to speed up classic ray-casting algorithms compared to standard octree or k-D-Tree subdivision.

Keywords:
Scientific Visualization, Real-Time Graphics Applications, GTC 2013 - ID S3205
 
Christopher Sewell (Los Alamos National Laboratory)
The development of parallel visualization and analysis operators frequently requires re-writing the underlying algorithms for many different platforms. In order to facilitate portability, we have devised a framework for creating such operators that employs the data-parallel programming model. By writing the operators using only data-parallel primitives, the same code may be compiled to multiple targets using architecture-specific backend implementations of these primitives. Specifically, we make use of and extend NVIDIA's Thrust library, which provides CUDA, OpenMP, and TBB backends. We have achieved good parallel performance for several operators on multi-core CPUs and on NVIDIA GPUs using the exact same operator code. Recent developments include new operators, in-situ integration with simulation codes, and work on operators that can work on unstructured grids and in distributed memory environments, as well as integration with VTK and ParaView.
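
To make "writing the operators using only data-parallel primitives" concrete, here is a small sketch of a threshold operator composed from three primitives. It is an illustration rather than the framework's code: the Python functions `transform`, `exclusive_scan`, and `scatter_if` are sequential stand-ins loosely modeled on the Thrust algorithms of the same names, and the `threshold` operator is a hypothetical example.

```python
from itertools import accumulate


# Sequential stand-ins for backend-provided primitives. In a data-parallel
# library each would have a parallel implementation per target architecture.
def transform(fn, values):
    """Apply fn to every element independently."""
    return [fn(v) for v in values]


def exclusive_scan(values):
    """out[i] is the sum of values[0..i-1]; out[0] is 0."""
    running = list(accumulate(values))
    return ([0] + running[:-1]) if running else []


def scatter_if(values, positions, flags, size):
    """Write values[i] to out[positions[i]] wherever flags[i] is set."""
    out = [None] * size
    for value, position, flag in zip(values, positions, flags):
        if flag:
            out[position] = value
    return out


def threshold(values, limit):
    """Keep the values above `limit`, using only the primitives above."""
    flags = transform(lambda v: 1 if v > limit else 0, values)
    positions = exclusive_scan(flags)  # output slot for each kept value
    count = (positions[-1] + flags[-1]) if values else 0
    return scatter_if(values, positions, flags, count)


print(threshold([3, 9, 1, 7, 5], 4))  # [9, 7, 5]
```

Because `threshold` never loops over the data itself, swapping the three primitives for parallel versions would change where it runs without changing the operator code, which is the portability argument the abstract makes.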

Keywords:
Scientific Visualization, Algorithms & Numerical Techniques, Development Tools & Libraries, GTC 2013 - ID S3393
 
Robert Maynard (Kitware Inc.)
Explore new techniques in scientific data analysis and visualization algorithms by looking at the Dax toolkit, which provides a development framework for the next generation of high-performance computers and GPUs. Dax provides a concept mechanism to automatically build parallel scheduling code from signatures using C++.

Keywords:
Scientific Visualization, Supercomputing, GTC 2013 - ID S3400
Signal Processing
Presentation
Media
Robert Dunn (University of Auckland)
Find out how CUDA enables video-rate blind source separation of hyperspectral video streams. Each pixel in a hyperspectral video frame contains hundreds of contiguous bands representing a section of the electromagnetic spectrum at that location. This allows for material identification and classification over a wide area and is currently used in applications such as geological mapping and target acquisition. Current techniques use linear algebra and are highly parallelisable but most existing work focuses on single images. Sequential video frames contain a significant amount of redundancy and our research exploits this to provide substantial complexity reductions to spectral unmixing algorithms. This session will outline and demonstrate some of the problems and solutions we encountered in developing a real-time hyperspectral video processing system using CUDA.

Keywords:
Signal Processing, Video & Image Processing, GTC 2013 - ID S3030
 
Eri Rubin (SagivTech LTD)
This talk will describe the UNLocBox, a GPU-based library containing advanced approaches to the solution of inverse problems. This work is a joint effort of SagivTech Ltd, École Polytechnique Fédérale de Lausanne and the University of Bremen within a Future Emerging Technologies project sponsored by the European Union. The project, UNLocX, aims at developing a framework for constructing problem-adapted, ultra-efficient algorithms for (de-)coding, analyzing, and synthesizing signals and images. Although these newly developed algorithms produce promising results, their computational complexity prevents their application to real-life problems. In this project, SagivTech's main task was to provide a GPU library for these algorithms that can be easily called by Matlab and C developers who have no knowledge of GPU computing. The GPU UNLocBox library allows researchers to tackle complex applications in medical image processing which could not be solved otherwise. This research has been (partially) supported by EU FET Open grant UNLocX (255931).

Keywords:
Signal Processing, Algorithms & Numerical Techniques, Development Tools & Libraries, Medical Imaging & Visualization, GTC 2013 - ID S3087
 
Nikolaus Rath (Tri Alpha Energy Inc.)
Discover a new application regime for GPUs: fast, real-time signal processing in control systems. The advantages of using GPUs for complex calculations with huge amounts of data are well established. This session will explore the opposite regime where latency is much more important than throughput, and kilobytes of data have to be processed in microseconds. We demonstrate how to completely transfer program control flow from CPU to GPU, so that processing times are fully deterministic and not affected by CPU scheduling. You will also learn how to use peer-to-peer DMA to transfer input and output signals directly into and out of GPU memory to minimize latency. The discussed techniques will be illustrated with examples from the control system of the HBT-EP tokamak, a research device for magnetic confinement of nuclear fusion plasmas which uses a GPU to compute a non-linear, adaptive, state-space control algorithm with a cycle time of four microseconds.

Keywords:
Signal Processing, Computational Physics, GTC 2013 - ID S3094
 
Jesus Ortiz (Advanced Robotics Department, Istituto Italiano di Tecnologia, Italy), Francesco Baralli (NATO STO Centre for Maritime Research and Exploration, Italy)
In this session we'll speak about the implementation of SAS (Synthetic Aperture Sonar) processing software on the GPU, running in real time on board an Autonomous Underwater Vehicle (AUV). Current AUVs follow pre-planned survey routes and record all the data for off-line processing. They don't have the flexibility to adapt to environmental conditions and sonar performance. With this new software we can increase the level of autonomy, allowing adaptive behaviors. We'll show the process of design and implementation of the software, as well as the first results of the tests carried out with a real AUV.

Keywords:
Signal Processing, Computational Chemistry, GTC 2013 - ID S3133
 
Wesley Faler (Part-Time Scientists)
Whether as part of Part-Time Scientists' robotic mission to the moon or a personal satellite for testing a new ion engine, sensor data easily exceeds communication bandwidth. The new field of Compressed Sensing offers techniques for creating high fidelity reconstructions of sensor data sent over low bandwidth channels. However, reconstructing signals with Compressed Sensing is computationally intensive and only practical in real time using GPUs. Explore the use of Compressed Sensing to get more information from sensor data on these real space projects. We'll look at available Compressed Sensing libraries and their tradeoffs when used with real sensors, looking at the entire data transfer pipeline for optimization.

Keywords:
Signal Processing, Robotics & AI, GTC 2013 - ID S3221
 
Dustin Franklin (GE Intelligent Platforms)
GPUDirect support for RDMA provides low-latency interconnectivity between NVIDIA GPUs and various networking, storage, and FPGA devices. Discussion will include how the CUDA 5 technology increases GPU autonomy and promotes multi-GPU topologies with high GPU-to-CPU ratios. In addition to improved bandwidth and latency, the resulting increase in GFLOPS/watt has a significant impact on both HPC and embedded applications. We will dig into scalable PCIe switch hierarchies, as well as software infrastructure to manage device interoperability and GPUDirect streaming. We will also highlight emerging architectures composed of Tegra-style SoCs that further decouple GPUs from discrete CPUs to achieve greater computational density.

Keywords:
Signal Processing, Clusters & GPU Management, Development Tools & Libraries, GTC 2013 - ID S3266
 
Thomas Benson (Georgia Tech Research Institute)
This presentation describes optimization approaches for a synthetic aperture radar image formation algorithm on Kepler K20 GPUs using CUDA. In addition to utilizing newly introduced features, we include an analysis of the success of optimizations ranging from numerical approximations to memory hierarchy optimizations to lower-level code optimizations. We previously reported optimization results for this application on Fermi-generation GPUs, and we will compare the optimization methodologies for Fermi and Kepler as well as the relative success of such strategies on the two architectures. We achieve better than twice the performance on the K20 GPU, with comparable power consumption, relative to Fermi-generation Teslas for similarly optimized kernels.

Keywords:
Signal Processing, Algorithms & Numerical Techniques, GTC 2013 - ID S3274
 
Colin Shea (Ventura Solutions, Inc.)
One of the drawbacks of GPU-based processing has been its reliance on system memory for ingress and egress of data from other PCIe devices. GPUDirect's initial releases allowed one-step GPU-to-GPU DMA and a simplified two-step InfiniBand-to-GPU memory copy. With the release of the NVIDIA Tesla K20 GPU and third-party DMA capabilities in CUDA 5, the ingress/egress performance penalty is eliminated. Using 10GbE network interface controllers (NICs), we detail the methodology by which generalized DMA transactions can be achieved, enabling lower-latency transfers between NVIDIA's Kepler GPU and 10GbE NICs.

Keywords:
Signal Processing, Development Tools & Libraries, GTC 2013 - ID S3300
 
Michael Wu (Rice University), Guohui Wang (Rice University)
Learn how to efficiently implement key components of high-performance wireless communication systems using CUDA on NVIDIA GPUs. Two key components of wireless communication systems, the multiple-input multiple-output (MIMO) detector and the low-density parity-check (LDPC) code decoder, are computationally intensive and are typically implemented in hardware. We present detector and decoder algorithms optimized specifically for the GPU, which employs hundreds of cores to process the workload in parallel and achieve high throughput. These flexible GPU-based designs can be used to significantly accelerate simulation and software-defined radio (SDR) test-beds.

Keywords:
Signal Processing, GTC 2013 - ID S3329
Supercomputing
Presentation
Media
Wenji Wu (Fermilab), Phil DeMar (Fermilab)
Network traffic monitoring & analysis is the process of capturing network traffic and inspecting it closely to determine what is happening on the network. It is an indispensable technique to assist in network operations & management, and performance troubleshooting. Within high-speed networks, network traffic monitoring and analysis applications may require enormous raw compute power and high I/O throughputs, especially when traffic scrutiny on a per-packet basis is needed. Under those conditions, the applications face tremendous performance and scalability challenges. Recently, GPUs have been widely applied to accelerate general purpose scientific and engineering computing. At Fermilab, we have prototyped a GPU-assisted network traffic monitoring & analysis system, which analyzes network traffic on a per-packet basis. Experiments show that our system can achieve extremely high performance with network traffic monitoring & analysis. In this talk, we will describe our architectural approach in developing a generic GPU-assisted network traffic monitoring and analysis capability.

Keywords:
Supercomputing, Algorithms & Numerical Techniques, GTC 2013 - ID S3146
 
Antonino Tumeo (Pacific Northwest National Laboratory), Oreste Villa (Pacific Northwest National Laboratory)
Learn the techniques that Pacific Northwest National Laboratory (PNNL) computer scientists are applying to enhance the performance of scientific applications such as NWChem (Quantum Chemistry), STOMP (subsurface flow transport) and Paraflow (multiflow simulation) on large-scale GPU-accelerated clusters (e.g., ORNL Titan). This talk will discuss approaches we are currently exploring to scale these scientific applications to tens of thousands of GPU-accelerated nodes: Domain Specific Languages and auto-tuners for tensor contractions, library-based approaches, dynamic heterogeneous task-based runtimes, and compiler and run-time transformations for GPU code. We will provide initial results on the various approaches, comparing the performance obtained with code restructuring to pragma-based (e.g., OpenACC) and library-based approaches, which keep most of the legacy code intact while still providing considerable speedups.

Keywords:
Supercomputing, Computational Chemistry, GTC 2013 - ID S3289
 
Sarah Tariq (NVIDIA)
Over the last decades video games have evolved from simple 2D sprite based animations to nearly realistic cinematic experiences. The hardware powering these games, the Graphics Processing Unit (GPU), has evolved over the last 15 years from simple fixed function triangle rasterization and texturing hardware to highly programmable and massively parallel general purpose processors with high memory bandwidth and high performance per watt. These characteristics also make GPUs ideally suited for typical supercomputing tasks. In this talk we'll discuss how the two fields have evolved together and influenced each other, and how we have come to the point where the same hardware used for rendering the latest 3D video games is being used to try to solve the world's most challenging problems, from human health to climate change.

Keywords:
Supercomputing, Game Development, GTC 2013 - ID S3424
 
Michael Wolfe (The Portland Group)
This talk has three parts. First, it presents the new features that have been accepted into OpenACC version 2.0 by the time of the conference. Among the important proposed features are support for separate compilation and procedure calls, and for nested parallelism. For each feature, there is a motivating example and a discussion of usage guidelines. Second, the implementation of these features in the PGI Accelerator compilers will be presented. Finally, new features in the PGI compilers that are not yet part of the OpenACC specification, including support for multiple devices and multiple device types, will be discussed.

Keywords:
Supercomputing, Development Tools & Libraries, GTC 2013 - ID S3447
 
Brent Leback (The Portland Group)
In this talk, recent CUDA Fortran implementations of CUDA 5.0 features, including support for the Kepler architecture, will be reviewed. Topics covered include dynamic parallelism, where we will show extensions to the CUDA Runtime API from device code and examples of kernel launches from device code. Also shown will be examples of using streams and events from within device code, including guidance on error checking of the new features from within device code. New options to the compiler drivers in PGI 2013 and examples of compiling and linking CUDA static libraries will be covered. Finally, the talk will wrap up by discussing and showing opportunities for optimization on the new Kepler architectures.

Keywords:
Supercomputing, Parallel Programming Languages & Compilers, GTC 2013 - ID S3448
 
Fernanda Foertter (Oak Ridge National Lab), Jack Wells (Oak Ridge National Lab)
This presentation will focus on early outcomes from Titan, the world's fastest supercomputer. We will showcase results from the Center for Accelerated Application Readiness, or CAAR, where Titan's manufacturer Cray, NVIDIA, and scientific computing experts at OLCF have collaborated to make several applications ready to use Titan's GPU accelerators. This talk will also explore some best practices the CAAR team learned in the process of porting CPU-only applications to Titan's GPU-accelerated architecture. Preliminary Early Science results from users running on Titan will be discussed, including, for example, applications in combustion for advanced engines, properties of magnetic materials for clean energy applications, and reactor modeling for today's fleet of light-water reactors. Lastly, details about Titan system setup, OLCF resources, and how to apply for time on Titan's 18,688 GPU-accelerated nodes will be shared.

Keywords:
Supercomputing, GTC 2013 - ID S3470
 
Pak Lui (Mellanox Technologies)
GPU-based clusters are being adopted at a rapid pace in high performance computing to perform compute-intensive tasks at large scale. One of the main performance challenges in deploying these GPU-based clusters is the performance and latency of communications between GPUs across the interconnect fabric. The goal of this session is to highlight interconnect optimizations such as RDMA for GPUDirect, which provides higher performance and better utilization for GPU communications by allowing the network adapter and the GPU to communicate directly in a peer-to-peer fashion, completely bypassing the CPU subsystem. We will show the benefits of using this new technology and explain how registrants can utilize these new features in their own compute clusters.

Keywords:
Supercomputing, GTC 2013 - ID S3504
Video & Image Processing
Presentation
Media
Thomas True (NVIDIA)
A high performance parallel floating point processor with very high memory bandwidth, the GPU is ideal for video and image processing applications. This tutorial will describe best practices for the efficient transfer of video and image data to and from the GPU as well as techniques for the optimal use of GPU resources for video and image data processing. Simple CUDA-based examples will be utilized to demonstrate concepts presented.

Keywords:
Video & Image Processing, High Resolution High Frame Rate Video & Cinema, Media & Entertainment, Real-Time Broadcast Graphics, GTC 2013 - ID S3039
 
Timo Stich (NVIDIA)
We see increasing demand for easy to use, fast, high-resolution image and video manipulation tools. Recently, Criminisi et al. proposed the geodesic distance transform (GDT) which can be used to implement several interesting image and video editing tasks efficiently for high resolution imagery. In this work we present an efficient CUDA GDT implementation. The key contribution is the introduction of a score-boarding mechanism for CUDA blocks. This significantly improves the achieved overlap of memory transfers and computation and reduces kernel launch overheads.

Keywords:
Video & Image Processing, Computer Vision, Media & Entertainment, GTC 2013 - ID S3090
 
Johanna Beyer (King Abdullah University of Science and Technology), Ronell Sicat (King Abdullah University of Science and Technology), Markus Hadwiger (King Abdullah University of Science and Technology)
Learn how to interactively evaluate accurate non-linear image operators on gigapixel images for any desired output resolution. We describe a novel GPU-friendly multi-resolution data structure called sparse pdf maps (sPDF-maps). This data structure encodes continuous probability density functions (pdfs) of pixel neighborhoods in the original image, enabling the accurate computation of non-linear operators such as anti-aliased color mapping, fast local Laplacian filters, smoothed local histogram filters and bilateral filters. In a pre-processing step, we compute the sPDF-map of the input image using Matching Pursuit implemented in CUDA. sPDF-maps are optimized for parallel image reconstruction on the GPU. We will discuss image reconstruction in a tile-based gigapixel viewer using CUDA, which allows non-linear operators to be evaluated at interactive rates.

Keywords:
Video & Image Processing, GTC 2013 - ID S3142
 
Eric Kelmelis (EM Photonics)
Learn how GPUs can be applied to real-time, real-world image processing applications. Images and videos recorded at long distances (greater than 1 mile) often suffer degradation due to the atmospheric turbulence between the subject and camera. This effect severely limits the quality of data that is captured by high-end imaging systems. Building on technology originally developed by Lawrence Livermore National Laboratory, EM Photonics has developed a GPU-accelerated, real-time image enhancement tool, ATCOM, that can be coupled with cameras to remove the distortion present in long-range imagery. In this talk we will provide an overview of the algorithm used, discuss how GPUs were applied to move it from a post-processing to a real-time deployment, and present results of our tool working with NASA and Army camera systems.

Keywords:
Video & Image Processing, Media & Entertainment, GTC 2013 - ID S3178
 
Gernot Ziegler (NVIDIA)
Locating connected regions in images and volumes is a substantial building block in image and volume processing pipelines. We demonstrate how the Connected Components problem strongly benefits from a new feature in the Kepler architecture, direct thread data exchange through the SHUFFLE instruction.

Keywords:
Video & Image Processing, Medical Imaging & Visualization, GTC 2013 - ID S3193
 
Alexandros-Stavros Iliopoulos (Duke University)
Learn how real-time gigapixel snapshots may be achieved! AWARE-2 is a revolutionary camera prototype, composed of 98 low-end micro-cameras, which enables capturing gigapixel-scale snapshots of dynamic scenes. This shifts the challenge of dynamic gigapixel image formation to the computational side of the system. In our session, we present a new processing pipeline for the AWARE-2 camera, which employs a host of compensation and composition techniques and maps them to GPUs in order to effect the timely output of high-quality gigapixel imagery. The resulting images will be compared to those produced by current state-of-the-art static gigapixel imaging systems.

Keywords:
Video & Image Processing, Computational Photography, Media & Entertainment, GTC 2013 - ID S3219
 
Andre R. Brodtkorb (SINTEF)
Learn how to accurately track objects in real time using standard video cameras and commodity GPUs. This session will explain the steps required to exploit the vast processing capabilities of commodity-level GPUs for hardware-accelerated video decoding and real-time video processing. We use the NVCUVID video decoder API and CUDA to perform live video synchronization, foreground segmentation, optical flow calculation, volume carving, and tracking, all in real time on a single GPU. Our system handles tracking on four high-resolution cameras at 25 frames per second, and supports up to 16 cameras using multi-GPU processing.

Keywords:
Video & Image Processing, Computer Vision, Media & Entertainment, GTC 2013 - ID S3227
 
Chris Padwick (DigitalGlobe)
A high performance image processing system has been implemented which utilizes the GPU for the compute-intensive processing steps. The system is designed to process very large satellite images and features a full image processing workflow comprising the following elements: orthorectification, pan-sharpening, color balancing, tiling, metadata generation, and mosaicing. Performance measurements on individual algorithms implemented for the system show a processing speedup ranging from 107X to 213X over single-threaded CPU versions of the same algorithm. The system has been benchmarked against an existing high performance image processing system running on similar hardware. In end-to-end testing, the GPU-based system shows a 14.4X speedup over the existing system, providing greater than an order of magnitude reduction in processing time.

Keywords:
Video & Image Processing, Cloud Visualization, GTC 2013 - ID S3264
 
Abhijit Patait (NVIDIA), Swagat Mohapatra (NVIDIA)
This session provides a broad overview of the video encoding capabilities of NVIDIA GPUs. We will provide an overview of the hardware capabilities and software APIs used for video encoding. Representative performance and quality metrics of NVIDIA video encoding will also be presented. Additionally, a quick overview of how NVIDIA video encoding can be used in various applications, such as transcoding, low-latency applications, virtualization, and streaming, will be provided.

Keywords:
Video & Image Processing, Media & Entertainment, GTC 2013 - ID S3379
 
Sean Varah (MotionDSP), Nemanja Grujic (MotionDSP Inc.)
Aerial surveillance applications using either full motion video (FMV) or Wide Area Motion Imagery (WAMI) sensors require accurate detection and tracking algorithms. Real-world environmental conditions (atmospheric haze, smoke, fog) and long stand-off distances make detection and tracking difficult, resulting in poor detection rates and high rates of false positives. The MSER algorithm is well-suited to these conditions, but extremely slow to compute and, until recently, impractical to use. MotionDSP developed a new, massively parallel algorithm for MSER, implemented in CUDA, which has increased performance by more than 20x and enabled real-time performance on extremely high resolution imagery.

Keywords:
Video & Image Processing, Signal Processing, GTC 2013 - ID S3397
 
Michael Gehm (University of Arizona)
In this session, you will learn about the recently developed AWARE gigapixel-scale camera systems and how their physical architecture and massive data rates result in a number of unique image stitching and rendering challenges. The core of the presentation will reveal a suite of strategies for dealing with the hundreds of individual images that must be combined into a seamless, high-dynamic-range composite and delivered to multiple users at near-video rates. Special emphasis will be given to the details of how the specific strategies combine to recast the challenges into forms that are well-suited to a map/reduce-style approach on GPU hardware.

Keywords:
Video & Image Processing, Media & Entertainment, GTC 2013 - ID S3456
 
Dilip Patlolla (Oak Ridge National Laboratory), Anil Cheriyadat (Oak Ridge National Laboratory)
There is a great demand for vision algorithms running on high performance computing (HPC) architectures capable of processing peta-scale image data. We exploit the parallel processing capability of GPUs to present a GPU-friendly algorithm for efficient detection of human settlements from high-resolution satellite imagery. The presentation gives an overview of our GPU implementation, which is capable of extracting human settlement regions in seconds from city-scale, sub-meter spatial resolution aerial imagery spanning thousands of square kilometers.

Keywords:
Video & Image Processing, Supercomputing, GTC 2013 - ID S3495
 
Rupali Deshpande (NVIDIA)
Throughout the ages, the diamond's dance of light has captured our fascination. This beauty does not happen by chance but is clearly determined by the design and craft of the diamond's cut. This presentation will provide an overview of how GPUs are accelerating the process of converting rough stones to polished and sparkling diamonds. Technology brought a huge spike in the number of diamond pieces being processed, from a meager 5 pieces per worker per day to around 150 pieces cleared every day by each worker. The industry challenge is to convert rough stones into diamonds with high precision and quality, with minimum wastage, and at the speed of light. The computational visualization techniques with which NVIDIA GPUs have helped address these challenges include GPU-based reconstruction, CUDA-based ray tracing, surface rendering of voxel data, density contrast, and OpenGL rendering. In this presentation attendees will learn how GPU technologies are determining the 4Cs of the diamond industry: Clarity, Color, Cut and Carat.

Keywords:
Video & Image Processing, Ray Tracing, GTC 2013 - ID S3541
 
 
NVIDIA - World Leader in Visual Computing Technologies
Copyright © 2014 NVIDIA Corporation