NVIDIA invited women around the world to share how they use CUDA in meaningful ways. Here are some of their stories.

**Sonia Lopez Alarcon**

Rochester Inst. of Technology

Our research focuses on heterogeneous computing. Heterogeneous chip multiprocessors combining GPU and CPU cores are already a reality, and compute-intensive applications could greatly benefit from distributed execution across different hardware choices.

However, the design process of distributed hardware-software solutions is extremely complex. Understanding the needs of the applications and how their phases interact with the different hardware available is key to the advancement of fields such as medical diagnosis.

GPUs provide a different way to look at computer architecture. Programming in CUDA also provides a new way of thinking - "Thinking in parallel" - and is usually the first exposure that students have to parallel programming.

In my research, we have worked on all kinds of applications: b-splines acceleration, image stitching, heart imaging, hashing and encryption... In addition, we have used CUDA-enabled cards to explore the best hardware platforms for a number of linear algebra computations. We constantly keep GPUs in our minds!

**Rommie Amaro**

Univ. of Calif., San Diego

In my lab, we are broadly concerned with the development and application of state-of-the-art computational and theoretical techniques to investigate the structure, function, and dynamics of complex biological systems.

At the interface of chemistry, biology, physics, and pharmacology, our research integrates both applied and basic science components, with the goal of bridging basic and clinical research.

Fundamental enzymological and drug discovery studies are tightly coupled to a wide range of biochemical and biophysical experiments that allow us to engage in dynamic and exciting collaborations with various experimental labs. We harness the power of computing to do exciting things in biomedical research, especially in the areas of computer-aided drug design and multi-scale modeling.

The enormous speedups in GPU computing are now enabling us to perform complex drug discovery experiments much more rapidly than before. What used to take us weeks now takes just days or hours.

**Michela Becchi**

University of Missouri

My research interests are in parallel computing, heterogeneous computing, algorithm acceleration, hardware-software co-design, design of programming models and runtime systems for parallel computing, and networking systems.

GPUs help my work in two different ways. First, they are a great platform to program, and they open up a lot of interesting research questions. Second, I found that students tend to be very interested in GPUs, and working with GPUs is a great way of attracting students to parallel programming and computer architecture.

I really enjoy working at the intersection between hardware and software, and using cutting-edge technology. In addition, I appreciate having the opportunity to connect with scientists and work on different projects, from algorithm acceleration, to compiler and run-time design, to virtualization.

**Sunita Chandrasekaran**

University of Houston

I work as a Postdoctoral Research Fellow with Barbara Chapman in the HPCTools group in the Dept. of Computer Science at the University of Houston.

My research spans exploring programming models for heterogeneous and multicore systems, helping to develop industry standards for accelerators and multicore systems, and analyzing the power and energy efficiency of high-performance computing systems, including accelerators, when running large-scale scientific applications.

The fact that GPUs can speed up processing by "n" times and lessen the time taken to finish a computation is thrilling to me. With the recent development of high-level directive-based programming models for GPUs, the capacity of these massive devices can be even better exploited.

What excites me about my work is that I am involved in creating such high-level directive-based programming models that can be of huge benefit to the HPC community.

**Fernanda Foertter**

Oak Ridge Natl. Lab

I train users to use Titan, the world's biggest GPU-accelerated machine! Titan helps scientists generate results faster, with higher fidelity and more resolution. It makes previously impossible science possible.

The first time I heard about CUDA was in 2008. I told a graduate adviser I wanted to use CUDA to offload computationally expensive tasks in our code. The adviser's response: 'This is a fad.' Today, my office is next to a machine with 18,688 devices that were supposed to be "just a fad". For once, I was ahead of fashion trends!

I'm excited about my work because I get to see the new and exciting things our users do with GPUs. And they always challenge us to stay on top of our skills.

My advice to young people interested in computing is: Go play! Try something new, something silly. Have fun, see where it takes you. Make it break, fix it, then make it break again. Exploration has taught me more than any classroom or book.

**Buket Benek Gursoy**

ICHEC

I have worked at ICHEC as a Computational Scientist under the European HPC project PRACE (Partnership for Advanced Computing in Europe) since 2012. As a member of the Novel Technologies group at ICHEC, I am primarily involved in activities related to GPGPU computing, such as installing, porting, benchmarking, developing and optimizing accelerator-based scientific codes.

In particular, I am responsible for maintaining and improving the CUDA port of the GPU-enabled molecular dynamics software DL_POLY. I also take part in educating and supporting ICHEC users and researchers across Ireland in GPU programming and tools.

When I started at ICHEC, it had been announced as a CUDA Research Center and had won the HPCwire Readers' Choice Award. One of the projects in this context was the development of a matrix-matrix multiplication library (PhiGEMM), which achieved a significant performance increase on multiple GPUs and complemented my interest in heterogeneous systems. This has been my main motivation to work with GPU computing.

**Valerie Halyo**

Princeton University

I work at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland. The main goal is to extend the physics accessible at the LHC to include new topological configurations or simply new physics that previously evaded detection or suffered from extremely low selection efficiency.

Scientific computing is a critical component of the LHC experiment, including operation, trigger, LHC computing GRID, simulation, and analysis. One way to improve the physics reach of the LHC is to take advantage of the flexibility of the trigger system by integrating coprocessors based on GPU architecture into its server farm.

This cutting-edge technology provides not only the means to accelerate existing algorithms, but also the opportunity to develop new algorithms that select events in the trigger that previously would have evaded detection.

**Preetha Joy**

NeST Software

My work in GPU computing has been in industrial domains, spanning areas such as non-destructive testing, security and surveillance, molecular analysis and fluid dynamics. The primary objective of most of the projects has been optimization to achieve real-time performance.

We focus on profiling and analyzing the current implementation to identify the areas that are both the most time-consuming and data-parallel, and to come up with the best possible GPU-based solution. To do this, we often need to think outside the box.

Performance optimization of software is a fascinating area of research. There may be many possible options and the one that is best suited for one problem may or may not be the best solution for another. Analyzing and finding the best solution for the project at hand is both challenging and exciting.

**Charu Kalra**

Northeastern Univ.

I work primarily on graphic compilers, in the NUCAR (Northeastern University Computer Architecture Research) group. I am working on improving the performance of various CUDA applications by using different techniques.

I really like CUDA as it's easy and programmer-friendly. Being a computer engineering student, there is always a curiosity to know the underlying details which are abstracted away from the programmer.

I am excited about how the challenges which I encounter during my work push me to think in a more innovative manner. I learn something new every day and that process keeps me going. The power of CUDA is not what you see on the surface, but what's underneath it.

**Elaine Kant**

SciComp Inc.

I am the lead developer of an 'automatic programming' software system called SciFinance. SciFinance automatically translates model specifications into C-family code (including CUDA and OpenMP code) for mathematical modeling of financial products such as derivatives.

Our customers are investment banks and hedge funds that manage large portfolios of pricing and risk analysis codes. In order to run enough scenarios to properly analyze their deals and positions, it is important for the banks to be able to run their codes quickly. Many of the modeling codes are based on Monte Carlo methods, which lend themselves to CUDA-based speedups.

The GPU codes that SciFinance produces for Monte Carlo simulations typically run 30-50 times faster (per double-precision GPU) than sequential codes.

**Sara Maria Rubio Largo**

Univ. of Navarra

I am a PhD student in the field of physics of granular media (examples of a granular system include sand, beans, powders, and planetary rings).

Determining the mechanical properties of granular media is a challenging task. Recently, granular media have been widely examined experimentally, analytically and numerically, and have shown unexpected behaviors.

The aim of my research is to study granular systems numerically, trying to understand and describe their physical behavior and their macroscopic mechanical properties. In order to do so, we are about to implement a realistic molecular dynamics algorithm to simulate different geometries and stable granular systems.

CUDA is a very powerful tool to optimize this code. We can take advantage of the multi-core architecture of our GPUs by manipulating large blocks of per-particle data at the same time.

**Miriam Leeser**

Northeastern University

My research involves using non-traditional architectures to accelerate scientific applications. These architectures include FPGAs and GPUs mixed with CPUs and multicore processors.

I have done work in accelerating medical image processing on GPUs. I am currently working on protein modeling on GPUs. We have brought Matlab code that runs in over an hour down to milliseconds using CUDA. The resulting code can now easily be used to transform large data structures.

The goal is to allow biochemists to choose the representation they wish to use to model proteins without worrying about the underlying representation of molecules. Possible representations are Cartesian (x,y,z) coordinates or angles and bond lengths.

GPUs are especially good for problems with large datasets. Proteins can have hundreds of thousands of molecules. The scientists we are working with are very excited about the protein modeling results we are getting.

**Janice McCarthy**

Duke University

I develop new statistical methods and algorithms for studying the association between genetics and disease risk.

Most of what I do is 'embarrassingly parallel' and requires tens of thousands to hundreds of thousands of iterations of complex algorithms on large data sets. High-throughput computing allows me to consider methods that would be prohibitive when done serially.

One area I am interested in is called 'rare variant analysis'. It requires large sample sizes and may require permutation methods. It would be computationally impossible to perform such analysis in serial or on a handful of nodes.

I'm excited about my work because it combines mathematics, statistics, computer science and biology and has the potential to help people.

**Anna Nelasa**

Zaporozhye Natl. Tech. Univ.

The subject of my research is the development of a high-performance library for number theory, in particular long arithmetic over Galois fields (prime fields and extension fields of characteristic two) and arithmetic on elliptic and hyperelliptic curves over these finite fields for public key cryptography.

One of the hard problems in terms of space and time complexity is the task of counting points on elliptic and hyperelliptic curves over finite fields. There is an inverse formulation of this problem: to find the curve's coefficients for a predetermined convenient order.

Applying GPUs in combination with CUDA speeds up these calculations. I feel that parallel computing was invented to solve problems that did not exist when there was no parallel computing!

**Maria Pantoja**

Santa Clara University

I teach parallel programming to undergraduate and graduate students. My personal research is to develop a learning assistance instructional e-tool capable of assessing a student's pronunciation and improving it through automatic feedback.

The model integrates speech and image recognition technology capable of analyzing the learner's input. The image/audio analysis and the expert system will be implemented using parallel programming with GPUs to allow fast feedback to users.

GPUs help me to get real-time results on calculations that used to take months. To paraphrase one of my students: 'Parallel programming allows you to unlock the full potential of your mobile device or computer. Performance improvement of this magnitude (100x+) is about so much more than just 'speeding up' existing applications; it's about uncovering entirely new applications never before possible.'

By the way, thanks to one of my students (now also faculty) who broke the record for finding the most digits of Pi calculated using CUDA, I was featured on a TV newscast.

**Valentina Popescu**

LAAS-CNRS Laboratory

I am studying for my Master's at Paul Sabatier University in Toulouse, France. My program focuses on distributed systems and critical software.

I am about to complete a five-month internship entitled "Towards fast and certified multiple-precision libraries", in the Methods and Algorithms in Control (MAC) team at the LAAS-CNRS Laboratory in Toulouse. The focus is on developing an efficient multiple-precision arithmetic library using the CUDA programming language for the NVIDIA GPU platform.

This project is of great importance in the area of high performance scientific computing. On the one hand, GPUs represent an important development hardware platform for many applications that demand massive parallel computations; on the other hand, GPU-tuned multiple-precision arithmetic libraries are currently scarce.

**Dhivya Sabapathy**

IIT Madras

I am a second-year master's student at IIT Madras, India. My research work focuses on medical imaging, processing and reconstruction. At present, I am working on simulation of MR images using CUDA for NVIDIA's GPU.

Speed and memory are two great enemies of any research group working on medical imaging. CUDA has become the friend in need to rescue us. Working with large medical images has always been a dream, which is now possible. Open-source access and a user-friendly interface are the most attractive features of the CUDA programming language.

The simulation generates MR images from phantoms using intense mathematical computations. GPUs are very handy in performing highly accelerated calculations. We have witnessed significant improvement in speed by implementing high level optimization using CUDA. This increases the speed of research and helps us to explore further.

CUDA amazes me by its speed and performance. I am fascinated by the way it produces results in a flash. I learn something new every day in CUDA and this excites me a lot!

**Kate Stafford**

Univ. of Calif., San Francisco

I am a postdoctoral scientist at the University of California, San Francisco. I work on modeling, molecular docking, and systems pharmacology applied to orphan G-protein-coupled receptors (GPCRs).

GPCRs are proteins expressed on the surface of cells that respond to specific extracellular signaling molecules. They are the targets of 30-50% of drugs currently in clinical use, but many members of the family are "orphan" receptors whose corresponding signaling molecules are not known. The goal of my work is to identify synthetic tool compounds that bind to orphan GPCRs, ultimately to be used in identifying the actual endogenous signaling molecules.

My PhD project involved extensive molecular dynamics (MD) simulations used to interpret data on macromolecular motions collected by nuclear magnetic resonance (NMR) spectroscopy. MD codes optimized for GPUs provide significant performance advantages relative to traditional computer clusters. Desire for improved MD performance and longer simulations led to my interest in GPU computing.

**Monica Syal**

Advanced Rotorcraft Tech.

I work as an Aerospace Engineer at Advanced Rotorcraft Technology (ART), where we develop advanced flight dynamic simulation facilities for rotorcraft as well as fixed-wing aircraft. Two of the most important aspects of conducting such simulations are: (a) maintaining accuracy of the simulated physical phenomena, and (b) being able to do this in real-time. GPUs play a very significant role in helping us achieve both of these goals.

For example, I am currently developing a methodology to simulate the multi-disciplinary rotorcraft “brownout” phenomenon in a flight simulator. Brownout involves the formation of a large and dense dust cloud around a rotorcraft when it is landing or taking off from a desert-like environment. As a consequence, pilots may lose visibility of the landing zone, and may also experience spurious sensory cues that can lead to serious accidents. The objective of our research is to develop a brownout simulation methodology that can help understand and mitigate this problem.

Real-time simulation of brownout is extremely challenging because it involves modeling of several complex physical phenomena: (a) rotor-generated flow field, (b) dust particle dynamics, (c) visual obscuration caused by the dust clouds and (d) dust cloud rendering in real-time in a flight simulator. CUDA has become an indispensable tool in this research by allowing massive parallelization of the simulation on thousands of GPU cores. In fact, all the computations as well as data communication between different methodologies are completely implemented on GPUs so that low bandwidth data communication between CPU and GPU can be totally avoided.

**Michela Taufer**

University of Delaware

My GPU work is all about rethinking application algorithms to fit on the GPU architecture and get the most out of its computing power, while preserving the scientific accuracy of the simulations using the rethought and redesigned applications.

This has resulted in many exciting achievements for me and my group at the University of Delaware. For example, my group and I were the first to propose a completely-on-GPU PME code for molecular dynamics (MD) simulations. We achieved that goal by changing the traditional way researchers are looking at the charges in long-range electrostatics and their interactions.

In the summer of 2009 we focused on empirical observations of the reproducibility and stability challenges on GPUs, which led us to rethink the way we perform arithmetic operations on GPUs. We ended up proposing a new way to perform arithmetic operations, called composite precision, that can potentially mitigate result drift in MD and other n-body simulations.

**Mae Woods**

University College

I am a postdoctoral researcher working in the lab of Dr. Chris Barnes at University College London in the UK. I am an applied mathematician and my research interest is mutational processes in the genome.

To develop models I work with dynamical systems, approximate Bayesian computation and utilize high performance computing resources including GPUs. I am currently using a software package (cuda-sim) to simulate my stochastic models using CUDA. During my PhD I developed my own software that is based on the discrete element method to model neural crest collective cell migration.

I was first introduced to CUDA and GPU Computing by Prof. Iain Couzin during a visit to his lab at Princeton University. At the time I was interested in collective migration of the neural crest cells. I had developed a model and Iain suggested that we could exploit the parallel nature of my model and develop software using CUDA. I was introduced to the CUDA functions and had access to code developed by Iain and his lab to model insect swarms, fish schools and human crowds. This helped me to see how to reprogram my code in a similar way.

**Tatyana Makhalova**

Perm State National Research University

My research focuses on the identification of possible locations of archaeological excavations by modeling migration processes based on cellular automata.

The computational complexity of the model, driven by the large size of the investigated area and numerous repetitions of computational experiments, leads to long execution times.

This complicates obtaining the results of calculations and investigating the properties of the model. Thus, there was a need to reduce the simulation time.

Using GPUs reduces the program's execution time by 11x. This significant reduction in simulation time, together with the availability of GPU technology, opens the possibility of using the GPU for further research.

**Lena Oden**

Fraunhofer ITWM

The focus of my work is communication and data transfer between GPUs on distributed nodes of a cluster. I'm looking into different communication and data transfer methods to help other people optimize multi-GPU applications.

Looking at today's as well as future systems, communication is one of the main bottlenecks of many applications. Studies show that communication - not computation - will dominate power consumption in the coming years.

Therefore, I think my research is very important for future energy-efficient systems. The main part of this work is the integration of GPUs in the PGAS API GPI, which was developed at the Fraunhofer Institute for Industrial Mathematics.

**Fanny Nina Paravecino**

Northeastern Univ.

My research focuses on restructuring sequential algorithms into parallel algorithms leveraging GPU architectures. In this way, I'm looking for high performance improvement in terms of time and accuracy.

Currently, my research domain is image processing, but most of the algorithms that I'm working on could apply to other domains (e.g., machine learning, computer vision, big data, and so on). In particular, I'm working on Connected Component Labeling, Spectral Clustering, Level Set and Graph-based segmentation using the latest features on Kepler GK110.

I think of my work as a small piece of human intelligence on a machine. With the advances in technology, we can collect thousands of inputs, but processing them with accuracy can take a while. It excites me that we can achieve much faster performance while still ensuring accuracy with GPUs, with more and more features to explore.

**Kenia Picos**

CITEDI-IPN

I'm working on research based on lighting invariance for 3D object recognition. After I wrote my first program in CUDA for image processing, I knew that I wanted to implement it for my Master's thesis.

Now I'm using CUDA in my PhD studies. I'm fascinated by computer graphics and image processing. It's something I've always wanted to do.

I am working with multi-GPU platforms, and I have so much available computing capability that the only limit is imagination! This is a very important era for computer graphics, as the computing capability that's available to people now was not possible 10 years ago.

**Bkhandari E. Ramovna**

D. Mendeleyev Univ. of Chemical Technology

My work describes the model of a heterogeneous chemical reaction in a membrane nanopore based on the example of carbon dioxide reforming of methane to synthesis gas.

The simulation is based on the methods of molecular dynamics. Particles move according to the laws of classical mechanics. The simulation of the heterogeneous reaction is based on the collision.

The simulation repeats many of the same operations (checking for a collision with a wall, checking for intermolecular collisions, chemical conversion, etc.). Checking for intermolecular collisions and chemical reactions was the most resource-intensive task. With CUDA, it ran faster than the single-threaded version.

GPUs help to accelerate the calculation, and as a consequence we were able to run a longer calculation (about 10 million iterations) to obtain more accurate data.

**Sara Falamaki**

CSIRO

I'm a software engineer at CSIRO (Commonwealth Scientific and Industrial Research Organisation) in Australia. My CUDA project involved detecting water in bitmap images of Australia, and vectorising the water-filled areas. The goal was to track flooding episodes on the continent.

By running a simple algorithm in parallel on GPUs, we cut processing time from many hours to about three seconds. This meant that we could process images multiple times a day.

My advice to young people interested in computing: Practice, practice, practice! Learn to program as soon as you can (right now!), and keep on doing it. There are so many great ways to learn to program, and anyone can do it. It's much more fun being a creator of software than a consumer of it.

**Carla Osthoff**

Laboratório Nacional de Computação Científica

I work at the Laboratório Nacional de Computação Científica, a research institute of the Ministry of Science, Technology and Innovation (MCTI) in Brazil. I do research related to high-performance computing, parallel applications and parallel programming tools.

GPUs reduce processing time on various research applications in our laboratory, including molecular dynamics, bioinformatics, quantum computing, and oil and gas.

**Meng Qi**

National University of Singapore

I am a PhD student at the School of Computing, National University of Singapore. I am focused on using the GPU to solve the 2D mesh generation problem. My publications include 'Computing 2D Constrained Delaunay Triangulation Using Graphics Hardware' (2012 ACM Symposium on Interactive 3D Graphics and Games).

According to our experiments on both synthetic and real-world data, our GPU algorithms are numerically robust and run up to two orders of magnitude faster than the fastest sequential algorithm. For example, given ten million points and one million constraints, triangle software spends 62 seconds constructing the constrained Delaunay triangulation, of which 46 seconds are spent on constraint insertion.

In contrast, our algorithm spends 3.2 seconds constructing the constrained Delaunay triangulation, of which only 0.47 seconds are spent on constraint insertion. It is very exciting to see that a time-consuming computational geometry problem is solved within one second on the GPU because of our algorithms.

Advice to young people interested in computing: Trying to solve a practical problem is far more important than just reading others' publications. Sometimes knowing little is better than knowing too much. Too much information may limit your inspiration and imagination. Try to distinguish between preparation and over-preparation.

**Lorena Barba**

George Washington University

Lorena Barba is an associate professor in the School of Engineering and Applied Science at George Washington University. Professor Barba received her MS and PhD degrees in aeronautics from the California Institute of Technology and her BS and PE degrees in mechanical engineering from Chile's Universidad Técnica Federico Santa María.

Professor Barba is an Amelia Earhart Fellow of the Zonta Foundation (1999), a recipient of the EPSRC First Grant program (UK, 2007), an NVIDIA Academic Partner award recipient (2011), and a recipient of the NSF Faculty Early CAREER award (2012). She was appointed CUDA Fellow by NVIDIA in 2012 and is an internationally recognized leader in computational science and engineering.

**Neelima B. Reddy**

NMAMIT

GPU Optimizations

I teach CUDA theory and labs at our college. In addition, students have chosen CUDA for their major projects and have taken up high-performance computing using GPUs as their area of research interest in higher education. GPU computing is my primary research area, alongside multi-core architectures, parallel compilers and high-performance computing. At work, I am excited to use the latest technology and happy to be part of a fast-growing research community.

**Advice for young people interested in computing:** Start with simple programs to get a correct picture of where, when and which computation takes place. CUDA is a good place to start with GPU computing because the learning curve is gentle.