﻿<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="applications.xsl"?>
  <Applications>

        <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ba018el469a9</GUID>
        <Name>PyCOOL</Name>
        <ShortDescription>PyCOOL (Cosmological Object-Oriented Lattice code) is a fast GPU-accelerated program that solves the evolution of interacting scalar fields in an expanding universe with symplectic algorithms. The program is designed to balance speed, accuracy and user friendliness. This is achieved by using the Python language with the PyCUDA interface, which makes the program easy to adapt to different scalar field models.</ShortDescription>
        <URL>http://www.physics.utu.fi/tiedostot/theory/particlecosmology/pycool/</URL>
     	<BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/pycool-low.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/pycool-med.png</BoxArtImageURLMed>
         <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Turku / Department of Physics and Astronomy</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2012</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>01/24/2012</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
			<Author email="jtksai@utu.fi">Jani Sainio</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.physics.utu.fi/tiedostot/theory/particlecosmology/pycool/">Application</ContentType>
		   <ContentType url="http://www.physics.utu.fi/tiedostot/theory/particlecosmology/pycool/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword></Keyword>
        </Keywords>
		</Application> 
  
        <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ba018ef469a9</GUID>
        <Name>CUDAfy.NET</Name>
        <ShortDescription>An open source Microsoft .NET library that allows writing of CUDA applications including device code in languages such as C# and VB. Contains wrappers for CUSPARSE, CUBLAS, CUFFT and CURAND, as well as a growing number of specialized numerics libraries.</ShortDescription>
        <URL>http://www.hybriddsp.com/Products/CUDAfyNET.aspx</URL>
     	<BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/cudafyi-low.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/cudafy-med.png</BoxArtImageURLMed>
         <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Hybrid DSP Systems</OrganizationName>
        <OrganizationURL>http://www.hybriddsp.com</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>12/07/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
          
	   <Author email="info@hybriddsp.com">Hybrid DSP Systems</Author>
       </Authors>
        <ContentTypes>
           <ContentType url="http://www.hybriddsp.com/Products/CUDAfyNET.aspx">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
		   <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>.NET,C#,VB,Solver</Keyword>
        </Keywords>
		</Application>   
  
        <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ba018ef469a6</GUID>
        <Name>QnDynCUDA</Name>
        <ShortDescription>We present a set of C++ classes that allow one to use graphics card processor cores for quantum ab initio simulations, i.e. direct solution of the time-dependent Schrödinger equation, benefiting from the parallel architecture of graphics processing units. We use the Chebyshev polynomial and FFT algorithms. The solution is based on NVIDIA CUDA technology. In test runs of our classes, the speed-up factor on the graphics card processor can be on the order of 300 compared with runs using a single CPU core. The solver is not limited to the Schrödinger equation: with only small changes it can be used to solve the nonlinear Gross–Pitaevskii equation of BEC dynamics, the heat equation, the diffusion equation or other second-order parabolic partial differential equations.</ShortDescription>
        <URL>http://dx.doi.org/10.1016/j.cpc.2011.11.026</URL>
     	<BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/QnDynCUDA-low.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/QnDynCUDA-med.png</BoxArtImageURLMed>
         <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nicolaus Copernicus University</OrganizationName>
        <OrganizationURL>http://fizyka.umk.pl</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>12/09/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="">Tomasz Dziubak</Author>
	   <Author email="jacek@phys.uni.torun.pl">Jacek Matulewski</Author>
       </Authors>
        <ContentTypes>
           <ContentType url="http://dx.doi.org/10.1016/j.cpc.2011.11.026">Code</ContentType>
           <ContentType url="http://dx.doi.org/10.1016/j.cpc.2011.11.026">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
		   <ApplicationType>Libraries</ApplicationType>
		   <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>CUDA</Keyword>
        </Keywords>
		</Application> 

        <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ba08wef469a6</GUID>
        <Name>Efficient Decoding of QC-LDPC Codes Using GPUs</Name>
        <ShortDescription>In this work, we propose an efficient quasi-cyclic LDPC (QC-LDPC) decoder simulator which runs on graphics processing units (GPUs). We optimize the data structures of the messages used in the decoding process so that both reads and writes can be performed in a highly parallel manner by the GPUs. We also propose a highly efficient algorithm to convert the data structure of the messages from one form to another with very little latency. Finally, by using a large number of GPU cores to perform the simple computations simultaneously, our GPU-based LDPC decoder is found to run around 100 times faster than a CPU-based simulator.</ShortDescription>
        <URL>http://www.springerlink.com/content/j8g53w2260224wx7/</URL>
 	<BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/QC-LDPC-low.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/QC-LDPC-med.png</BoxArtImageURLMed>
          <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The Hong Kong PolyU</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>06/16/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType>N/A</SoftwareLicenseType>
        <Authors>
           <Author email="zhaoyue.edu@gmail.com">Yue Zhao</Author>
		   <Author email="">Xu Chen</Author>
		   <Author email="">Chiu-Wing Sham</Author>
		   <Author email="">Wai M. Tam</Author>
		   <Author email="">Francis C. M. Lau</Author>
       </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/j8g53w2260224wx7/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>CUDA</Keyword>
        </Keywords>
		</Application> 
		
        <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ba018ef4542f</GUID>
        <Name>Integrating CUDA &amp; GNU Autotools</Name>
        <ShortDescription>One of the drawbacks of GNU Autotools is that it only provides native support for certain languages; however, it is flexible enough that you can make it do what you want it to do... if you know how. I wanted to distribute CUDA-based applications with GNU Autotools, but unfortunately CUDA is not one of the languages that it supports... so I started googling around. I found several threads where other people wanted to be able to do the same thing. I found various bread crumbs here and there that enabled me to piece together a working build. Since I couldn't find all of the information that I needed in one spot, I figured I'd write it all down and publish it so others don't have to waste time figuring it out. "The ClusterChimps Guide to Integrating CUDA and GNU Autotools" is a simple guide to building CUDA targets using GNU Autotools. It will show you how to build standalone CUDA applications, static CUDA libraries, and shared CUDA libraries. The guide comes with a companion example tarball.</ShortDescription>
        <URL>http://www.clusterchimps.org/autotools.php</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/dr-zaius-low.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/dr-zaius-med.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>ClusterChimps.org</OrganizationName>
        <OrganizationURL>http://www.clusterchimps.org</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>11/18/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="zaius@clusterchimps.org">Dr. Zaius</Author>
       </Authors>
        <ContentTypes>
           <ContentType url="http://www.clusterchimps.org/autotools.php">Code</ContentType>
           <ContentType url="http://www.clusterchimps.org/autotools.php">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Tools</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>CUDA, Autotools</Keyword>
        </Keywords>
		</Application> 	 
  
        <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ba018ef543e8</GUID>
        <Name>FMRI Analysis on the GPU</Name>
        <ShortDescription>Faster fMRI analysis by using the GPU.</ShortDescription>
        <URL>http://www.sciencedirect.com/science/article/pii/S0169260711001957</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/article-01.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/article-02.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Linköping University</OrganizationName>
        <OrganizationURL>http://www.liu.se</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>11/12/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>N/A</SoftwareLicenseType>
        <Authors>
           <Author email="andek@imt.liu.se">Anders Eklund</Author>
		   <Author email="matsa@imt.liu.se">Mats Andersson</Author>
		   <Author email="knutte@imt.liu.se">Hans Knutsson</Author>
		   
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.sciencedirect.com/science/article/pii/S0169260711001957">Multimedia</ContentType>
           <ContentType url="http://www.sciencedirect.com/science/article/pii/S0169260711001957">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>fMRI, GPU, permutation test</Keyword>
        </Keywords>
     </Application> 

	  <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ba018ef567r9</GUID>
        <Name>True 4D Image Denoising on the GPU</Name>
        <ShortDescription>4D Image denoising on the GPU</ShortDescription>
        <URL>http://www.hindawi.com/journals/ijbi/2011/952819/</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/4-dimension-01.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/4-dimensions-02.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Linköping University</OrganizationName>
        <OrganizationURL>http://www.liu.se</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>11/12/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>N/A</SoftwareLicenseType>
        <Authors>
           <Author email="andek@imt.liu.se">Anders Eklund</Author>
		   <Author email="matsa@imt.liu.se">Mats Andersson</Author>
		   <Author email="knutte@imt.liu.se">Hans Knutsson</Author>
		   
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.hindawi.com/journals/ijbi/2011/952819/">Multimedia</ContentType>
           <ContentType url="http://www.hindawi.com/journals/ijbi/2011/952819/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
		    <ApplicationType>Science</ApplicationType>
	        </ApplicationTypes>
        <Keywords>
           <Keyword>4D, image denoising, CT</Keyword>
        </Keywords>
     </Application> 
  
  
        <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ba018ef458y9</GUID>
        <Name>A real-time crosstalk canceller on a notebook GPU</Name>
        <ShortDescription>People want to communicate with the feeling of being together and sharing the same environment. Crosstalk cancellation is one of the main applications in multichannel acoustic signal processing that provides this kind of feeling. This work shows that a GPU can be used as a co-processor that carries out audio processing tasks, freeing CPU resources for other tasks.</ShortDescription>
        <URL>http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;arnumber=6012072</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/159460_Crosstalk_01.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/159460_Crosstalk_02.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>INCO2 (http://www.inco2.upv.es/) and GTAC (http://www.gtac.upv.es/) Groups. Universidad Politecnica de Valencia</OrganizationName>
        <OrganizationURL>http://www.upv.es</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>09/16/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>N/A</SoftwareLicenseType>
        <Authors>
           <Author email="jobelrod@iteam.upv.es">Jose A. Belloch</Author>
		   <Author email="agonzal@dcom.upv.es">Alberto Gonzalez</Author>
		   <Author email="fjmartin@dcom.upv.es">F. J. Martínez-Zaldívar</Author>
		   <Author email="avidal@dsic.upv.es">Antonio M. Vidal</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;arnumber=6012072">Application</ContentType>
           <ContentType url="http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;arnumber=6012072">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
	        </ApplicationTypes>
        <Keywords>
           <Keyword></Keyword>
        </Keywords>
     </Application>
  
      <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ba018ez454f2</GUID>
        <Name>Real-time massive convolution for audio applications on GPU</Name>
        <ShortDescription>Massive convolution is the basic operation in multichannel acoustic signal processing. This field has developed considerably in recent years due to the growing need to incorporate new effects and the natural desire to improve the listening experience. These acoustic effects require computing multiple convolutions simultaneously in real time. The work we present describes a GPU implementation of all the operations involved in the convolution, extrapolated to multiple channels. One of the main features of our work is the use of streams on the GPU, which allows computation to overlap with data transfer from CPU to GPU. This application is the first step toward real-time multichannel sound applications on the GPU.</ShortDescription>
        <URL>http://www.springerlink.com/content/h37u46j2416m6733/</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/156214_Rea_Time_2.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/156214_Rea_Time_1.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>INCO2 (http://www.inco2.upv.es/) and GTAC (http://www.gtac.upv.es/) Groups. Universidad Politecnica de Valencia</OrganizationName>
        <OrganizationURL>http://www.upv.es</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>N/A</SoftwareLicenseType>
        <Authors>
           <Author email="jobelrod@iteam.upv.es">Jose A. Belloch</Author>
		   <Author email="agonzal@dcom.upv.es">Alberto Gonzalez</Author>
		   <Author email="fjmartin@dcom.upv.es">F. J. Martínez-Zaldívar</Author>
		   <Author email="avidal@dsic.upv.es">Antonio M. Vidal</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/h37u46j2416m6733/">Application</ContentType>
           <ContentType url="http://www.springerlink.com/content/h37u46j2416m6733/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
	        </ApplicationTypes>
        <Keywords>
           <Keyword></Keyword>
        </Keywords>
     </Application>
  
  
    <Application>
        <GUID>8cdd1372-7efb-449b-82d0-ca018ef454f2</GUID>
        <Name>Exposure Render</Name>
        <ShortDescription>Exposure Render is a Direct Volume Rendering Application that applies progressive Monte Carlo raytracing, coupled with physically based light transport to heterogeneous volumetric data. Exposure Render enables the configuration of any number of arbitrarily shaped area lights, models a real-world camera, including its lens and aperture, and incorporates complex materials, whilst still maintaining interactive display updates. It features both surface and volumetric scattering, and applies noise reduction to remove the unwanted startup noise associated with progressive Monte Carlo rendering. The complete implementation is available in source and binary forms under a permissive free software license.</ShortDescription>
        <URL>http://code.google.com/p/exposure-render/downloads/list</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/655602-example-01.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/655602-example-02.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Delft</OrganizationName>
        <OrganizationURL>http://graphics.tudelft.nl</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>10/19/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="t.kroes@tudelft.nl">T. Kroes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/exposure-render/downloads/list">Application</ContentType>
           <ContentType url="http://code.google.com/p/exposure-render/downloads/list">Paper</ContentType>
		   <ContentType url="http://code.google.com/p/exposure-render/downloads/list">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Digital Content Creation</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
		    <ApplicationType>Imaging</ApplicationType>
			 <ApplicationType>Medical Imaging</ApplicationType>
			 <ApplicationType>Ray Tracing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Monte Carlo Simulation, NVIDIA CUDA, Open Source, Ray Tracing, Volume Rendering </Keyword>
        </Keywords>
     </Application>
	 
	 
	 
     <Application>
        <GUID>8cdd1372-7efb-449b-82e0-ba018ef454f2</GUID>
        <Name>DualSPHysics</Name>
        <ShortDescription>DualSPHysics is a combined CPU/GPU solver for mesh-free Smoothed Particle Hydrodynamics to be applied in CFD applications with free-surface flows. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1370_283365_dualsphysics_cuda_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1370_283365_dualsphysics_cuda_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>EPHYSLAB--Universidade de Vigo and School of Mechanical, Aerospace and Civil Engineering-University of Manchester</OrganizationName>
        <OrganizationURL>http://ephyslab.uvigo.es/index.php/eng/dual_sphysics/</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>01/11/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>90</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="alexbexe@uvigo.es">A.J.C. Crespo</Author>
           <Author email="jmdominguez@uvigo.es">J.M. Dominguez</Author>
           <Author email="mggesteira@uvigo.es">M.G. Gesteira</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.vimeo.com/dualsphysics/videos">Multimedia</ContentType>
           <ContentType url="http://dual.sphysics.org/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>SPH, GPU, meshless method, lagrangian, fluid dynamics, free-surface flow,A.J.C. Crespo,J.M. Dominguez,M.G. Gesteira,alexbexe@uvigo.es,jmdominguez@uvigo.es,mggesteira@uvigo.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9561fe2e-1de6-461b-b0dc-1cae41da5eb5</GUID>
        <Name>NeMo: real-time spiking neural network simulation</Name>
        <ShortDescription>Spiking neural network simulations are used to model biological brain structures. Simulating large-scale networks is computationally expensive, however, due to the number and interconnectedness of neurons in the brain. Furthermore, where such simulations are used in an embodied (i.e. robotic) setting, the simulation must run in real time in order to be useful.</ShortDescription>
        <URL>http://nemosim.sourceforge.net</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1368_193704_firing_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1368_193704_firing_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Imperial College London</OrganizationName>
        <OrganizationURL>http://www.imperial.ac.uk</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>07/18/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="andreas.fidjeland@imperial.ac.uk">Andreas Fidjeland</Author>
           <Author email="">Murray Shanahan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://nemosim.sourceforge.net">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>neural network simulation,Andreas Fidjeland,Murray Shanahan,andreas.fidjeland@imperial.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a6fbee45-c093-4bab-958c-ce8e5e7baa64</GUID>
        <Name>CUDA Benoit</Name>
        <ShortDescription>Realtime, high resolution, high iteration, supersampled, fractal zoom. Specify the vanishing point, iteration count and colors for an animated zoom into the Mandelbrot set and then watch the zoom without having to run a separate, lengthy, rendering step. Multiple zoom specifications, called tracks, can be stored and played back in sequence, creating a continuously running fractal zooming show.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1367_86365_player_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1367_86365_player_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>04/30/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="dahlsys@gmail.com">Roger Dahl</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.dahlsys.com/software/benoit/">Paper</ContentType>
           <ContentType url="http://www.dahlsys.com/software/benoit/">Application</ContentType>
           <ContentType url="http://www.dahlsys.com/software/benoit/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mandelbrot fractal realtime zoom log scale map,Roger Dahl,dahlsys@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>457a8222-cc5c-4ab5-a938-290aa493803f</GUID>
        <Name>SARUMAN</Name>
        <ShortDescription>SARUMAN (Semiglobal Alignment of short Reads Using CUDA and NeedleMAN-Wunsch) is a mapping approach that returns all possible alignment positions of a read in a reference genome sequence under a given error threshold, together with one optimal alignment for each of these positions. Alignments are computed in parallel on graphics hardware, providing a considerable speedup of this normally time-consuming step.</ShortDescription>
        <URL>http://www.cebitec.uni-bielefeld.de/brf/saruman/saruman.html</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1366_61128_saruman_flow_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1366_61128_saruman_flow_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Center for Biotechnology, Bielefeld University</OrganizationName>
        <OrganizationURL>http://www.cebitec.uni-bielefeld.de/</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>03/30/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="saruman@cebitec.uni-bielefeld.de">Jochen Blom</Author>
           <Author email="">Tobias Jakobi</Author>
           <Author email="">Daniel Doppmeier</Author>
           <Author email="">Sebastian Jaenicke</Author>
           <Author email="">J&#xf6;rn Kalinowski</Author>
          <Author email="">Jens Stoye</Author>
          <Author email="">Alexander Goesmann</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cebitec.uni-bielefeld.de/brf/saruman/saruman.html">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bioinformatics, Sequence Alignment, Short read mapping, Bioinformatics workbench, Sequence Analysis,Jochen Blom,Tobias Jakobi,Daniel Doppmeier,saruman@cebitec.uni-bielefeld.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a11c0bd6-6125-4d81-ad46-4f5c320951c7</GUID>
        <Name>Data Assimilation using a GPU Accelerated Path Integral Monte Carlo Approach</Name>
        <ShortDescription>A general approach to data assimilation (state and parameter estimation) in nonlinear dynamical systems with noisy dynamics and noisy measurements. In general terms, it is a method for extracting a few useful pieces of information from a large amount of raw time series data.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1365_90544_unobsStatesSmall_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1365_90544_unobsStatesSmall_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California, San Diego</OrganizationName>
        <OrganizationURL>http://physics.ucsd.edu/</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>04/05/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>300</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="jquinn@ucsd.edu">John C. Quinn</Author>
           <Author email="habarbanel@ucsd.edu">Henry D.I. Abarbanel</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://physics.ucsd.edu/~jquinn/GPU-PIMC/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Data Assimilation, Parameter Estimation, Monte Carlo, Path Integral,John C. Quinn,Henry D.I. Abarbanel,jquinn@ucsd.edu,habarbanel@ucsd.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>cf146928-c319-4ea3-97ee-6f24e3b80847</GUID>
        <Name>CP_select</Name>
        <ShortDescription>Parallel selection algorithm for GPUs: calculation of the median and order statistics.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1364_8595_median_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1364_8595_median_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Deakin University</OrganizationName>
        <OrganizationURL>http://www.deakin.edu.au/</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>04/10/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>6</SpeedUp>
        <SoftwareLicenseType>Open Source</SoftwareLicenseType>
        <Authors>
           <Author email="gleb@deakin.edu.au">Gleb Beliakov</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.deakin.edu.au/~gleb/cp_select.html">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Programming Tools</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>selection, median, sorting,Gleb Beliakov,gleb@deakin.edu.au</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5e4ba313-241d-4dc3-8146-66ddb7379614</GUID>
        <Name>Mesh-particle interpolations on GPUs and multicore CPUs</Name>
        <ShortDescription>Particle-mesh interpolations are fundamental operations for particle-in-cell codes, as implemented in vortex methods, plasma dynamics and electrostatics simulations. In these simulations, the mesh is used to solve the field equations and the gradients of the fields are used in order to advance the particles. The time integration of particle trajectories is performed through an extensive resampling of the flow field at the particle locations. The computational performance of this resampling turns out to be limited by the memory bandwidth of the underlying computer architecture. We investigate how mesh-particle interpolation can be efficiently performed on graphics processing units (GPUs) and multicore central processing units (CPUs), and we present two implementation techniques.</ShortDescription>
        <URL>http://rsta.royalsocietypublishing.org/content/369/1944/2164.abstract</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1363_259257_cyl-re40000_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1363_259257_cyl-re40000_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>CSE Lab, ETH Zurich</OrganizationName>
        <OrganizationURL>http://www.cse-lab.ethz.ch</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>06/01/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Diego Rossinelli</Author>
           <Author email="">Christian Conti</Author>
           <Author email="petros@inf.ethz.ch">Petros Koumoutsakos</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://rsta.royalsocietypublishing.org/content/369/1944/2164.abstract">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Game Physics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>CPU, GPU, HPC, mesh-particle, grid-particle,Diego Rossinelli,Christian Conti,Petros Koumoutsakos,petros@inf.ethz.ch</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9abc758e-c89b-4761-bfcc-57c36df7e9a1</GUID>
        <Name>GPU-computing in econophysics and statistical physics</Name>
        <ShortDescription>A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction to the field of GPU computing and includes examples.</ShortDescription>
        <URL>http://dx.doi.org/10.1140/epjst/e2011-01398-x</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1362_5681_econophysics_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1362_5681_econophysics_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>ETH Zurich</OrganizationName>
        <OrganizationURL>http://www.tobiaspreis.de/</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>04/07/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="mail@tobiaspreis.de">Tobias Preis</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://dx.doi.org/10.1140/epjst/e2011-01398-x">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Finance</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Computational Finance, Computational Physics,Tobias Preis,mail@tobiaspreis.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>aa2667f4-078f-444b-b136-f0065d5c014e</GUID>
        <Name>Processing and rendering of Fourier domain optical coherence tomography images at a line rate over 524 kHz using a graphics processing unit</Name>
        <ShortDescription>In Fourier domain optical coherence tomography (FD-OCT), a large amount of interference data needs to be resampled from the wavelength domain to the wavenumber domain prior to Fourier transformation. We present an approach to optimize this data processing, using a graphics processing unit (GPU) and parallel processing algorithms. We demonstrate an increased processing and rendering rate over that previously reported by using GPU paged memory to render data in the GPU rather than copying back to the CPU.</ShortDescription>
        <URL>http://spie.org/x648.html?product_id=896535</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1361_367646_Screenshot-a_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1361_367646_Screenshot-a_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Aston University, Queen Mary Univ of London, NPL</OrganizationName>
        <OrganizationURL>http://www.aston.ac.uk</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>02/01/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="raskanj@aston.ac.uk">Janarthanan Rasakanthan</Author>
           <Author email="">Kate Sugden</Author>
           <Author email="">Peter H. Tomlins</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://spie.org/x648.html?product_id=896535">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Optical Coherence Tomography, OCT, medical Imaging,Janarthanan Rasakanthan,Kate Sugden,Peter H. Tomlins,raskanj@aston.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>13893c5b-60cd-468f-b8ec-ea6c11e367c2</GUID>
        <Name>Multicore/Multi-GPU Accelerated Simulations of Multiphase Compressible Flows Using Wavelet Adapted Grids</Name>
        <ShortDescription>We present a computational method of coupling average interpolating wavelets with high-order finite volume schemes and its implementation on heterogeneous computer architectures for the simulation of multiphase compressible flows. The method is implemented to take advantage of the parallel computing capabilities of emerging heterogeneous multicore/multi-GPU architectures. </ShortDescription>
        <URL>http://epubs.siam.org/sisc/resource/1/sjoce3/v33/i2/p512_s1</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1360_263363_application-image_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1360_263363_application-image_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>ETH Zurich</OrganizationName>
        <OrganizationURL>http://www.cse-lab.ethz.ch</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>03/01/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="diegor@inf.ethz.ch">Diego Rossinelli</Author>
           <Author email="">Babak Hejazialhosseini</Author>
           <Author email="">Daniele G. Spampinato</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://epubs.siam.org/sisc/resource/1/sjoce3/v33/i2/p512_s1">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>GPU, compressible flow, wavelets, multiresolution, adaptive grid, multiphase, multicore architectures, OpenCL,Diego Rossinelli,Babak Hejazialhosseini,Daniele G. Spampinato,diegor@inf.ethz.ch</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>089e53a3-5369-4c7c-b9ff-20d04b833618</GUID>
        <Name>Real-time numerical dispersion compensation using graphics processing unit for Fourier-domain optical coherence tomography</Name>
        <ShortDescription>Numerical dispersion compensation for both standard and full-range Fourier-domain optical coherence tomography (FD-OCT) has been implemented on the graphics processing unit (GPU) architecture. Data acquisition, processing and image display were performed on a multi-threaded, heterogeneous CPU-GPU computing system. Real-time, ultra-high-resolution, full-range, complex-conjugate-free FD-OCT imaging was demonstrated at 68.4 frames/s with a frame size of 1024 (lateral) by 2048 (axial) pixels.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1359_20606_dispersion_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1359_20606_dispersion_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Johns Hopkins University</OrganizationName>
        <OrganizationURL>http://www.ece.jhu.edu/photonics/zhangkang.html</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>03/03/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kzhang8@jhu.edu">Kang Zhang</Author>
           <Author email="">Jin U. Kang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5724136">Multimedia</ContentType>
           <ContentType url="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5724136">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>GPU, Numerical Dispersion Compensation, Optical coherence tomography,Kang Zhang,Jin U. Kang,kzhang8@jhu.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fa369b09-c534-4b01-b7b8-28ee8d433c77</GUID>
        <Name>Real-time intraoperative 4D full-range FD-OCT based on the dual graphics processing units architecture for microsurgery guidance</Name>
        <ShortDescription>Real-time 4D full-range complex-conjugate-free Fourier-domain optical coherence tomography (FD-OCT) is implemented using a dual graphics processing units (dual-GPUs) architecture. One GPU is dedicated to the FD-OCT data processing while the second one is used for the volume rendering and display. GPU accelerated non-uniform fast Fourier transform (NUFFT) is also implemented to suppress the side lobes of the point spread function to improve the image quality. </ShortDescription>
        <URL>http://www.opticsinfobase.org/abstract.cfm?uri=boe-2-4-764</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1358_40008_microsurgery_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1358_40008_microsurgery_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Johns Hopkins University</OrganizationName>
        <OrganizationURL>http://www.ece.jhu.edu/photonics/zhangkang.html</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>03/01/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kzhang8@jhu.edu">Kang Zhang</Author>
           <Author email="">Jin U. Kang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.opticsinfobase.org/abstract.cfm?uri=boe-2-4-764">Multimedia</ContentType>
           <ContentType url="http://www.opticsinfobase.org/abstract.cfm?uri=boe-2-4-764">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>GPU, Optical coherence tomography, 4D imaging,Kang Zhang,Jin U. Kang,kzhang8@jhu.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>93912935-c2cc-4303-b5b2-7355ddff8c8e</GUID>
        <Name>IGMAS+</Name>
        <ShortDescription>Three-dimensional (3D) interactive modeling with the IGMAS software provides the means for integrated processing and interpretation of geoid, gravity and magnetic fields, yielding improved geological interpretation. IGMAS 3D models are constructed using triangulated polyhedra to which constant density and/or induced and remnant susceptibility are assigned. </ShortDescription>
        <URL>http://www.potentialgs.com/</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1357_790835_IGMAS_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1357_790835_IGMAS_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Transinsight GmbH</OrganizationName>
        <OrganizationURL>http://transinsight.com/</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>02/01/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>300</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="info@potentialgs.com">Transinsight GmbH, Christian-Albrechts-Universit&#xe4;t zu Kiel - Department for Geophysics &amp; Geoinformation</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.potentialgs.com/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Oil &amp; Gas</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>interactive modelling, gravity, magnetic, seismic, inversion, numerical modelling, OpenCL,Transinsight GmbH, Christian-Albrechts-Universit&#xe4;t zu Kiel - Department for Geophysics &amp; Geoinformation,info@potentialgs.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fb111a83-f81b-4fb8-8adf-b15c8e375ad4</GUID>
        <Name>Directionally Unsplit Hydrodynamic Schemes with Hybrid MPI/OpenMP/GPU Parallelization in AMR</Name>
        <ShortDescription>We present the implementation and performance of a class of directionally unsplit Riemann-solver-based hydrodynamic schemes on graphics processing units (GPUs). These schemes, including the MUSCL-Hancock method, a variant of the MUSCL-Hancock method, and the corner-transport-upwind method, are embedded into the adaptive-mesh-refinement (AMR) code GAMER. Furthermore, a hybrid MPI/OpenMP model is investigated, which enables the full exploitation of the computing power in a heterogeneous CPU/GPU cluster and significantly improves the overall performance.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1356_1516457_KH_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1356_1516457_KH_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National Taiwan University, Department of Physics</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>03/22/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>101</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="b88202011@ntu.edu.tw">Hsi-Yu Schive</Author>
           <Author email="">Ui-Han Zhang</Author>
           <Author email="">Tzihong Chiueh</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://arxiv.org/abs/1103.3373">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>hybrid MPI/OpenMP/GPU, AMR,Hsi-Yu Schive,Ui-Han Zhang,Tzihong Chiueh,b88202011@ntu.edu.tw</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e47a37df-6e16-41c8-b775-a88790713add</GUID>
        <Name>Horizon MHD</Name>
        <ShortDescription>General relativistic magnetohydrodynamics code. Used in computational astrophysics applications, particularly the prediction of gravitational radiation from compact objects and the dynamics of magnetars.</ShortDescription>
        <URL>http://www.horizoncode.org/</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1355_172283_orszag_tang_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1355_172283_orszag_tang_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Tuebingen, Institute for Astronomy and Astrophysics</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>02/25/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>200</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="bzink@tat.uni-tuebingen.de">Burkhard Zink</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.vimeo.com/20006885">Multimedia</ContentType>
           <ContentType url="http://www.horizoncode.org/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>mhd astrophysics simulator relativistic,Burkhard Zink,bzink@tat.uni-tuebingen.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d0c8b7d9-8adf-4aa6-82aa-4dc5d4c7a070</GUID>
        <Name>Practical Time Bundle Adjustment for 3D Reconstruction on GPU</Name>
        <ShortDescription>We present a hybrid implementation of sparse bundle adjustment on the GPU using CUDA, with the CPU working in parallel. The algorithm is decomposed into smaller steps, each of which is scheduled on the GPU or the CPU. We develop efficient kernels for the steps and make use of existing libraries for several steps. Our implementation significantly outperforms the standard CPU implementation, achieving a speedup of 30-40 times for datasets with up to 500 images on an Nvidia Tesla C2050 GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1354_45129_CPU-GPU-Hybrid3_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1354_45129_CPU-GPU-Hybrid3_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>IIIT Hyderabad</OrganizationName>
        <OrganizationURL>http://www.iiit.ac.in</OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>40</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="siddharth.choudhary@research.iiit.ac.in">Siddharth Choudhary</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://researchweb.iiit.ac.in/~siddharth.choudhary/CVGPUFinal.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Siddharth Choudhary,siddharth.choudhary@research.iiit.ac.in</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5064a523-1a53-4c6e-9be0-49b519753279</GUID>
        <Name>Flow dynamics measurements using digital holographic PIV</Name>
        <ShortDescription>An in-line digital holographic (D-HPIV) setup and CUDA-accelerated algorithm were implemented in order to measure the instantaneous three-dimensional (3D), three-component (3C) velocity field of nonstationary flows. CUDA acceleration dramatically increases the speed of digital video hologram processing. The system can measure the number, 3D position, size, 3C velocity and track of the particles. The results of the hologram reconstruction are rendered using OpenGL.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1353_145946_Figure3_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1353_145946_Figure3_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Petrozavodsk State University</OrganizationName>
        <OrganizationURL>http://www.petrsu.ru</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>10/07/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>1000</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="edmitr@onego.ru">Dmitry Ekimov</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://dims.karelia.ru/dihm/index.php?&amp;plang=e">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dmitry Ekimov,edmitr@onego.ru</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c3c135bf-3665-438c-b634-25ee54e81a90</GUID>
        <Name>Numerical simulation of flow around an oscillating cylinder</Name>
        <ShortDescription>This program presents a finite difference solution for 2D, low Reynolds number (1-350), unsteady flow around and heat transfer from a stationary or oscillating circular cylinder with constant surface temperature and placed in a uniform stream. The fluid is assumed to be incompressible and of constant property. The cylinder is moved mechanically and can vibrate in-line with or transverse to the main stream or can follow an elliptical or figure-8-shaped path. The governing equations are the Navier-Stokes equations, the continuity equation, a Poisson equation for pressure and the energy equation.</ShortDescription>
        <URL>http://www.filefactory.com/file/cac6d70/n/FlowCFD.zip</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1352_24973_NVIDIA_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1352_24973_NVIDIA_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Department of Fluid and Heat Engineering, University of Miskolc, Hungary</OrganizationName>
        <OrganizationURL>http://www.uni-miskolc.hu</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>02/02/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>13</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="arambl@uni-miskolc.hu">Prof. Laszlo Baranyi</Author>
           <Author email="daroczy4@freemail.hu">Laszlo Daroczy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=UOlV7accjYM">Multimedia</ContentType>
           <ContentType url="http://www.filefactory.com/file/cac6d70/n/FlowCFD.zip">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>CFD, numerics, Computational Fluid Dynamics, oscillating cylinder, in-line oscillation, transverse oscillation, Figure-8-shape motion, SOR, successive over-relaxation, heat transfer, 2D, Reynolds number, Strouhal number, Nusselt number, incompressible, lift, drag, Poisson equation, Navier-Stokes equations, temperature,Prof. Laszlo Baranyi, Laszlo Daroczy,arambl@uni-miskolc.hu; daroczy4@freemail.hu </Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>acbbd15e-82f0-45e6-988a-f1726e4bb1ce</GUID>
        <Name>Running the High Performance Linpack (HPL) Benchmark on NVIDIA GPUs</Name>
        <ShortDescription>The HPL benchmark is used to rank the world's Top500 supercomputers. This is a step-by-step procedure for running NVIDIA's version of the HPL benchmark on Tesla GPUs. We also compare the results of a normal HPL run on CPU to a hybrid run on CPU-GPU to show the performance boost gained with GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1349_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1349_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Saudi Aramco</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>01/10/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="sindimo@ieee.org">Mohamad Sindi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://hpl-calculator.sourceforge.net/Howto-HPL-GPU.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Benchmark</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>NVIDIA, Linpack, HPL, GPU, Top500, FLOPS, High Performance Computing, HPC,Mohamad Sindi,sindimo@ieee.org</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>be475639-1007-4bf8-bcc8-b103379fdf9d</GUID>
        <Name>GPU Vision: Accelerating Computer Vision algorithms with Graphics Processing Units</Name>
        <ShortDescription>We present an introduction to the field of GPU-accelerated computer vision by examining several projects that provide the framework for researchers and developers to tap into the computational power of Graphics Processing Units (GPU). Our goal is to identify the tools and areas where GPU acceleration can provide the highest performance increases in computer vision applications by creating performance benchmarks to compare and contrast the GPU and CPU versions in realistic applications.</ShortDescription>
        <URL>http://c13software.com/downloads/GPUVision_2011.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1347_133852_haar_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1347_133852_haar_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Connecticut</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>02/09/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="tamas.k.lengyel@gmail.com">Tamas K. Lengyel</Author>
           <Author email="james.gedarovich@gmail.com">James Gedarovich</Author>
           <Author email="antonio.cusano@gmail.com">Antonio Cusano</Author>
           <Author email="tpeters@engr.uconn.edu">Thomas J. Peters</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://c13software.com/downloads/GPUVision_2011.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Tamas K. Lengyel,James Gedarovich,Antonio Cusano,tamas.k.lengyel@gmail.com,james.gedarovich@gmail.com,antonio.cusano@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1deb8e36-96ab-486b-a05e-0af7a067b6b7</GUID>
        <Name>CUDA Image Mosaic</Name>
        <ShortDescription>Creates image mosaics from a database of thumbnails on a pixel by pixel basis using CUDA to perform the image comparisons.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1346_601593_cuda800_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1346_601593_cuda800_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Personal</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>02/11/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="andyhcoates@gmail.com">Andy H Coates</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=k5rdvW2a4NA">Multimedia</ContentType>
           <ContentType url="http://www.andyhcoates.com/files/cudamosaic.rar">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Digital Content Creation</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Photo Mosaic Image PhotoMosaic,Andy H Coates,andyhcoates@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>50827283-e4ee-4c05-bc3d-27bd9df0436b</GUID>
        <Name>CUDA</Name>
        <ShortDescription>Real-time renderer</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1345_62423_chessRefraction_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1345_62423_chessRefraction_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Freelancer</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2011</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>02/05/2011</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="ttsiodras@gmail.com">Thanassis Tsiodras</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=1o8HM11h8fc">Multimedia</ContentType>
           <ContentType url="http://users.softlab.ece.ntua.gr/~ttsiod/cudarenderer-BVH.html">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Ray Tracing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>SAH AABB BVH Triangle-meshes real-time raytracer,Thanassis Tsiodras,ttsiodras@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7717e876-57cb-49db-9350-0ae7cc77ac63</GUID>
        <Name>Multi-GPU accelerated multi-spin Monte Carlo simulations of the 2D Ising model</Name>
        <ShortDescription>A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Computational Physics 228 (2009) 4468-4477] in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors of up to 35 compared to an optimized single central processing unit (CPU) core implementation that employs multi-spin coding.</ShortDescription>
        <URL>www.tobiaspreis.de/publications/bvp_cpc_2010.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1344_11409_preis_multi_gpu_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1344_11409_preis_multi_gpu_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Johannes Gutenberg University Mainz</OrganizationName>
        <OrganizationURL>www.tobiaspreis.de</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>35</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="mail@tobiaspreis.de">Benjamin Block</Author>
           <Author email="">Peter Virnau</Author>
           <Author email="">Tobias Preis</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="www.tobiaspreis.de/publications/bvp_cpc_2010.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Computational Physics, Monte Carlo Simulation, GPU Clusters,Benjamin Block,Peter Virnau,Tobias Preis,mail@tobiaspreis.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>96a2f87d-895b-4b9f-b04e-7ceb30d28941</GUID>
        <Name>Hex Protein Docking</Name>
        <ShortDescription>Modelling protein-protein interactions (PPIs) is an important aspect of structural bioinformatics. The Hex spherical polar Fourier protein docking algorithm has been implemented on Nvidia graphics processor units (GPUs). On a GTX 285 GPU, an exhaustive six-dimensional docking search can be calculated in just 15 seconds using multiple one-dimensional fast Fourier transforms. This represents a 45-fold speed-up over the corresponding calculation on a single CPU, and is at least two orders of magnitude faster than conventional Cartesian grid-based FFT docking approaches.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1342_47767_hex_3hfl_docked_rainbow_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1342_47767_hex_3hfl_docked_rainbow_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>INRIA</OrganizationName>
        <OrganizationURL>http://www.inria.fr</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>04/24/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>45</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dave.ritchie@inria.fr">Dave Ritchie</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://hex.loria.fr">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>protein docking,Dave Ritchie,dave.ritchie@inria.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fe5a34d7-6844-4b87-8542-7ef5ce307b1c</GUID>
        <Name>GPU-accelerated molecular dynamics simulation for study of liquid crystalline flows</Name>
        <ShortDescription>We have developed a GPU-based molecular dynamics simulation for the study of flows of fluids with anisotropic molecules such as liquid crystals. An application of the simulation to the study of macroscopic flow (backflow) generation by molecular reorientation in a nematic liquid crystal under the application of an electric field is presented. The computations of intermolecular force and torque are parallelized on the GPU using the cell-list method, and an efficient algorithm to update the cell lists is proposed.</ShortDescription>
        <URL>http://portal.acm.org/citation.cfm?id=1808870</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1340_header_r1_c1_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1340_header_r1_c1_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Kochi University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sunarso@kochi-tech.ac.jp">Alfeus Sunarso</Author>
           <Author email="">Tomohiro Tsuji</Author>
           <Author email="">Shigeomi Chono</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1808870">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alfeus Sunarso,Tomohiro Tsuji,Shigeomi Chono,sunarso@kochi-tech.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4672b45f-125b-480f-a8c6-f5aa647b2a75</GUID>
        <Name>MandelCUDA</Name>
        <ShortDescription>Real-time rendering of the Mandelbrot fractal</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1338_43133_mandel_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1338_43133_mandel_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Freelancer</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>03/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>40</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="ttsiodras@gmail.com">Thanassis Tsiodras</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://users.softlab.ntua.gr/~ttsiod/mandelSSE.html">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Application,Code,Thanassis Tsiodras,ttsiodras@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>adcd077f-17fb-44e4-91de-cb028f7fe788</GUID>
        <Name>CUDA Accelerated Particle Engine</Name>
        <ShortDescription>A simple point sprite based particle engine accelerated with CUDA.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1337_309888_particles (1)_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1337_309888_particles (1)_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Student</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>05/24/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="mouser58907@yahoo.com">Craig Mouser</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=WMB6ah-cKW4">Multimedia</ContentType>
           <ContentType url="http://www.craigmouser.com/random/cudaparticles.zip">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Digital Content Creation</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Video</ApplicationType>
           <ApplicationType>Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Craig Mouser,mouser58907@yahoo.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8a9de003-fdec-4da9-b81e-d3c7a991d982</GUID>
        <Name>Parallel Option Pricing on GPU: Barrier Options and Realized Variance Options</Name>
        <ShortDescription>We present parallel algorithms, implemented as CUDA subroutines ready to run on Graphics Processing Units (GPUs), to price two kinds of financial derivatives: continuous barrier options and realized variance options. The outstanding parallel performance of these algorithms when executed on GPUs is due to the mathematical properties of the pricing formulae used and to their software implementation.</ShortDescription>
        <URL>http://www.econ.univpm.it/recchioni/finance/w13</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1336_Fig2GPU_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1336_Fig2GPU_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universita di Camerino, Universita Politecnica delle Marche, Universita di Roma La Sapienza</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>11/05/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="lorella.fatone@unicam.it">L. Fatone</Author>
           <Author email="m.giacinti@univpm.it">M. Giacinti</Author>
           <Author email="fra_mariani@libero.it">F. Mariani</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.econ.univpm.it/recchioni/finance/w13">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Finance</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>L. Fatone,M. Giacinti,F. Mariani,lorella.fatone@unicam.it,m.giacinti@univpm.it,fra_mariani@libero.it</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>38d09cb5-3ff4-431c-a705-23b70259a7c1</GUID>
        <Name>Graphics processing unit accelerated non-uniform fast Fourier transform for ultrahigh-speed, real-time Fourier-domain OCT</Name>
        <ShortDescription>We implemented fast Gaussian gridding (FGG)-based non-uniform fast Fourier transform (NUFFT) on the graphics processing unit (GPU) architecture for ultrahigh-speed, real-time Fourier-domain optical coherence tomography (FD-OCT). The Vandermonde matrix-based non-uniform discrete Fourier transform (NUDFT) as well as the linear/cubic interpolation with fast Fourier transform (InFFT) methods are also implemented on GPU to compare their performance in terms of image quality and processing speed.</ShortDescription>
        <URL>http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-18-22-23472</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1335_165404_finger_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1335_165404_finger_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Johns Hopkins University</OrganizationName>
        <OrganizationURL>www.ece.jhu.edu</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>10/25/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kzhang8@jhu.edu">Kang Zhang</Author>
           <Author email="">Jin U. Kang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-18-22-23472">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Kang Zhang,Jin U. Kang,kzhang8@jhu.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ba0d2d88-8e95-486f-87c7-2e084e8bcc35</GUID>
        <Name>Real-time 4D signal processing and visualization using graphics processing unit on a regular nonlinear-k Fourier-domain OCT system</Name>
        <ShortDescription>We realized graphics processing unit (GPU) based real-time 4D (3D + time) signal processing and visualization on a regular Fourier-domain optical coherence tomography (FD-OCT) system with a nonlinear k-space spectrometer. An ultra-high speed linear spline interpolation (LSI) method for λ-to-k spectral re-sampling is implemented in the GPU architecture, which gives average interpolation speeds of >3,000,000 line/s for 1024-pixel OCT (1024-OCT) and >1,400,000 line/s for 2048-pixel OCT (2048-OCT).</ShortDescription>
        <URL>http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-18-11-11772</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1333_61749_finger tip singles 2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1333_61749_finger tip singles 2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Johns Hopkins University</OrganizationName>
        <OrganizationURL>www.ece.jhu.edu</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>05/18/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kzhang8@jhu.edu">Kang Zhang</Author>
           <Author email="">Jin U. Kang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-18-11-11772">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Real-time 4D Optical coherence tomography,Kang Zhang,Jin U. Kang,kzhang8@jhu.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4ee10503-6f9a-4136-858c-5df8aa4cf07f</GUID>
        <Name>GPU Smoldyn</Name>
        <ShortDescription>A CUDA port of the core simulation algorithms of Smoldyn.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1332_167342_screenshot_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1332_167342_screenshot_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>COSBI</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>12/10/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>130</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dematte@ieee.org">Lorenzo Dematte</Author>
           <Author email="">Davide Prandi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="https://sourceforge.net/projects/gpusmoldyn/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Lorenzo Dematte,Davide Prandi,dematte@ieee.org</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6933fa46-7fc4-4053-b724-feda8995296b</GUID>
        <Name>MC-GPU: Monte Carlo Simulation of X-ray Transport for Medical Imaging Applications</Name>
        <ShortDescription>MC-GPU is a GPU-accelerated x-ray transport simulation code that can generate clinically realistic radiographic projection images and computed tomography (CT) scans of the human anatomy. MC-GPU implements a massively multi-threaded Monte Carlo simulation algorithm for the transport of x-rays in a voxelized geometry and uses the x-ray interaction models and cross sections from PENELOPE 2006. The code can handle realistic human anatomy phantoms, for example the freely available models from the Virtual Family. Electron transport is not implemented. The code has been developed using the CUDA programming model and MPI to address multiple GPUs in parallel. In typical diagnostic imaging simulations, a 15- to 30-fold speedup is obtained using a GPU compared to a CPU execution.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1331_99177_mc-gpu_1mmDuke_50keV_1e10hist__All_and_NoScatter_LowRes_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1331_99177_mc-gpu_1mmDuke_50keV_1e10hist__All_and_NoScatter_LowRes_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>US Food and Drug Administration</OrganizationName>
        <OrganizationURL>http://www.fda.gov/MedicalDevices/ScienceandResearch/ucm2007489.htm</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>07/08/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>30</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="andreu_badal@hotmail.com">Andreu Badal</Author>
           <Author email="">Aldo Badano</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/mcgpu/">Paper</ContentType>
           <ContentType url="http://code.google.com/p/mcgpu/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Ray Tracing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Andreu Badal,Aldo Badano,andreu_badal@hotmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>59b21188-3fc1-448c-bee4-6c5923cfcd67</GUID>
        <Name>A demonstration of Exact String Matching Algorithms in CUDA</Name>
        <ShortDescription>I had a simple idea: is it possible to convert some of the well-known exact string matching algorithms into CUDA versions?</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1330_80324_Screen_shot_2011-01-07_at_PM_2012_12_43_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1330_80324_Screen_shot_2011-01-07_at_PM_2012_12_43_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>HP Labs Singapore</OrganizationName>
        <OrganizationURL>http://www.hp.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>23</ReleaseDay>
        <ReleaseDateDisplay>12/23/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="raymondtay1974@gmail.com">Raymond Tay</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/exactstrmatchgpu/">Application</ContentType>
           <ContentType url="http://code.google.com/p/exactstrmatchgpu/">Paper</ContentType>
           <ContentType url="http://code.google.com/p/exactstrmatchgpu/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>general purpose computing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Raymond Tay,raymondtay1974@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f48bb746-7b1b-4da3-b204-cc93a3414cc0</GUID>
        <Name>Poker Simulation In GPU</Name>
        <ShortDescription>Simulation is a technique widely used by artificial and human players to support the decision process in poker. In a typical Texas hold'em game, simulating all possible game states requires millions of hand evaluations. In this application, we port the Hand-Eval poker library to CUDA, providing a generic interface for evaluating large amounts of hand data.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1329_6875_resim_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1329_6875_resim_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>METU</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>01/31/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>15</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="volkansirin@gmail.com">Sirin,Volkan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ii.metu.edu.tr/coursewebsites/quda/cuda_zone/report.htm">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Game Simulation</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sirin,Volkan,volkansirin@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3b26e749-9952-4f86-b8f7-4ca377ef9dea</GUID>
        <Name>Satellite Image Processing on GPU</Name>
        <ShortDescription>Satellite Image Processing on GPU is a demonstration of the performance of remote sensing algorithms such as Shadow Detection and Vegetation Detection on GPU. Basic image processing algorithms such as Contrast Normalization, Histogram Equalization and Automatic Threshold (Otsu's) are also implemented. 4-band satellite images with 8-bit and 16-bit data are used in the tests. The performance of basic and complex algorithms is compared on CPU and GPU using images of various sizes. The tests also consider the effect of memory transfer and the order of bands.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1328_60270_imageGPU_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1328_60270_imageGPU_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Informatics Institute, Middle East Technical University</OrganizationName>
        <OrganizationURL>http://www.vrcv.ii.metu.edu.tr</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>01/31/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="mustafa.teke@gmail.com">Mustafa Teke</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ii.metu.edu.tr/coursewebsites/quda/mteke/">Paper</ContentType>
           <ContentType url="http://www.ii.metu.edu.tr/coursewebsites/quda/mteke/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mustafa Teke,mustafa.teke@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3dcc9409-2154-48c2-8b2d-9f64294ea4a5</GUID>
        <Name>Parallel implementation of large scale crowd simulation</Name>
        <ShortDescription>Human crowd movement is simulated using texture convolution and a behavioral model inspired by smoothed particle hydrodynamics. To make large-scale simulation possible in real time, or almost real time, we implement a model for human crowd behavior on a parallel processing platform using CUDA.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1327_361460_accumulatorMultiColor_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1327_361460_accumulatorMultiColor_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>DIKU</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>11/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="tag@greenleaf.dk">Thomas Gronnelov</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.greenleaf.dk/tag/downloads/downloadCrowd.html">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Crowd simulation</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Thomas Gronnelov,tag@greenleaf.dk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c9f1c703-120a-4322-a545-025f92f91c95</GUID>
        <Name>Monte Carlo simulation of the q-state Potts Model using CUDA</Name>
        <ShortDescription>In this work we implement a parallel code to perform finite-temperature Monte Carlo simulations of a magnetic system described by a two-dimensional q-state Potts model.</ShortDescription>
        <URL>http://www.famaf.unc.edu.ar/grupos/GPGPU/Potts/CUDAPotts.html</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1326_82402_potts-nvidia_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1326_82402_potts-nvidia_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>GPGPU Computing Group - Fa.M.A.F. - U.N.C.</OrganizationName>
        <OrganizationURL>http://www.famaf.unc.edu.ar/grupos/GPGPU/</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>01/05/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>155</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="ferrero@famaf.unc.edu.ar">Ezequiel E. Ferrero</Author>
           <Author email="jde@famaf.unc.edu.ar">Juan Pablo De Francesco</Author>
           <Author email="nicolasw@famaf.unc.edu.ar, cannas@famaf.unc.edu.ar">Nicolas Wolovick, Sergio A. Cannas</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.famaf.unc.edu.ar/grupos/GPGPU/Potts/CUDAPotts.html">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Statistical Mechanics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ezequiel E. Ferrero,Juan Pablo De Francesco,Nicolas Wolovick, Sergio A. Cannas,ferrero@famaf.unc.edu.ar,jde@famaf.unc.edu.ar,nicolasw@famaf.unc.edu.ar, cannas@famaf.unc.edu.ar</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ce4b9b35-490f-49f0-b1e7-4d5ec3b9841f</GUID>
        <Name>CoroBot</Name>
        <ShortDescription>CUDA-enabled controller for a mobile robot. The controller takes advantage of an ION board. Machine vision algorithms are accelerated by a minimum of 8x compared to their single-threaded C++ version executed on the ION Atom CPU</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1325_6852_corobot_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1325_6852_corobot_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>RealityFrontier</OrganizationName>
        <OrganizationURL>http://www.realityfrontier.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>09/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>8</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="raphael.cariou@realityfrontier.com">Raphael Cariou</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.realityfrontier.com/nvidia-conference-2010-our-demo/">Application</ContentType>
           <ContentType url="http://www.youtube.com/watch?v=iBYWuVYZ7mE">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
           <ApplicationType>Robotics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Raphael Cariou,raphael.cariou@realityfrontier.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9101844c-56f4-40d7-a856-c75fc81e385a</GUID>
        <Name>Simulating spin models on GPU</Name>
        <ShortDescription>Simulations of the Ising, Heisenberg and spin-glass models with Metropolis and parallel tempering updates.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1324_checker_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1324_checker_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Johannes Gutenberg-University Mainz</OrganizationName>
        <OrganizationURL>http://www.uni-mainz.de</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>01/07/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>1000</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="weigel@uni-mainz.de">Martin Weigel</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cond-mat.physik.uni-mainz.de/~weigel/GPU">Paper</ContentType>
           <ContentType url="http://www.cond-mat.physik.uni-mainz.de/~weigel/GPU">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Statistical physics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Martin Weigel,weigel@uni-mainz.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>be82fae2-de7e-48bc-94c2-5f68a86c8c48</GUID>
        <Name>iWormhole Desktop Edition</Name>
        <ShortDescription>iWormhole Desktop Edition is an ultra-secure file-sending Windows application. It was designed for consumer use with three guiding principles: 1) Speed, 2) Privacy and 3) Security.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1323_36895_screenshot_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1323_36895_screenshot_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>iWormhole Communications Corp</OrganizationName>
        <OrganizationURL>http://www.iwormhole.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>12/12/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>1700</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="rob@iwormhole.com">Rob Gagnon</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.iwormhole.com">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>File Transmission</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Rob Gagnon,rob@iwormhole.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>56b1c6a1-88bd-4f75-9898-730bd21ce344</GUID>
        <Name>Simulation of 1+1 dimensional surface growth and lattice gases using GPUs</Name>
        <ShortDescription>Restricted solid-on-solid surface growth models can be mapped onto binary lattice gases. We show that efficient simulation algorithms can be realized on GPUs either by CUDA or by OpenCL programming. We consider a deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1 dimensions related to the Asymmetric Simple Exclusion Process and show that, for sizes that fit into the shared memory of GPUs, one can achieve the maximum parallelization speedup (x100 for a Quadro FX 5800 graphics card with respect to a single CPU of 2.67 GHz). This permits us to study the effect of quenched columnar disorder, which requires extremely long simulation times. We compare the CUDA realization with an OpenCL implementation designed for processor clusters via MPI. A two-lane traffic model with randomized turning points is also realized and its dynamical behavior has been investigated.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1322_15738_Model1d_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1322_15738_Model1d_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>MTA-MFA, Res. Inst. for Tech. Phys. and Materials Sci. Budapest</OrganizationName>
        <OrganizationURL>http://www.mfa.kfki.hu</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>12/03/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="odor@mfa.kfki.hu">Henrik Schulz</Author>
           <Author email="">Geza Odor</Author>
           <Author email="">Gergely Odor, Mate F. Nagy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://arxiv.org/abs/1012.0385">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Statistical Physics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Henrik Schulz,Geza Odor,Gergely Odor, Mate F. Nagy,odor@mfa.kfki.hu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e795a2dd-798b-4cb4-9630-e8c7cd042a16</GUID>
        <Name>CUVI Lib</Name>
        <ShortDescription>CUDA for Vision and Imaging Library</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1321_5734_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1321_5734_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>TunaCode</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>08/26/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>40</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="tauseef@tunacode.com">Tauseef Rehman</Author>
           <Author email="">Salman Haq</Author>
           <Author email="">Usman Aziz, Jawad Masood</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cuvilib.com">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Programming Tools</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Tauseef Rehman,Salman Haq,Usman Aziz, Jawad Masood,tauseef@tunacode.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c0ae46f3-d256-4c38-9f59-11cddd117c0a</GUID>
        <Name>Interactive visualization of the largest radioastronomy cubes</Name>
        <ShortDescription>Astronomy is a data-intensive science. Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets, moving astronomy into the petascale data era. Such increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools.</ShortDescription>
        <URL>http://astronomy.swin.edu.au/~ahassan/Research.html</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1320_optiportal_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1320_optiportal_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Swinburne University of Technology-Centre for Astrophysics and Supercomputing</OrganizationName>
        <OrganizationURL>http://astronomy.swin.edu.au/scivis/</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>09/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">A. H. Hassan</Author>
           <Author email="">C. J. Fluke</Author>
           <Author email="">D. G. Barnes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://astronomy.swin.edu.au/~ahassan/Research.html">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>A. H. Hassan,C. J. Fluke,D. G. Barnes</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f01e4f1c-7947-4609-b24c-4732f954bafb</GUID>
        <Name>Simulation of 1+1 dimensional surface growth and lattice gases using GPUs</Name>
        <ShortDescription>Restricted solid-on-solid surface growth models can be mapped onto binary lattice gases. We show that efficient simulation algorithms can be realized on GPUs either by CUDA or by OpenCL programming. We consider a deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1 dimensions related to the Asymmetric Simple Exclusion Process and show that, for sizes that fit into the shared memory of GPUs, one can achieve the maximum parallelization speedup (x100 for a Quadro FX 5800 graphics card with respect to a single CPU of 2.67 GHz). This permits us to study the effect of quenched columnar disorder, which requires extremely long simulation times. We compare the CUDA realization with an OpenCL implementation designed for processor clusters via MPI. A two-lane traffic model with randomized turning points is also realized and its dynamical behavior has been investigated.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1318_15738_Model1d_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1318_15738_Model1d_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>MTA-MFA, Res. Inst. for Tech. Phys. and Materials Sci. Budapest</OrganizationName>
        <OrganizationURL>http://www.mfa.kfki.hu</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>12/03/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="odor@mfa.kfki.hu">Henrik Schulz</Author>
           <Author email="">Geza Odor</Author>
           <Author email="">Gergely Odor, Mate F. Nagy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://arxiv.org/abs/1012.0385">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Statistical Physics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Henrik Schulz,Geza Odor,Gergely Odor, Mate F. Nagy,odor@mfa.kfki.hu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4936a20a-b98f-4e39-94d2-ec174d002e9e</GUID>
        <Name>Nonlinear Free Surface Water Waves</Name>
        <ShortDescription>Fast Desktop Computing for Nonlinear Free Surface Water Waves (OceanWave3D potential flow model)</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1317_47409_whalint3_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1317_47409_whalint3_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Technical University of Denmark</OrganizationName>
        <OrganizationURL>http://www.imm.dtu.dk/~apek</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>12/03/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>42</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="apek@imm.dtu.dk">Allan P. Engsig-Karup</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www2.imm.dtu.dk/~apek/OceanWave3D/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>oceanwave3d, potential free surface flow, finite difference method, coastal engineering,Allan P. Engsig-Karup,apek@imm.dtu.dk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a79d4690-9c30-4837-88bc-805873f7e5f5</GUID>
        <Name>Lagrangian Stochastic Particle Model using Large-Eddy Simulation Meteorology</Name>
        <ShortDescription>Atmospheric transport and dispersion (T&amp;D) models play an important role in United States national defense. Due to operational time constraints, less sophisticated models have consistently dominated the defense market. Recent advances in graphics processing units (GPUs) and their programming models have made GPUs an attractive platform for commodity, low-power, high-performance parallel computing. Two GPU-accelerated versions (using NVIDIA Corporation's CUDA technology) of a sophisticated, large-eddy simulation (LES) based, Lagrangian stochastic model, developed at the National Center for Atmospheric Research (NCAR), were implemented and compared against their single- and multiple-core CPU (Intel Harpertown) counterparts. The implementation representing the shortest route to GPU acceleration achieved a single-GPU speedup of 14x over the single-core CPU implementation. A more robust and scalable single-GPU implementation achieved speedups of 20x over the single-core CPU implementation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1316_27146_ave_plan_view_crop_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1316_27146_ave_plan_view_crop_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Colorado - Boulder</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>13</ReleaseDay>
        <ReleaseDateDisplay>07/13/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="jhurst@ucar.edu">Jonathan Hurst</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://pqdtopen.proquest.com/#abstract?dispub=1481219">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jonathan Hurst,jhurst@ucar.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ebb0619e-dd44-4c10-ac6c-1659ff388b6f</GUID>
        <Name>rCUDA 2.0</Name>
        <ShortDescription>Allows performing CUDA calls to remote GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1315_4691_rCUDA_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1315_4691_rCUDA_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>UPV / UJI</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>11/24/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="apenya@gap.upv.es">The rCUDA Team</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.gap.upv.es/rCUDA">Application</ContentType>
           <ContentType url="http://www.gap.upv.es/rCUDA">Paper</ContentType>
           <ContentType url="http://www.gap.upv.es/rCUDA">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Programming Tools</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>The rCUDA Team,apenya@gap.upv.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bab91959-3b2a-494c-8644-cf771a2f9bc0</GUID>
        <Name>LATTE</Name>
        <ShortDescription>GPU-accelerated self-consistent tight-binding molecular dynamics for materials with mixed covalent and ionic bonding.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1314_main_orig_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1314_main_orig_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Los Alamos National Laboratory</OrganizationName>
        <OrganizationURL>http://www.lanl.gov</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>11/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="edsanville@gmail.com">E.J. Sanville</Author>
           <Author email="nbock@lanl.gov">N. Bock</Author>
           <Author email="amn@lanl.gov">A. M. N. Niklasson</Author>
           <Author email="aodell@kth.se">A. Odell</Author>
           <Author email="srudin@lanl.gov">S. Rudin</Author>
           <Author email="cawkwell@lanl.gov">M. J. Cawkwell</Author>
           <Author email="jcoe@lanl.gov">J. Coe</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://savannah.nongnu.org/projects/latte">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>E.J. Sanville,N. Bock,J. Coe, A. M. N. Niklasson, A. Odell, S. Rudin, M. J. Cawkwell,edsanville@gmail.com,nbock@lanl.gov, jcoe@lanl.gov, amn@lanl.gov, aodell@kth.se, srudin@lanl.gov, cawkwell@lanl.gov</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4dfaf12b-06c9-418f-8250-7d75fa91a932</GUID>
        <Name>Reverse extraction of early-age hydration kinetic equation from observed data of Portland cement.</Name>
        <ShortDescription>The early-age hydration of Portland cement paste has an important impact on the formation of microstructure and the development of strength. However, manual derivation of the hydration kinetic equation is very difficult because cement hydration involves multi-phased, multi-sized and interrelated complex chemical and physical reactions. In this paper, the early-age hydration kinetic equation is reversely extracted automatically from the observed time series of the hydration degree of Portland cement, using an evolutionary computation method that combines gene expression programming and particle swarm optimization algorithms. To reduce the computing time, GPUs are used for parallel acceleration. Studies have shown that the simulation curve of early-age hydration obtained from the extracted kinetic equation is in good accordance with the observed experimental data. Furthermore, this equation still generalizes well even when the chemical composition, particle size and curing conditions change.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1313_75384_Reverse_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1313_75384_Reverse_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Provincial Key Laboratory for Network-based Intelligent Computing, University of Jinan, Jinan 250022, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>11/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wangplanet@gmail.com">WANG Lin</Author>
           <Author email="">YANG Bo</Author>
           <Author email="">ZHAO XiuYang, CHEN YueHui, CHANG Jun</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/w4115j3520183755/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Material</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>WANG Lin,YANG Bo,ZHAO XiuYang, CHEN YueHui, CHANG Jun,wangplanet@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f905568d-b7d2-4ac5-b102-89b8f9a8cbdc</GUID>
        <Name>IntelliEtch GPU module</Name>
        <ShortDescription>IntelliEtch is an Anisotropic Wet Etch simulator. This chemical process can be used for Silicon-based Microsystems fabrication. IntelliEtch can be used as a CAD tool for Microsystem fabrication, allowing fast and accurate simulations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1312_87164_Images-036_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1312_87164_Images-036_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>I3M Institute (Polytechnic University of Valencia), DIPC Institute (University of the Basque Country)</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>10/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>150</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="nesferjo@upvnet.upv.es">N Ferrando</Author>
           <Author email="miguelangel.gosalvez@ehu.es">M A Gosalvez</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.intellisensesoftware.com/modules/IntelliEtch.html">Multimedia</ContentType>
           <ContentType url="http://www.intellisensesoftware.com/modules/IntelliEtch.html">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
           <ApplicationType>Microsystems</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>MEMS, microsystems, cellular automata,N Ferrando,M A Gosalvez,nesferjo@upvnet.upv.es,miguelangel.gosalvez@ehu.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a63d80ac-a9bb-4cb9-a632-66d9d2563f07</GUID>
        <Name>Ultra Fast SOM using CUDA</Name>
        <ShortDescription>This paper presents an overview of the optimization strategies used for the parallel implementation of Basic-SOM on a GPU using the CUDA programming paradigm.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1311_NeST-NVIDIA_Center_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1311_NeST-NVIDIA_Center_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NeST</OrganizationName>
        <OrganizationURL>http://nestsoftware.com/</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>05/18/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="hpc@nestgroup.net">Sijo Mathew</Author>
           <Author email="">Preetha Joy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://nestsoftware.com/nest/whitepapers/Ultra_Fast_SOM_using_CUDA.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Data mining</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sijo Mathew,Preetha Joy,hpc@nestgroup.net</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>30fbce15-63ea-41fe-8ff3-665316b591e3</GUID>
        <Name>AgiSoft PhotoScan</Name>
        <ShortDescription>AgiSoft PhotoScan is an advanced image-based 3D modeling solution for creating professional quality 3D content from still images. Based on the latest multi-view 3D reconstruction technology, it operates on arbitrary images and is efficient in both controlled and uncontrolled conditions. The photos can be taken from any position, provided that the object to be reconstructed is visible in at least two photos. Both image alignment and 3D model reconstruction are fully automated.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1309_436476_logo-pscan-2_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1309_436476_logo-pscan-2_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>AgiSoft</OrganizationName>
        <OrganizationURL>http://www.agisoft.ru</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>08/18/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="info@agisoft.ru">AgiSoft</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.agisoft.ru/products/photoscan/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
           <ApplicationType>Digital Content Creation</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>image based modeling,AgiSoft,info@agisoft.ru</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>46c6675d-b59a-49c8-89dd-fc7a32e6484e</GUID>
        <Name>Field Forge</Name>
        <ShortDescription>Field Forge brings massively parallel processing (MPP) to PostgreSQL's current single-threaded sessions. Field Forge utilizes the MPP power of the Kappa framework. The Kappa framework provides practical usage of CUDA GPU, OpenMP, and partitioned data flow scheduled processing. Field Forge makes the Kappa framework from Psi Lambda LLC a new language for defining Window and Table functions. These functions allow processing to be specified using SQL and index component notation for MPP using GPUs and CPUs. Within each Field Forge node, the Kappa framework passes subsets of the data sets between processing kernels and into and out of data sets. Field Forge also utilizes the Kappa framework's Apache Portable Runtime (APR) database driver SQL connections to retrieve data fields from any database source (including other Field Forge sessions and nodes), process them using the MPP capabilities of the Kappa framework, and return them as PostgreSQL table or window fields returned from table or window functions respectively. This combination of features enables a Dataset Passing Interface (DPI) for distributed MPP. DPI leverages the existing skills, protocols, connectivity, and infrastructure of an organization.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1308_37625_psilambdakappa_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1308_37625_psilambdakappa_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Psi Lambda LLC</OrganizationName>
        <OrganizationURL>http://psilambda.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>11/07/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="kappa@psilambda.com">Psi Lambda LLC</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://fieldforge.com">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Finance</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Programming Tools</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>PostgreSQL OpenMP CUDA Window Table Partition,Psi Lambda LLC,kappa@psilambda.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>dbb00087-2b32-421a-9e65-f1295b0546b6</GUID>
        <Name>Introducing libflame with multi-GPU support</Name>
        <ShortDescription>We are happy to announce the fifth milestone release (r4648) of libflame, a modern replacement for the most-used functionality of the LAPACK linear algebra library. The main improvement since version 4.0 is that libflame now supports parallel execution using multiple GPUs through the SuperMatrix runtime system. By linking libflame with CUBLAS for the execution of BLAS routines on a single GPU, the SuperMatrix runtime system schedules operations to each GPU and manages the explicit movement of data. This release includes support for single and double precision real and complex floating point operations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1307_205958_FLAMEbanner_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1307_205958_FLAMEbanner_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>UT Austin / Universitat Jaume I</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>10/28/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="figual@icc.uji.es">Ernie Chan</Author>
           <Author email="">Francisco Igual</Author>
           <Author email="">Field van Zee, Robert van de Geijn</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://z.cs.utexas.edu/wiki/flame.wiki/FrontPage">Application</ContentType>
           <ContentType url="http://z.cs.utexas.edu/wiki/flame.wiki/FrontPage">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ernie Chan,Francisco Igual,Field van Zee, Robert van de Geijn,figual@icc.uji.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6865f9fe-2a11-47bc-9d50-c0b4026248dc</GUID>
        <Name>alenka</Name>
        <ShortDescription>Alenka is a high-level, high-performance SQL-like language for data processing on CUDA hardware.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1306_53666_Cubes_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1306_53666_Cubes_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>11/02/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="antonmks@gmail.com">Anton K.</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/alenka/">Application</ContentType>
           <ContentType url="http://code.google.com/p/alenka/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Programming Tools</ApplicationType>
           <ApplicationType>databases</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Anton K.,antonmks@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>159ab96a-c59e-4853-bd70-03a4e7691b6f</GUID>
        <Name>CUDA Accelerated Face Recognition</Name>
        <ShortDescription>We explore one of the possibilities of parallelizing and optimizing a well-known Face Recognition algorithm, Principal Component Analysis (PCA) with Eigenfaces.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1305_NeST-NVIDIA_Center_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1305_NeST-NVIDIA_Center_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NeST</OrganizationName>
        <OrganizationURL>http://nestsoftware.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>07/26/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="hpc@nestgroup.net">Numaan. A</Author>
           <Author email="">Sibi A</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://nestsoftware.com/nest/whitepapers/RealTime_Face_Recognition_Using_Cuda.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Numaan. A,Sibi A,hpc@nestgroup.net</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>df37fa39-5f4f-43cd-8a94-797995daea91</GUID>
        <Name>On the Use of Small 2D Convolutions on GPUs</Name>
        <ShortDescription>Computing many small 2D convolutions using FFTs is a basis for a large number of applications in many domains in science and engineering, among them electromagnetic diffraction modeling in physics. The GPU architecture seems to be a suitable architecture to accelerate these convolutions, but reaching high application performance requires substantial development time and non-portable optimizations. In this work, we present the techniques, performance results and considerations to accelerate small 2D convolutions using CUDA, and compare performance to a multi-threaded CPU implementation. To improve programmability and performance of applications that make heavy use of small convolutions, we argue that two improvements to software and hardware are needed: FFT libraries must be extended with a single convolution function and communication bandwidth between CPU and GPU needs to be drastically improved.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1304_2dconvolutions_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1304_2dconvolutions_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TUDelft, ASML, TU/e</OrganizationName>
        <OrganizationURL>http://www.tudelft.nl/http://www.asml.nl/http://www.tue.nl</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>06/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="salumairy@gmail.com">Shams Al Umairy</Author>
           <Author email="a.s.vanamesfoort@tudelft.nl">Alexander S. van Amesfoort</Author>
           <Author email="sips@ewi.tudelft.nl">Henk Sips</Author>
           <Author email="Irwan.Setija@asml.com">Irwan Setija</Author>
           <Author email="M.C.v.Beurden@tue.nl">Martijn van Beurden</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.pds.ewi.tudelft.nl/~afoort/publ/a4mmc10/A4MMC-al-umairy.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>2D convolution, FFT, Electromagnetic diffraction grating,GPU, CUDA, Tesla,Shams Al Umairy,Alexander S. van Amesfoort,Henk Sips, Irwan Setija, Martijn van Beurden,salumairy@gmail.com,a.s.vanamesfoort@tudelft.nl,sips@ewi.tudelft.nl,Irwan.Setija@asml.com,M.C.v.Beurden@tue.nl </Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f0e9b09f-f104-4594-9ec8-165b10670b21</GUID>
        <Name>GFARGO</Name>
        <ShortDescription>GFARGO simulates the evolution of a gaseous protoplanetary disk subject to the gravitational perturbation of forming protoplanets embedded in it, by solving the Navier-Stokes equations on a polar mesh. It simultaneously describes how the planetary orbits expand or shrink with time, a process known as planetary migration, which plays an important role in shaping the planetary system that emerges once the disk dissipates. The current implementation is two-dimensional, and performance gains of up to 90x are achieved with respect to CPU implementations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1303_35573_fargo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1303_35573_fargo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Institute of Physical Sciences, UNAM, Mexico and CEA, Saclay, France</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>10/22/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>90</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="fmasset@cea.fr">Frederic Masset</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://fargo.in2p3.fr">Application</ContentType>
           <ContentType url="http://fargo.in2p3.fr">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Frederic Masset,fmasset@cea.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6c2695c6-b17f-4357-ba6a-71715583d5ab</GUID>
        <Name>2 million pixel experiment</Name>
        <ShortDescription>This experimental application maps an HD video source (1080p) into 3D space. Each frame is processed in real time on the GPU using NVIDIA CUDA technology. Each pixel in a frame (2,073,600 pixels per frame) is scaled by its luminance value and given the original color. The application is written in C# using DirectX11 via the SlimDX, CUDA.NET and DirectShow.NET libraries.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1302_114048_visualcompute_cuda_app960_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1302_114048_visualcompute_cuda_app960_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>noumentalia.de - digital arts - visualcompute.com</OrganizationName>
        <OrganizationURL>http://www.noumentalia.de</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>10/22/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="info@visualcompute.com">Philipp Drieger</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=kHhkLyJLLYI">Multimedia</ContentType>
           <ContentType url="http://www.visualcompute.com/">Presentation</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Digital Content Creation</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>HD video processing 1080p 3D CUDA .NET C# map 3D space,Philipp Drieger,info@visualcompute.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ece4bfff-3896-47dd-b0af-f6abbb592a8e</GUID>
        <Name>powDOG: powder diffraction on GPUs</Name>
        <ShortDescription>Diffraction, particularly of X-rays, is a powerful technique for the investigation of the structure, microstructure and dynamical properties of matter. In order to link theoretical methods, like Molecular Dynamics and other atomistic approaches, with diffraction experiments, we developed new software for calculating the powder diffraction pattern of nano-sized objects on GPUs. The software, soon to be made available under the GPL license, allows the use of GPUs on different hosts for a direct (brute-force) computation of the Debye scattering equation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1301_1322162_powDOG_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1301_1322162_powDOG_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Trento, Trento, Italy</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>02/08/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="luca.gelisio@unitn.it">Luca Gelisio</Author>
           <Author email="">Cristy Leonor Azanza Ricardo, Matteo Leoni, Paolo Scardi.</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.xrd.ing.unitn.it/cms/index.php">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Powder diffraction, Debye scattering equation, nanostructured materials,Luca Gelisio,Cristy Leonor Azanza Ricardo, Matteo Leoni, Paolo Scardi.,luca.gelisio@unitn.it</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>85367756-abdb-4bc0-ab43-beed43680f51</GUID>
        <Name>GPU Accelerated Likelihoods for Stereo-Based Articulated Tracking</Name>
        <ShortDescription>For many years articulated tracking has been an active research topic in the computer vision community. While working solutions have been suggested, computational time is still problematic. We present a GPU implementation of a ray-casting based likelihood model that is orders of magnitude faster than a traditional CPU implementation. We explain the non-intuitive steps required to attain an optimized GPU implementation, where the dominant part is to hide the memory latency effectively. Benchmarks show that computations which previously required several minutes are now performed in a few seconds.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1299_88964_gpu_vision_2010_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1299_88964_gpu_vision_2010_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The eScience Centre, Dept. of Computer Science, University of Copenhagen</OrganizationName>
        <OrganizationURL>http://www.diku.dk</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>09/05/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>600</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="runef@diku.dk">Rune Mollegaard Friborg</Author>
           <Author email="hauberg@diku.dk">Soren Hauberg</Author>
           <Author email="kenny@diku.dk">Kenny Erleben </Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://iphys.wordpress.com/2010/09/04/gpu-accelerated-likelihoods-for-stereo-based-articulated-tracking/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Ray Tracing</ApplicationType>
           <ApplicationType>Computer Vision</ApplicationType>
           <ApplicationType>Machine Learning</ApplicationType>
           <ApplicationType>Tracking</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Articulated Tracking,Particle Filtering,Rune Mollegaard Friborg,Soren Hauberg,Kenny Erleben ,runef@diku.dk,hauberg@diku.dk,kenny@diku.dk</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>8052c851-ce8c-4767-9452-e3df12796c1d</GUID>
        <Name>Electronic Design Automation</Name>
        <ShortDescription>GPU-Based Robust Multigrid Preconditioned Solver for Large Scale On-Chip Power Grid Simulation</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1298_1108533_40-1-9_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1298_1108533_40-1-9_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Michigan Technological University</OrganizationName>
        <OrganizationURL>http://www.ece.mtu.edu/~zhuofeng/MTU_VLSI_DA.htm</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>09/15/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="zhuofeng@mtu.edu">Zhuo Feng</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ece.mtu.edu/~zhuofeng/MTU_VLSI_DA_files/papers/mgpcg_dac10_slides.pdf">Multimedia</ContentType>
           <ContentType url="http://www.ece.mtu.edu/~zhuofeng/MTU_VLSI_DA_files/papers/mgpcg_dac10.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
           <ApplicationType>Electronic Design Automation</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Multigrid, preconditioned iterative methods, power delivery network, on-chip interconnect simulation, VLSI system,Zhuo Feng,zhuofeng@mtu.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c9e901d1-7e78-40ed-ac3d-4b59592bbb9b</GUID>
        <Name>Engine_cudamrg for OpenSSL</Name>
        <ShortDescription>Engine_cudamrg is a cryptographic engine for the OpenSSL Toolkit that can accelerate some operations using a CUDA-supported device. We currently support the following cipher types: AES-128-ECB, AES-128-CBC, AES-192-ECB, AES-192-CBC, AES-256-ECB and AES-256-CBC. We support both encryption and decryption for these cipher types. For future releases we plan to optimize the currently supported cipher types and to add more cipher types and digest algorithms.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1297_8191_engineCudamrg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1297_8191_engineCudamrg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Engine_cudamrg Development Team</OrganizationName>
        <OrganizationURL>http://groups.google.com/group/engine-cudamrg</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>07/26/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="paolo.margara@gmail.com">Paolo Margara</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/engine-cuda/">Application</ContentType>
           <ContentType url="http://code.google.com/p/engine-cuda/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Cryptography</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>AES, cryptography,Paolo Margara,paolo.margara@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8179b89f-1d5e-4dad-84d4-6b0466dcde7e</GUID>
        <Name>Smoke Simulation for Fire Engineering using a Multigrid Method on Graphics Hardware</Name>
        <ShortDescription>We present a GPU-based Computational Fluid Dynamics solver for the purpose of fire engineering. We apply a multigrid method to the Jacobi solver when solving the Poisson pressure equation, supporting internal boundaries. Boundaries are handled on the coarse levels, ensuring that boundaries never vanish after restriction. We demonstrate cases where the multigrid solver computes results up to three times more accurate than the standard Jacobi method within the same time, providing rich visual details and flows closer to widely accepted standards in fire engineering. Making accurate interactive physical simulation available for engineering purposes has the benefit of reducing production turn-around time. We have measured speed-ups by a factor of up to 350 compared to existing CPU-based solvers. The present CUDA-based solver promises huge potential economic benefits, as well as the construction of safer and more complex buildings. In this paper, the multigrid method is applied to fire engineering. However, this is not a limitation, since improvements are possible for other fields as well. Traditional Jacobi solvers are particularly suitable for the methods presented.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1296_121739_vriphys2009_glimberg_erleben_teaser_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1296_121739_vriphys2009_glimberg_erleben_teaser_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Department of Computer Science/University of Copenhagen</OrganizationName>
        <OrganizationURL>http://www.diku.dk</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>11/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>350</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="glimberg@diku.dk">Stefan Glimberg</Author>
           <Author email="kenny@diku.dk">Kenny Erleben</Author>
           <Author email="">Jens Bennetsen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://iphys.wordpress.com/2009/11/04/smoke-simulation-for-fire-engineering-using-a-multigrid-method-on-graphics-hardware/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
           <ApplicationType>Pre-parameter studies of virtual designs</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Stefan Glimberg,Kenny Erleben,Jens Bennetsen,glimberg@diku.dk,kenny@diku.dk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2e28ba24-3cc8-4333-be77-fb572dc28776</GUID>
        <Name>GPU Accelerated Tandem Traversal of Blocked Bounding Volume Hierarchy Collision Detection for Multibody Dynamics</Name>
        <ShortDescription>The performance bottleneck of physics-based animation is often collision detection. It is well known by practitioners that collision detection may consume more than half of the simulation time. In this work we introduce a novel approach for collision detection using bounding volume hierarchies. Our approach makes it possible to perform non-convex object versus non-convex object collision detection on the GPU, using tandem traversals of bounding volume hierarchies. Prior work only supports single traversals on GPUs. We introduce a blocked hierarchy data structure, using imaginary nodes and a simultaneous descent in the tandem traversal. The data structure design and traversal are highly specialized for exploiting the parallel threads in NVIDIA GPUs. As a proof of concept we demonstrate a GPU implementation for a multibody dynamics simulation, showing an approximate speedup factor of up to 8 compared to a CPU implementation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1295_52591_vriphys2009_damkjaer_erleben_teaser_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1295_52591_vriphys2009_damkjaer_erleben_teaser_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Department of Computer Science, University of Copenhagen.</OrganizationName>
        <OrganizationURL>http://www.diku.dk/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>11/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>8</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="damkjaer@diku.dk">Jesper Damkjaer</Author>
           <Author email="kenny@diku.dk">Kenny Erleben</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://iphys.wordpress.com/2009/11/05/127/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Game Physics</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Programming Tools</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bounding volume Hierarchies, Collision Detection, Rigid Body Simulation,Jesper Damkjaer, Kenny Erleben,damkjaer@diku.dk,kenny@diku.dk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>70463033-d9d8-49f6-9067-8a982284a733</GUID>
        <Name>permGPU: Using graphics processing units in RNA microarray association studies</Name>
        <ShortDescription>Background: Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, which are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed.</ShortDescription>
        <URL>http://www.gpucomputing.net/?q=node/2083</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1294_bmc_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1294_bmc_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>BMC Bioinformatics</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>05/22/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>78</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Ivo D Shterev</Author>
           <Author email="">Sin-Ho Jung</Author>
           <Author email="">Stephen L George</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.gpucomputing.net/?q=node/2083">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ivo D Shterev,Sin-Ho Jung,Stephen L George</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>796fa229-7424-4736-b68b-793cf9120ee9</GUID>
        <Name>High performance GPU radix sorting in CUDA</Name>
        <ShortDescription>This project implements a very fast, efficient radix sorting method for CUDA-capable devices. For sorting large sequences of fixed-length keys (and values), we believe our sorting primitive to be the fastest available for any fully programmable microarchitecture: our stock NVIDIA GTX480 sorting results exceed the 1G keys/sec average sorting rate (i.e., one billion 32-bit keys sorted per second).</ShortDescription>
        <URL>http://code.google.com/p/back40computing/wiki/RadixSorting</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1291_SortingSmall_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1291_SortingSmall_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>27</ReleaseDay>
        <ReleaseDateDisplay>05/27/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Duane Merrill</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/back40computing/wiki/RadixSorting">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Duane Merrill</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>b8244c5b-21cf-4a7d-8eef-7b8e72792b98</GUID>
        <Name>Hardware-Assisted Projected Tetrahedra</Name>
        <ShortDescription>We present a flexible and highly efficient hardware-assisted volume renderer grounded on the original Projected Tetrahedra (PT) algorithm. Unlike recent similar approaches, our method is exclusively based on the rasterization of simple geometric primitives and takes full advantage of graphics hardware. Both vertex and geometry shaders are used to compute the tetrahedral projection, while the volume ray integral is evaluated in a fragment shader; hence, volume rendering is performed entirely on the GPU within a single pass through the pipeline.</ShortDescription>
        <URL>http://www.lcg.ufrj.br/Members/andream/papers/cgf2010.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1290_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1290_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Rio de Janeiro</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>03/18/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">A. Maximo</Author>
           <Author email="">R. Marroquim</Author>
           <Author email="">R. Farias</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.lcg.ufrj.br/Members/andream/papers/cgf2010.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>A. Maximo,R. Marroquim,R. Farias</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1af9d850-4d52-4267-9787-72027ff4928c</GUID>
        <Name>A Parallel Algorithm for Construction of Uniform Grids</Name>
        <ShortDescription>We present a fast, parallel GPU algorithm for construction of uniform grids for ray tracing, which we implement in CUDA. The algorithm performance does not depend on the primitive distribution, because we reduce the problem to sorting pairs of primitives and cell indices. Our implementation is able to take full advantage of the parallel architecture of the GPU, and construction speed is faster than CPU algorithms running on multiple cores.</ShortDescription>
        <URL>http://graphics.cs.uni-sb.de/fileadmin/cguds/papers/2009/kalojanov_hpg2009/kalojanov_hpg2009.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1289_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1289_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Saarland University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>13</ReleaseDay>
        <ReleaseDateDisplay>06/13/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Javor Kalojanov</Author>
           <Author email="">Philipp Slusallek</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://graphics.cs.uni-sb.de/fileadmin/cguds/papers/2009/kalojanov_hpg2009/kalojanov_hpg2009.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Javor Kalojanov,Philipp Slusallek</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>17270ae2-0417-4766-88db-17f20e1e3073</GUID>
        <Name>Evaluation of Streaming Aggregation on Parallel Hardware Architectures</Name>
        <ShortDescription>We present a case study parallelizing streaming aggregation on three different parallel hardware architectures. Aggregation is a performance-critical operation for data summarization in stream computing, and is commonly found in sense-and-respond applications. Currently available commodity parallel hardware shows promise as an accelerator for streaming aggregation. However, how streaming aggregation can map to the different parallel architectures is still an open question.</ShortDescription>
        <URL>http://people.cs.vt.edu/~scschnei/papers/debs2010.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1288_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1288_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>IBM Research Division</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>07/12/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Scott Schneider</Author>
           <Author email="">Henrique Andrade</Author>
           <Author email="">Bugra Gedik</Author>
           <Author email="">Kun-Lung Wu</Author>
           <Author email="">Dimitrios S. Nikolopoulos</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://people.cs.vt.edu/~scschnei/papers/debs2010.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Scott Schneider,Henrique Andrade,Bugra Gedik</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9f95c82d-a9c4-4324-be40-85e0a4d5ebd3</GUID>
        <Name>A Middleware for Efficient Stream Processing in CUDA</Name>
        <ShortDescription>This paper presents a middleware capable of out-of-order execution of kernels and data transfers for efficient stream processing in the compute unified device architecture (CUDA). Our middleware runs on the CUDA-compatible graphics processing unit (GPU). Using the middleware, application developers are allowed to easily overlap kernel computation with data transfer between the main memory and the video memory.</ShortDescription>
        <URL>http://www-hagi.ist.osaka-u.ac.jp/research/papers/201005_s-nakagw_isc.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1287_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1287_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Trier</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>03/12/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Shinta Nakagawa</Author>
           <Author email="">Fumihiko Ino</Author>
           <Author email="">Kenichi Hagihara</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www-hagi.ist.osaka-u.ac.jp/research/papers/201005_s-nakagw_isc.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shinta Nakagawa,Fumihiko Ino,Kenichi Hagihara</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5fc49232-b32f-49ab-81d9-8548d9b4b730</GUID>
        <Name>An Adaptive Performance Modeling Tool for GPU Architectures</Name>
        <ShortDescription>This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and assist it in narrowing down the search to the more promising implementations. It can also be incorporated into a tool to help programmers better assess the performance bottlenecks in their code. We analyze each GPU kernel and identify how the kernel exercises major GPU microarchitecture features.</ShortDescription>
        <URL>http://impact.crhc.illinois.edu/ftp/conference/sara.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1286_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1286_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois at Urbana-Champaign</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>11/19/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="bsadeghi@illinois.edu">Sara S. Baghsorkhi</Author>
           <Author email="matthieu@illinois.edu">Matthieu Delahaye</Author>
           <Author email="sjp@illinois.edu">Sanjay J. Patel</Author>
           <Author email="wgropp@illinois.edu">William D. Gropp</Author>
           <Author email="hwu@illinois.edu">Wen-mei W. Hwu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://impact.crhc.illinois.edu/ftp/conference/sara.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sara S. Baghsorkhi,Matthieu Delahaye,Sanjay J. Patel,bsadeghi@illinois.edu,matthieu@illinois.edu,sjp@illinois.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d3d1ef02-d0ff-40e9-ae3c-2f380b1f45d7</GUID>
        <Name>Kd-Jump: a Path-Preserving Stackless Traversal for Faster Isosurface Raytracing on GPUs</Name>
        <ShortDescription>Stackless traversal techniques are often used to circumvent memory bottlenecks by avoiding a stack and replacing return traversal with extra computation. This paper addresses whether the stackless traversal approaches are useful on newer hardware and technology (such as CUDA). To this end, we present a novel stackless approach for implicit kd-trees, which exploits the benefits of index-based node traversal, without incurring extra node visitation. This approach, which we term Kd-Jump, enables the traversal to immediately return to the next valid node, like a stack, without incurring extra node visitation (kd-restart).</ShortDescription>
        <URL>http://vplab.snu.ac.kr/lectures/09-2/graphics/lecture_notes/11%20Kd-jump.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1285_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1285_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Bangor University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>27</ReleaseDay>
        <ReleaseDateDisplay>07/27/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="meirion@bangor.ac.uk">David M. Hughes</Author>
           <Author email="i.s.lim@bangor.ac.uk">Ik Soo Lim</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://vplab.snu.ac.kr/lectures/09-2/graphics/lecture_notes/11%20Kd-jump.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>David M. Hughes,Ik Soo Lim,meirion@bangor.ac.uk,i.s.lim@bangor.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b82e77d4-8aab-4e0c-828d-d9c87e198557</GUID>
        <Name>Accelerating Flow Cytometry Data Clustering Workflows with Graphics Processing Units</Name>
        <ShortDescription>Flow cytometry is a mainstay technology used by biologists and immunologists for counting, sorting, and analyzing cells suspended in a fluid. The results of flow cytometry are used in a variety of important clinical and research applications such as phenotyping, DNA analysis, and cell function analysis. Like many modern scientific applications, flow cytometry produces massive amounts of data which must be clustered in order to be useful. Conventional analysis of flow cytometry data uses manual sequential bivariate gating.</ShortDescription>
        <URL>http://cyberaide.googlecode.com/svn/trunk/papers/thesis-pangborn/proposal/pangborn-proposal.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1284_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1284_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Rochester Institute of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>09/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Andrew D. Pangborn</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://cyberaide.googlecode.com/svn/trunk/papers/thesis-pangborn/proposal/pangborn-proposal.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Andrew D. Pangborn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b3217a94-0e15-431b-bdba-c2b96f030be4</GUID>
        <Name>GPU Accelerated Scientific Computing: Fluid and Particulate Flows with CUDA</Name>
        <ShortDescription>Simulations of particulate flows, which involve gases and liquids with suspended solid particles like dust, are generally highly CPU-time demanding. The question arises whether such computations can be performed on the GPU applying highly parallel programming models like CUDA. In this paper we demonstrate that numerical simulation in that context can greatly benefit from these emerging technologies and present results in a 2D and 3D setup.</ShortDescription>
        <URL>http://numhpc.math.kit.edu/download/PARS_Full_Paper_Final_Heuveline_Hahn_Rocker.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1283_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1283_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Karlsruhe</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>10/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="tobias.hahn@kit.edu">Tobias Hahn</Author>
           <Author email="vincent.heuveline@kit.edu">Vincent Heuveline</Author>
           <Author email="bjoern.rocker@kit.edu">Bjorn Rocker</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://numhpc.math.kit.edu/download/PARS_Full_Paper_Final_Heuveline_Hahn_Rocker.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Tobias Hahn,Vincent Heuveline,Bjorn Rocker,tobias.hahn@kit.edu,vincent.heuveline@kit.edu,bjoern.rocker@kit.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3587097f-b801-4185-ad0c-f4d4c78480c9</GUID>
        <Name>General-Purpose vs. GPU: Comparison of Many-Cores on Irregular Workloads</Name>
        <ShortDescription>XMT is a general-purpose many-core parallel architecture. The foremost design objective for XMT was to meet the highest standards for ease of parallel programming. GPUs, on the other hand, have acquired a strong reputation for performance, sometimes at the expense of ease of programming. The current paper presents a performance comparison on diverse workloads between XMT and an NVIDIA CUDA-enabled GPU. Configured with roughly the same amount of chip resources as the GPU, XMT achieves an average speedup of 6.05x on irregular applications, while incurring an average slowdown of 2.07x on regular ones.</ShortDescription>
        <URL>http://www.umiacs.umd.edu/users/vishkin/XMT/CKTV_hotpar10.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1282_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1282_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Maryland, College Park</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>27</ReleaseDay>
        <ReleaseDateDisplay>04/27/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="gcaragea@umd.edu">George C. Caragea</Author>
           <Author email="keceli@umd.edu">Fuat Keceli</Author>
           <Author email="tzannes@umd.edu">Alexandros Tzannes</Author>
           <Author email="vishkin@umd.edu">Uzi Vishkin</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.umiacs.umd.edu/users/vishkin/XMT/CKTV_hotpar10.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>George C. Caragea,Fuat Keceli,Alexandros Tzannes,gcaragea@umd.edu,keceli@umd.edu,tzannes@umd.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ace3d995-2e1a-4d64-a5af-450ffcfee3fd</GUID>
        <Name>Fast Minimum Spanning Tree for Large Graphs on the GPU</Name>
        <ShortDescription>Graphics Processor Units are used for many general-purpose processing tasks due to the high compute power available on them. Regular, data-parallel algorithms map well to the SIMD architecture of current GPUs. Irregular algorithms on discrete structures like graphs are harder to map to them. Efficient data-mapping primitives can play a crucial role in mapping such algorithms onto the GPU. In this paper, we present a minimum spanning tree algorithm on Nvidia GPUs under CUDA, as a recursive formulation of Boruvka's approach for undirected graphs.</ShortDescription>
        <URL>http://www.gpucomputing.net/?q=node/1612</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1281_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1281_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>International Institute of Information Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>06/07/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="vibhavvinet@research.iiit.ac.in">Vibhav Vineet</Author>
           <Author email="harishpk@research.iiit.ac.in">Pawan Harish</Author>
           <Author email="skp@research.iiit.ac.in">Suryakant Patidar</Author>
           <Author email="pjn@iiit.ac.in">P. J. Narayanan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.gpucomputing.net/?q=node/1612">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vibhav Vineet,Pawan Harish,Suryakant Patidar,vibhavvinet@research.iiit.ac.in,harishpk@research.iiit.ac.in,skp@research.iiit.ac.in</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>81ab4f38-8401-4ac6-9e25-d5ad5edceb53</GUID>
        <Name>CUDA-based Triangulations of Convolution Molecular Surfaces</Name>
        <ShortDescription>Computing molecular surfaces is important to measure areas and volumes of molecules, as well as to infer useful information about interactions with other molecules. Over the years many algorithms have been developed to triangulate and to render molecular surfaces. However, triangulation algorithms usually are very expensive in terms of memory storage and time performance, and thus far from real-time performance. Fortunately, the massive computational power of the new generation of low-cost GPUs opens up an opportunity window to solve these problems: real-time performance and cheap computing commodities.</ShortDescription>
        <URL>http://salsahpc.indiana.edu/ECMLS2010/papers/066.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1280_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1280_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universidade da Beira Interior</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>06/20/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sdias@ubi.pt">Sergio Dias</Author>
           <Author email="kuldeep@iitg.ernet.in">Kuldeep Bora</Author>
           <Author email="agomes@di.ubi.pt">Abel Gomes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://salsahpc.indiana.edu/ECMLS2010/papers/066.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sergio Dias,Kuldeep Bora,Abel Gomes,sdias@ubi.pt,kuldeep@iitg.ernet.in,agomes@di.ubi.pt</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a5df523d-c91c-4105-b8ee-fb0979362cc3</GUID>
        <Name>Data Parallel Three-Dimensional Cahn-Hilliard Field Equation Simulation on GPUs with CUDA</Name>
        <ShortDescription>Computational scientific simulations have long used parallel computers to increase their performance.  Recently graphics cards have been utilised to provide this functionality. GPGPU APIs such as NVIDIA's CUDA can be used to harness the power of GPUs for purposes other than computer graphics. GPUs are designed for processing two-dimensional data.  In previous work we have presented several two-dimensional Cahn-Hilliard simulations that each utilise different CUDA memory types and compared their results.</ShortDescription>
        <URL>http://www.massey.ac.nz/~kahawick/cstn/073/cstn-073.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1279_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1279_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Massey University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>02/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="d.p.playne@massey.ac.nz">D. P. Playne</Author>
           <Author email="k.a.hawick@massey.ac.nz">K. A. Hawick</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.massey.ac.nz/~kahawick/cstn/073/cstn-073.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>D. P. Playne,K. A. Hawick,d.p.playne@massey.ac.nz,k.a.hawick@massey.ac.nz</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0b5f7cf7-9d2e-4318-b4c5-9c9a1dba1143</GUID>
        <Name>FAST VISUAL HULL AND STEREO MATCHING ON CUDA</Name>
        <ShortDescription>Stereo matching and visual hull are techniques that are often used in 3D reconstruction. This paper presents and evaluates implementations of these algorithms on the GPU using the CUDA architecture. Experimental results show that both visual hull and stereo matching have much to gain in terms of speed from the data-parallel execution model.</ShortDescription>
        <URL>http://eprints.pascal-network.org/archive/00005646/01/fast_vh_and_sm_cuda.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1278_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1278_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Surrey</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>02/11/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="mykyta.fastovets@surrey.ac.uk">Mykyta Fastovets</Author>
           <Author email="j.guillemaut@surrey.ac.uk">Jean-Yves Guillemaut</Author>
           <Author email="a.hiltong@surrey.ac.uk">Adrian Hilton</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://eprints.pascal-network.org/archive/00005646/01/fast_vh_and_sm_cuda.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mykyta Fastovets,Jean-Yves Guillemaut,Adrian Hilton,mykyta.fastovets@surrey.ac.uk,j.guillemaut@surrey.ac.uk,a.hiltong@surrey.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>17140b63-a39b-4cdf-bd86-9a54079055b8</GUID>
        <Name>Speed records for NTRU</Name>
        <ShortDescription>In this paper NTRUEncrypt is implemented for the first time on a GPU using the CUDA platform. As is shown, this operation lends itself excellently to parallelization and performs extremely well compared to similar security levels for ECC and RSA, giving speedups of around three to four orders of magnitude. The focus is on achieving a high throughput, in this case performing a large number of encryptions/decryptions in parallel.</ShortDescription>
        <URL>http://www.gpucomputing.net/?q=node/1573</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1277_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1277_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Leuven</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>09/10/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jens Hermans</Author>
           <Author email="">Frederik Vercauteren</Author>
           <Author email="">Bart Preneel</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.gpucomputing.net/?q=node/1573">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jens Hermans,Frederik Vercauteren,Bart Preneel</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>042c7881-8db7-45d5-8824-8c7f99c6ce91</GUID>
        <Name>Implementing a GPU Programming Model on a non-GPU Accelerator Architecture</Name>
        <ShortDescription>Parallel codes are written primarily for the purpose of performance.  It is highly desirable that parallel codes be portable between parallel architectures without significant performance degradation or code rewrites. While performance portability and its limits have been studied thoroughly on single processor systems, this goal has been less extensively studied and is more difficult to achieve for parallel systems. Emerging single-chip parallel platforms are no exception; writing code that obtains good performance across GPUs and other many-core CMPs can be challenging.</ShortDescription>
        <URL>http://hal.archives-ouvertes.fr/docs/00/49/39/05/PDF/A4MMC-kofsky.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1275_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1275_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois at Urbana-Champaign</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>06/21/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Stephen M. Kofsky</Author>
           <Author email="">Daniel R. Johnson</Author>
           <Author email="">John A. Stratton</Author>
           <Author email="">Wen-mei W. Hwu</Author>
           <Author email="">Sanjay J. Patel</Author>
           <Author email="">Steven S. Lumetta</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://hal.archives-ouvertes.fr/docs/00/49/39/05/PDF/A4MMC-kofsky.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Stephen M. Kofsky,Daniel R. Johnson,John A. Stratton</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>93887a5d-c38e-40ec-bda5-b5bb59aae6d7</GUID>
        <Name>Parallelising Wavefront Applications on General-Purpose GPU Devices</Name>
        <ShortDescription>Pipelined wavefront applications form a large portion of the high performance scientific computing workloads at supercomputing centres such as LANL in the United States and AWE in the United Kingdom. This paper investigates the viability of utilising graphics processing units (GPUs) for the acceleration of these codes, using NVIDIA's Compute Unified Device Architecture (CUDA). Wavefront applications differ from the massively data-parallel codes typically selected for execution on GPUs in that their computation must obey a strict data dependency, limiting the achievable level of parallelism.</ShortDescription>
        <URL>http://www2.warwick.ac.uk/fac/sci/dcs/research/pcav/publications/pubs/ukpew-gpu-wavefronts.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1274_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1274_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Warwick</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>06/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sjp@dcs.warwick.ac.uk">S. J. Pennycook</Author>
           <Author email="g.r.mudalige@dcs.warwick.ac.uk">G. R. Mudalige</Author>
           <Author email="sdh@dcs.warwick.ac.uk">S. D. Hammond</Author>
           <Author email="saj@dcs.warwick.ac.uk">S. A. Jarvis</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www2.warwick.ac.uk/fac/sci/dcs/research/pcav/publications/pubs/ukpew-gpu-wavefronts.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>S. J. Pennycook,G. R. Mudalige,S. D. Hammond,sjp@dcs.warwick.ac.uk,g.r.mudalige@dcs.warwick.ac.uk,sdh@dcs.warwick.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f6b4d7a3-c2f6-456c-80f9-1a6666c19c99</GUID>
        <Name>Performance Cost Analysis of Software-Implemented Hardware Fault Tolerance Methods in General-Purpose GPU Computing</Name>
        <ShortDescription>Commercial off-the-shelf graphics processing units (GPUs) provide an attractive, inexpensive platform for high-throughput scientific applications. Whereas fault tolerance may be desirable for many scientific applications, off-the-shelf GPU hardware has been designed for commodity graphics applications, where fault tolerance is not necessary.</ShortDescription>
        <URL>http://homepages.cae.wisc.edu/~ece753/papers/Paper_4.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1273_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1273_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Wisconsin, Madison</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>04/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="agregerson@wisc.edu">Anthony E. Gregerson</Author>
           <Author email="aabhyankar@wisc.edu">Ameya V. Abhyankar</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://homepages.cae.wisc.edu/~ece753/papers/Paper_4.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Anthony E. Gregerson,Ameya V. Abhyankar,agregerson@wisc.edu,aabhyankar@wisc.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a90fc8ef-5fba-447e-b091-fd5f9d5b1e50</GUID>
        <Name>GPU Accelerated Stylistic Augmented Reality</Name>
        <ShortDescription>With the introduction of the programmable graphics pipeline, the highly parallel processing power of graphics processing units (GPUs) is being used not only for special graphics effects but also for general-purpose computation in areas such as molecular dynamics simulation, stock options pricing, and image processing. In this work, we utilize this power to increase the immersion level in an augmented reality (AR) application.</ShortDescription>
        <URL>http://www.vmasc.odu.edu/downloads/Capstone_Papers/Engineering/Aras.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1272_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1272_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Old Dominion University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>04/02/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Rifat Aras</Author>
           <Author email="">Yuzhong Shen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.vmasc.odu.edu/downloads/Capstone_Papers/Engineering/Aras.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Rifat Aras,Yuzhong Shen</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>33549c56-faa7-4ed2-8762-b7176670e639</GUID>
        <Name>A Batched GPU Algorithm for Set Intersection</Name>
        <ShortDescription>Intersection of inverted lists is a frequently used operation in search engine systems. Efficient CPU and GPU intersection algorithms for large problem sizes are well studied. We propose an efficient GPU algorithm for high-performance intersection of inverted index lists on the CUDA platform. This algorithm feeds queries to the GPU in batches, and thus can take full advantage of the GPU processor cores even if the problem size is small. We also propose an input preprocessing method which alleviates load imbalance effectively.</ShortDescription>
        <URL>http://nbjl.nankai.edu.cn/Lab_Papers/2009/A%20Batched%20GPU%20Algorithm%20for%20Set%20Intersection.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1271_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1271_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nankai University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>09/19/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wakensky@gmail.com">Di Wu</Author>
           <Author email="zhangfan555@gmail.com">Fan Zhang</Author>
           <Author email="aonaiyong@163.com">Naiyong Ao</Author>
           <Author email="wangfang09@gmail.com">Fang Wang</Author>
           <Author email="liuxg74@yahoo.com.cn">Xiaoguang Liu</Author>
           <Author email="wgzwp@163.com">Gang Wang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://nbjl.nankai.edu.cn/Lab_Papers/2009/A%20Batched%20GPU%20Algorithm%20for%20Set%20Intersection.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Di Wu,Fan Zhang,Naiyong Ao,wakensky@gmail.com,zhangfan555@gmail.com,aonaiyong@163.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c7c4497b-7a3a-4847-8010-748fc72fdd19</GUID>
        <Name>GPU-based ultrafast IMRT plan optimization</Name>
        <ShortDescription>The widespread adoption of on-board volumetric imaging in cancer radiotherapy has stimulated research efforts to develop online adaptive radiotherapy techniques to handle the inter-fraction variation of the patient's geometry. Such efforts face major technical challenges to perform treatment planning in real time. To overcome this challenge, we are developing a supercomputing online re-planning environment (SCORE) at the University of California, San Diego (UCSD). As part of the SCORE project, this paper presents our work on the implementation of an intensity-modulated radiation therapy (IMRT) optimization algorithm on graphics processing units (GPUs).</ShortDescription>
        <URL>http://iopscience.iop.org/0031-9155/54/21/008</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1270_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1270_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California, San Diego</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>10/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>40</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Chunhua Men</Author>
           <Author email="">Xuejun Gu</Author>
           <Author email="">Dongju Choi</Author>
           <Author email="">Amitava Majumdar</Author>
           <Author email="">Ziyi Zheng</Author>
           <Author email="">Klaus Mueller</Author>
           <Author email="">Steve B. Jiang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://iopscience.iop.org/0031-9155/54/21/008">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chunhua Men,Xuejun Gu,Dongju Choi</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>4e362946-a2fd-4f2b-8740-0009b7348bd6</GUID>
        <Name>Real-time Forest Simulation for a Flight Simulator using a GPU</Name>
        <ShortDescription>This paper concerns the real-time simulation of forests for a flight simulator, exploiting the capacities of recent graphics cards. As we will show, these architectures, coupled with recent ergonomic environments like CUDA, allow C programmers to implement highly parallelizable algorithms for execution on the GPU without being specialized in parallel programming.</ShortDescription>
        <URL>http://www.ecam-rennes.fr/IMG/pdf/ICCTA2008.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1268_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1268_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Louis de Broglie, Graduate Engineering School</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>02/19/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="laferte@ecole-debroglie.fr">Jean-Marc Laferte</Author>
           <Author email="g.daussin@ecole-debroglie.fr">Guillaume Daussin</Author>
           <Author email="flifla@ecole-debroglie.fr">Jihed Flifla</Author>
           <Author email="Pascal.Haigron@univ-rennes1.fr">Pascal Haigron</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ecam-rennes.fr/IMG/pdf/ICCTA2008.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jean-Marc Laferte,Guillaume Daussin,Jihed Flifla,laferte@ecole-debroglie.fr,g.daussin@ecole-debroglie.fr,flifla@ecole-debroglie.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e490331e-ba50-4ff8-ad01-f2af8b63cada</GUID>
        <Name>cuInspiral: prototype gravitational waves detection pipeline fully coded on GPU using CUDA</Name>
        <ShortDescription>In this paper we report the prototype of the first coalescing binary detection pipeline fully implemented on NVIDIA GPU hardware accelerators. The code has been embedded in a GPU library called cuInspiral and has been developed under the CUDA framework. The library contains, for example, a PN gravitational wave signal generator, matched filtering/FFT and detection algorithms that have been profiled and compared with the corresponding CPU code using a dedicated benchmark in order to provide the gain factor with respect to the standard CPU implementation.</ShortDescription>
        <URL>http://arxiv.org/PS_cache/arxiv/pdf/1006/1006.4644v1.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1267_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1267_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>National Institute of Nuclear Physics</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>06/16/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Leone B. Bosi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://arxiv.org/PS_cache/arxiv/pdf/1006/1006.4644v1.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Leone B. Bosi</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b0a54882-2b4a-4058-a6e7-b829a2a04a53</GUID>
        <Name>GPU Accelerated Path-planning for Multi-agents in Virtual Environments</Name>
        <ShortDescription>Many games are populated by synthetic humanoid actors that act as autonomous agents. The animation of humanoids in real-time applications is still a challenge if the problem involves attaining a precise location in a virtual world (path-planning), and moving realistically according to the agent's own personality, intentions and mood (motion planning). In this paper we present a strategy to implement, using CUDA on the GPU, a path planner that produces natural steering behaviors for virtual humans using a numerical solution for boundary value problems.</ShortDescription>
        <URL>http://www.sbgames.org/papers/sbgames09/computing/full/cp15_09.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1266_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1266_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Federal University of Rio Grande do Sul</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>10/08/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>56</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Leonardo G. Fischer</Author>
           <Author email="">Renato Silveira</Author>
           <Author email="">Luciana Nedel</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.sbgames.org/papers/sbgames09/computing/full/cp15_09.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Leonardo G. Fischer,Renato Silveira,Luciana Nedel</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>cb2fd4bb-d313-4c57-aae7-547e4b78dc27</GUID>
        <Name>Real-time image segmentation on a GPU</Name>
        <ShortDescription>Efficient segmentation of color images is important for many applications in computer vision. Non-parametric solutions are required in situations where little or no prior knowledge about the data is available. In this paper, we present a novel parallel image segmentation algorithm which segments images in real-time in a non-parametric way. The algorithm finds the equilibrium states of a Potts model in the superparamagnetic phase of the system.</ShortDescription>
        <URL>http://upcommons.upc.edu/e-prints/bitstream/2117/7866/1/1104-Real-time-image-segmentation-on-a-GPU.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1264_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1264_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Georg-August University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>06/28/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="abramov@bccn-goettingen.de">Alexey Abramov</Author>
           <Author email="tomas@bccn-goettingen.de">Tomas Kulvicius</Author>
           <Author email="worgottg@bccn-goettingen.de">Florentin Worgotter</Author>
           <Author email="bdellen@iri.upc.edu">Babette Dellen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://upcommons.upc.edu/e-prints/bitstream/2117/7866/1/1104-Real-time-image-segmentation-on-a-GPU.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexey Abramov,Tomas Kulvicius,Florentin Worgotter,abramov@bccn-goettingen.de,tomas@bccn-goettingen.de,worgottg@bccn-goettingen.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7f4e5772-7ed9-40a1-8da0-7694fec71c3c</GUID>
        <Name>Development of a GPU-Based Monte Carlo Dose Calculation Code for Coupled Electron-Photon Transport</Name>
        <ShortDescription>Monte Carlo simulation is the most accurate method for absorbed dose calculations in radiotherapy. Its efficiency still requires improvement for routine clinical applications, especially for online adaptive radiotherapy. In this paper, we report our recent development of a GPU-based Monte Carlo dose calculation code for coupled electron-photon transport. We have implemented the Dose Planning Method (DPM) Monte Carlo dose calculation package (Sempau et al, Phys. Med. Biol., 45(2000)2263-2291) on GPU architecture under the CUDA platform.</ShortDescription>
        <URL>http://arxiv.org/ftp/arxiv/papers/0910/0910.0329.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1263_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1263_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California, San Diego</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>03/22/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Xun Jia</Author>
           <Author email="">Xuejun Gu</Author>
           <Author email="">Josep Sempau</Author>
           <Author email="">Dongju Choi</Author>
           <Author email="">Amitava Majumdar</Author>
           <Author email="">Steve B. Jiang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://arxiv.org/ftp/arxiv/papers/0910/0910.0329.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Xun Jia,Xuejun Gu,Josep Sempau,Dongju Choi,Amitava Majumdar,Steve B. Jiang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e41a3774-d96e-444e-a928-811d5f31b161</GUID>
        <Name>Performance Characterization of a GPU as a Ubiquitous Accelerator in Commodity Multiprocessor Systems</Name>
        <ShortDescription>Graphics processing units (GPUs) are increasingly being employed as commodity data-parallel co-processors in desktop and laptop systems due to their tremendous computational power as well as high memory bandwidth. A number of research efforts are focusing on the development of methodologies for efficient utilization of GPU hardware as a ubiquitous accelerator for CPU and memory intensive tasks to off-load the main processor(s). In order to effectively off-load parts of a computation, developers need to have a clear understanding of the performance trade-offs of using a GPU as an accelerator for the host processor.</ShortDescription>
        <URL>http://www.kics.edu.pk/hpcnl/images/hpcnl_kics_tr_03.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1262_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1262_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Al-Khawarizmi Institute of Computer Science</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>06/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ghulam.mustafa@kics.edu.pk">Ghulam Mustafa</Author>
           <Author email="awaheed@kics.edu.pk">Abdul Waheed</Author>
           <Author email="director@kics.edu.pk">Waqar Mahmood</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.kics.edu.pk/hpcnl/images/hpcnl_kics_tr_03.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ghulam Mustafa,Abdul Waheed,Waqar Mahmood,ghulam.mustafa@kics.edu.pk,awaheed@kics.edu.pk,director@kics.edu.pk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>38d916ba-ce82-48eb-9159-6f33a4396526</GUID>
        <Name>Implementation of Stereophonic Acoustic Echo Canceller on nVIDIA GeForce Graphics Processing Unit</Name>
        <ShortDescription>This paper presents an implementation of a stereophonic acoustic echo canceller on an NVIDIA GeForce graphics processor using the CUDA software development environment. For efficiency, fast shared memory has been used as much as possible. A tree adder is introduced to reduce the cost of summing up thread outputs. The performance evaluation results suggest that even a low-cost GPU with a small number of shader processors greatly helps echo cancellation for low-cost PC-based teleconferencing.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1261_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1261_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Kanazawa University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>12/07/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="hirano@t.kanazawa-u.ac.jp">Akihiro Hirano</Author>
           <Author email="nakayama@t.kanazawa-u.ac.jp">Kenji Nakayama</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://dspace.lib.kanazawa-u.ac.jp/dspace/bitstream/2297/24447/1/TE-PR-HIRANO-A-303.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Akihiro Hirano,Kenji Nakayama,hirano@t.kanazawa-u.ac.jp,nakayama@t.kanazawa-u.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>74329815-f8c3-4f35-92de-134ac83e4ada</GUID>
        <Name>Implementing Closed-Form Expressions on FPGAs Using the NAL, with Comparison to CUDA GPU and Cell BE Implementations</Name>
        <ShortDescription>This paper outlines the Nallatech Accelerator Layer (NAL) and its relationship to Intel's Accelerator Abstraction Layer. The NAL is looked at in its academic context. Hardware platforms that support the NAL are discussed: the Nallatech H101, the Intel FSB-FPGA Module and the BenOne PCIe. The Intel QuickAssist Technology initiative and its associated Accelerator Abstraction Layer (AAL) are introduced.</ShortDescription>
        <URL>http://www.rssi2008.org/proceedings/papers/posters/07_Bruce.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1260_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1260_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Nallatech Ltd</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>06/17/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Robin Bruce</Author>
           <Author email="">Javier Setoain</Author>
           <Author email="">Richard Chamberlain</Author>
           <Author email="">Malachy Devlin</Author>
           <Author email="">Rosa M. Badia</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.rssi2008.org/proceedings/papers/posters/07_Bruce.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Robin Bruce,Javier Setoain,Richard Chamberlain</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>64be2be4-6831-45f3-aa71-7ed3906590cc</GUID>
        <Name>MITHRA: Multiple data Independent Tasks on a Heterogeneous Resource Architecture</Name>
        <ShortDescription>With the advent of high-performance COTS clusters, there is a need for a simple, scalable and fault-tolerant parallel programming and execution paradigm. In this paper, we show that the popular MapReduce programming model can be utilized to solve many interesting scientific simulation problems with much higher performance than regular cluster computers by leveraging GPGPU accelerators in cluster nodes. We use the Massive Unordered Distributed (MUD) formalism and establish a one-to-one correspondence between it and general Monte Carlo simulation methods.</ShortDescription>
        <URL>http://verma7.com/wp/wp-content/uploads/2009/09/CS597_Spring09_MITHRA_Technical_Report.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1259_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1259_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois at Urbana-Champaign</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>08/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="farivar2@illinois.edu">Reza Farivar</Author>
           <Author email="verma7@illinois.edu">Abhishek Verma</Author>
           <Author email="emchan@illinois.edu">Ellick M. Chan</Author>
           <Author email="rhc@illinois.edu">Roy H. Campbell</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://verma7.com/wp/wp-content/uploads/2009/09/CS597_Spring09_MITHRA_Technical_Report.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Reza Farivar,Abhishek Verma,Ellick M. Chan,rhc@illinois.edu,Roy H. Campbell,farivar2@illinois.edu,verma7@illinois.edu,emchan@illinois.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>82928644-d96f-4ba8-9c46-0e6bf1e1f95e</GUID>
        <Name>Simulation of Reaction-Diffusion Processes in Three Dimensions using CUDA</Name>
        <ShortDescription>Numerical solution of reaction-diffusion equations in three dimensions is one of the most challenging applied mathematical problems. Since these simulations are very time consuming, any ideas and strategies aiming at the reduction of CPU time are important topics of research. A general and robust idea is the parallelization of source codes/programs. Recently, the technological development of graphics hardware created a possibility to use desktop video cards to solve numerically intensive problems.</ShortDescription>
        <URL>http://arxiv.org/ftp/arxiv/papers/1004/1004.0480.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1257_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1257_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Eotvos Lorand University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>04/03/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Ferenc Molnar Jr</Author>
           <Author email="">Ferenc Izsak</Author>
           <Author email="">Robert Meszaros</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://arxiv.org/ftp/arxiv/papers/1004/1004.0480.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ferenc Molnar Jr,Ferenc Izsak,Robert Meszaros</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>171b372c-fb8e-4fc3-9a02-a3db231424a7</GUID>
        <Name>CUDASW++2.0: enhanced Smith-Waterman Protein Database Search on CUDA-Enabled GPUs Based on SIMT and Virtualized SIMD Abstractions</Name>
        <ShortDescription>Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further worsens the situation. To accelerate this algorithm, many efforts have been made to develop techniques on high-performance architectures, especially the recently emerging many-core architectures and their associated programming models.</ShortDescription>
        <URL>http://www.biomedcentral.com/content/pdf/1756-0500-3-93.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1256_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1256_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nanyang Technological University, Singapore</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>04/14/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="liu0039@ntu.edu.sg">Yongchao Liu</Author>
           <Author email="">Bertil Schmidt</Author>
           <Author email="">Douglas L Maskell</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.biomedcentral.com/content/pdf/1756-0500-3-93.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yongchao Liu,Bertil Schmidt,Douglas L Maskell,liu0039@ntu.edu.sg</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2733733f-bf02-44f0-92ce-49eed1ab150c</GUID>
        <Name>Design and Implementation of the Smith-Waterman Algorithm on the CUDA-Compatible GPU</Name>
        <ShortDescription>This paper describes a design and implementation of the Smith-Waterman algorithm accelerated on the graphics processing unit (GPU). Our method is implemented using compute unified device architecture (CUDA), which is available on NVIDIA GPUs. The method efficiently uses on-chip shared memory to reduce the amount of data transferred between off-chip memory and processing elements in the GPU. Furthermore, it reduces the number of data fetches by applying a data reuse technique to query and database sequences.</ShortDescription>
        <URL>http://www-hagi.ist.osaka-u.ac.jp/research/papers/200810_y-munekw_bibe.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1255_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1255_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Osaka University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>08/09/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="y-munekw@ist.osaka-u.ac.jp">Yuma Munekawa</Author>
           <Author email="ino@ist.osaka-u.ac.jp">Fumihiko Ino</Author>
           <Author email="">Kenichi Hagihara</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www-hagi.ist.osaka-u.ac.jp/research/papers/200810_y-munekw_bibe.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yuma Munekawa,Fumihiko Ino,Kenichi Hagihara,y-munekw@ist.osaka-u.ac.jp,ino@ist.osaka-u.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5f961830-f9db-421a-b7b6-cd749907f46e</GUID>
        <Name>Tapping the Supercomputer Under Your Desk: Solving Dynamic Equilibrium Models with Graphics Processors</Name>
        <ShortDescription>This paper shows how to build algorithms that use graphics processing units (GPUs) installed in most modern computers to solve dynamic equilibrium models in economics. In particular, we rely on the compute unified device architecture (CUDA) of NVIDIA GPUs. We illustrate the power of the approach by solving a simple real business cycle model with value function iteration. We document improvements in speed of around 200 times and suggest that even further gains are likely.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1254_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1254_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Duke University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>04/10/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ealdrich@gmail.com">Eric M. Aldrich</Author>
           <Author email="jesusfv@econ.upenn.edu">Jesus Fernandez-Villaverde</Author>
           <Author email="aronaldg@gmail.com">A. Ronald Gallant</Author>
           <Author email="jfr23@duke.edu">Juan F. Rubio-Ramirez</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ssc.upenn.edu/~jesusfv/GPU_Computing.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Eric M. Aldrich,Jesus Fernandez-Villaverde,A. Ronald Gallant,ealdrich@gmail.com,jesusfv@econ.upenn.edu,aronaldg@gmail.com,Juan F. Rubio-Ramirez,jfr23@duke.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c1f6075d-5f89-44d5-938b-f527b8a56825</GUID>
        <Name>Faster Matrix-Vector Multiplication on GeForce 8800GTX</Name>
        <ShortDescription>GPUs have recently become programmable enough to perform general-purpose computation quickly by running tens of thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on the NVIDIA CUDA architecture. The experimental results on GeForce 8800GTX show that the proposed algorithm runs up to 15.69 (resp., 32.88) times faster than the sgemv routine in NVIDIA's BLAS library CUBLAS 1.1 (resp., Intel Xeon E5335 CPU with SSE3 SIMD instructions) for matrices with order 16 to 12800.</ShortDescription>
        <URL>http://ch.nvidia.com/docs/IO/47905/fujimoto_lspp2008.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1253_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1253_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Osaka University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>01/29/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>15</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="fujimoto@ist.osaka-u.ac.jp">Noriyuki Fujimoto</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ch.nvidia.com/docs/IO/47905/fujimoto_lspp2008.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Noriyuki Fujimoto,fujimoto@ist.osaka-u.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3939b0a0-52a9-48e0-8b31-6af60eae6ce6</GUID>
        <Name>Stackless KD-Tree Traversal for High Performance GPU Ray Tracing</Name>
        <ShortDescription>Significant advances have been achieved for realtime ray tracing recently, but realtime performance for complex scenes still requires large computational resources not yet available from the CPUs in standard PCs. Incidentally, most of these PCs also contain modern GPUs that do offer much larger raw compute power. However, limitations in the programming and memory model have so far kept the performance of GPU ray tracers well below that of their CPU counterparts.</ShortDescription>
        <URL>http://www.gpucomputing.net/?q=node/1293</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1252_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1252_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Saarland University and MPI Informatik</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>06/11/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Stefan Popov</Author>
           <Author email="">Johannes Gunther</Author>
           <Author email="">Hans-Peter Seidel</Author>
           <Author email="">Philipp Slusallek</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.gpucomputing.net/?q=node/1293">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Stefan Popov,Johannes Gunther,Hans-Peter Seidel</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c747f985-752f-4f32-be50-cf24898c527b</GUID>
        <Name>Fast Parallel GPU-Sorting Using a Hybrid Algorithm</Name>
        <ShortDescription>This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achieves high speed by efficiently utilizing the parallelism of the GPU throughout the whole algorithm. Initially, a parallel bucketsort splits the list into enough sublists, which are then sorted in parallel using merge-sort. The parallel bucketsort, implemented in NVIDIA's CUDA, utilizes the synchronization mechanisms, such as atomic increment, that are available on modern GPUs.</ShortDescription>
        <URL>http://www.gpucomputing.net/?q=node/1291</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1251_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1251_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Chalmers University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>09/25/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="erik.sintorn@chalmers.se">Erik Sintorn</Author>
           <Author email="uffe@chalmers.se">Ulf Assarsson</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.gpucomputing.net/?q=node/1291">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Erik Sintorn,Ulf Assarsson,erik.sintorn@chalmers.se,uffe@chalmers.se</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>01c04032-f48d-40e7-ba6f-105e9e541977</GUID>
        <Name>Testing the Feasibility of Running a Computationally Intensive Real-Time Traffic Simulation on a Multicore Programmable Graphics Processor</Name>
        <ShortDescription>In the 1960s, a semiconductor scientist named Gordon Moore theorized that the number of transistors would double each year on a single integrated circuit. Through much effort, the semiconductor industry has been able to closely follow "Moore's Law", but new information shows this type of progress is not sustainable in the coming years. This realization has implications in both chip fabrication and software development. Instead of making chips with more transistors per unit area, industry now produces newer multicore chips.</ShortDescription>
        <URL>http://www.gpucomputing.net/?q=node/603</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1250_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1250_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>04/04/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Kevin Stammetti</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.gpucomputing.net/?q=node/603">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Kevin Stammetti</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>18ea0385-7666-484e-abed-98ef81d2697a</GUID>
        <Name>A Flexible High-Performance Lattice Boltzmann GPU Code for the Simulations of Fluid Flows in Complex Geometries</Name>
        <ShortDescription>We describe the porting of the Lattice Boltzmann component of MUPHY, a multi-physics/scale simulation
software, to multiple graphics processing units using the Compute Unified Device Architecture. The novelty
of this work is the development of ad hoc techniques for optimizing the indirect addressing that MUPHY
uses for efficient simulations of irregular domains.</ShortDescription>
        <URL>http://www.iac.rm.cnr.it/~massimo/Papers/LBEonGPU.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1247_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1247_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>1 Istituto Applicazioni Calcolo, 2 NVIDIA, 3 SOFT, Istituto Nazionale Fisica della Materia, 4 Harvard University School of Eng and Applied Sciences, 5 Harvard University Initiative in Innovative Computing</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>05/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Massimo Bernaschi1</Author>
           <Author email="">Massimiliano Fatica2</Author>
           <Author email="">Simone Melchionna3,4</Author>
           <Author email="">Sauro Succi1,5</Author>
           <Author email="">Efthimios Kaxiras4</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.iac.rm.cnr.it/~massimo/Papers/LBEonGPU.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Massimo Bernaschi,Massimiliano Fatica,Simone Melchionna,Sauro Succi,Efthimios Kaxiras</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>74ce7180-6c27-4cf9-906a-507feae7f418</GUID>
        <Name>GPU Clusters for High-Performance Computing</Name>
        <ShortDescription>Large-scale GPU clusters are gaining popularity in the scientific computing community. However, their deployment and production use are associated with a number of new challenges. In this paper, we present our efforts to address some of the challenges with building and running GPU clusters in HPC environments. We touch upon such issues as balanced cluster architecture, resource sharing in a cluster environment, programming models, and applications for GPU clusters.</ShortDescription>
        <URL>http://www.ncsa.illinois.edu/~gshi/ppac09_paper.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1246_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1246_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois at Urbana-Champaign</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kindr@ncsa.uiuc.edu">Volodymyr V. Kindratenko</Author>
           <Author email="jenos@ncsa.uiuc.edu">Jeremy J. Enos</Author>
           <Author email="gshi@ncsa.uiuc.edu">Guochun Shi</Author>
           <Author email="mshow@ncsa.uiuc.edu">Michael T. Showerman</Author>
           <Author email="arnoldg@ncsa.uiuc.edu">Galen W. Arnold</Author>
           <Author email="johns@ks.uiuc.edu">John E. Stone</Author>
           <Author email="jim@ks.uiuc.edu">James C. Phillips</Author>
           <Author email="hwu@crhc.uiuc.edu">Wen-mei Hwu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ncsa.illinois.edu/~gshi/ppac09_paper.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Volodymyr V. Kindratenko,Jeremy J. Enos,Guochun Shi,kindr@ncsa.uiuc.edu,jenos@ncsa.uiuc.edu,gshi@ncsa.uiuc.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ed2b492f-10c9-4c78-9acc-215d2de0900e</GUID>
        <Name>Towards User Transparent Parallel Multimedia</Name>
        <ShortDescription>The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of MMCA problems, the use of High Performance Computing (HPC) techniques is essential. As most MMCA researchers are not HPC experts, there is an urgent need for 'familiar' programming models and tools that are both easy to use and efficient.</ShortDescription>
        <URL>http://hal.inria.fr/docs/00/49/38/83/PDF/A4MMC-werkhoven.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1244_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1244_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>VU University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>06/21/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="bjvwerkh@few.vu.nl">Ben van Werkhoven</Author>
           <Author email="jason@few.vu.nl">Jason Maassen</Author>
           <Author email="fjseins@vew.vu.nl">Frank J. Seinstra</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://hal.inria.fr/docs/00/49/38/83/PDF/A4MMC-werkhoven.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ben van Werkhoven,Jason Maassen,Frank J. Seinstra,bjvwerkh@few.vu.nl,jason@few.vu.nl,fjseins@vew.vu.nl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a4825b5f-3099-4dcc-8335-ba0f18a6a351</GUID>
        <Name>Axel: A Heterogeneous Cluster with FPGAs and GPUs</Name>
        <ShortDescription>This paper describes a heterogeneous computer cluster called Axel. Axel contains a collection of nodes; each node can include multiple types of accelerators such as FPGAs (Field Programmable Gate Arrays) and GPUs (Graphics Processing Units). A Map-Reduce framework for the Axel cluster is presented which exploits spatial and temporal locality through different types of processing elements and communication channels. The Axel system enables the first demonstration of FPGAs, GPUs and CPUs running collaboratively for N-body simulation. </ShortDescription>
        <URL>http://portal.acm.org/citation.cfm?id=1723112.1723134#abstract</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1243_logo_acm_portal2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1243_logo_acm_portal2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Imperial College London</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>02/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>23</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Kuen Hung Tsoi</Author>
           <Author email="">Wayne Luk</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1723112.1723134#abstract">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Kuen Hung Tsoi,Wayne Luk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f6a0332b-f8ec-4d4d-9551-0f7fc57ad735</GUID>
        <Name>GPU-based Hierarchical Computations for View Independent Visibility</Name>
        <ShortDescription>With rapid improvements in the performance and programmability, Graphics Processing Units (GPUs) have fostered considerable interest in substantially reducing the running time of compute intensive problems. The solution to the view-independent mutual point-pair visibility problem (required for inter-reflections in global illumination) can, it would seem, require the capabilities of the GPUs.</ShortDescription>
        <URL>http://dspace.library.iitb.ac.in/jspui/bitstream/10054/1708/1/4756034.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1242_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1242_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Indian Institute of Technology Bombay</OrganizationName>
        <OrganizationURL>www.cse.iitb.ac.in</OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>12/16/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Rhushabh Goradia</Author>
           <Author email="">Prekshu Ajmera</Author>
           <Author email="">Sharat Chandran</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://dspace.library.iitb.ac.in/jspui/bitstream/10054/1708/1/4756034.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Rhushabh Goradia,Prekshu Ajmera,Sharat Chandran</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e3c99de6-a223-4437-af51-01bd3faf3026</GUID>
        <Name>Fast, GPU-based Diffuse Global Illumination For Point Models</Name>
        <ShortDescription>Photorealistic computer graphics attempts to match as closely as possible the rendering of a virtual scene
with an actual photograph of the scene had it existed in the real world. Of the several techniques that are used to achieve this goal, physically-based approaches (i.e. those that attempt to simulate the actual physical process of illumination) provide the most striking results.</ShortDescription>
        <URL>http://www.cse.iitb.ac.in/~rhushabh/aps/aps4/report.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1241_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1241_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Indian Institute of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>08/26/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Rhushabh Goradia</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cse.iitb.ac.in/~rhushabh/aps/aps4/report.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Rhushabh Goradia</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5c93da6a-193d-4595-998d-07313764eef5</GUID>
        <Name>Fast GPU-based Adaptive Tessellation with CUDA</Name>
        <ShortDescription>Compact surface descriptions like higher-order surfaces are popular representations for both modeling and animation.  However, for fast graphics-hardware-assisted rendering, they usually need to be converted to triangle meshes. In this paper, we introduce a new framework for performing on-the-fly crack-free adaptive tessellation of surface primitives completely on the GPU. Utilizing CUDA and its flexible memory write capabilities, we parallelize the tessellation task at the level of single surface primitives.</ShortDescription>
        <URL>https://www.mpi-sb.mpg.de/~mschwarz/papers/cudatess-eg09.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1240_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1240_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Erlangen-Nuremberg</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Michael Schwarz</Author>
           <Author email="">Marc Stamminger</Author>
        </Authors>
        <ContentTypes>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Michael Schwarz,Marc Stamminger</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>5eb012b6-230e-49e0-ac77-6be5c40b011a</GUID>
        <Name>Using Graphics Devices in Reverse: GPU-based Image Processing and Computer Vision</Name>
        <ShortDescription>Graphics and vision are approximate inverses of each other. Ordinarily, Graphics Processing Units are used to convert "numbers into pictures" (i.e. computer graphics). In this paper, we discuss the use of GPUs in approximately the reverse way to assist in "converting pictures into numbers" (i.e. computer vision). For graphical operations, GPUs currently provide many hundreds of gigaflops of processing power.</ShortDescription>
        <URL>http://www.uweb.ucsb.edu/~yichuwang/ecv/paper/using_graphics_device_in_reverse.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1239_gpucomputing_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1239_gpucomputing_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NVIDIA Corporation</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>08/26/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">James Fung</Author>
           <Author email="">Steve Mann</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.uweb.ucsb.edu/~yichuwang/ecv/paper/using_graphics_device_in_reverse.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>James Fung,Steve Mann</Keyword>
        </Keywords>
     </Application>
     <Application>
        <GUID>1bbe2f89-4dd9-4051-9bf0-bcd1abdc7a03</GUID>
        <Name>CUDA SURF - A real-time implementation for SURF</Name>
        <ShortDescription>Keypoint detection and matching is a basic computer vision task and a necessary ingredient for several applications, e.g., object recognition, structure from motion, panorama stitching. In this work we implement the popular SURF descriptor, an approximation of SIFT, on commodity graphics hardware and achieve real-time performance even for HD images. For VGA images we achieve a speed-up of about 50x on a GTX 285 and for HD images even up to 87x.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1238_633214_match_GPU_graff_img1_img2_small_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1238_633214_match_GPU_graff_img1_img2_small_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Darmstadt</OrganizationName>
        <OrganizationURL>http://www.tu-darmstadt.de</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>13</ReleaseDay>
        <ReleaseDateDisplay>07/13/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>87</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="wojek@cs.tu-darmstadt.de">Andre Schulz</Author>
           <Author email="">Florian Jung</Author>
           <Author email="">Sebastian Hartte</Author>
           <Author email="">Daniel Trick</Author>
           <Author email="">Christian Wojek</Author>
           <Author email="">Konrad Schindler</Author>
           <Author email="">Jens Ackermann</Author>
           <Author email="">Michael Goesele</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.mis.tu-darmstadt.de/surf">Application</ContentType>
           <ContentType url="http://www.mis.tu-darmstadt.de/surf">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Keypoint detection, Keypoint description, SURF, Object detection, SfM, Image stitching,Andre Schulz,Florian Jung,Sebastian Hartte, Daniel Trick,Christian Wojek, Konrad Schindler, Jens Ackermann, Michael Goesele,wojek@cs.tu-darmstadt.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fdd8f407-1003-4323-82d5-f91b0c483a18</GUID>
        <Name>A Real-Time Multigrid Finite Hexahedra Method for Elasticity Simulation using CUDA</Name>
        <ShortDescription>We present an efficient CUDA implementation of a finite hexahedra multigrid solver for simulating elastic deformable models in real time. Due to the regular shape of the numerical stencil induced by the hexahedral regime, computations and data layout can be restructured to avoid execution divergence and to support memory access patterns enabling the hardware to coalesce multiple memory accesses into single memory transactions. This makes it possible to effectively exploit the GPU's parallel processing units and high memory bandwidth. Performance gains of up to a factor of 12 compared to a highly optimized CPU implementation are demonstrated.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1237_170404_VoxelModel_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1237_170404_VoxelModel_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Computer Graphics and Visualization Group, Technische Universitat Munchen, Germany</OrganizationName>
        <OrganizationURL>http://wwwcg.in.tum.de/</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>07/10/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>12</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dick@tum.de">Christian Dick</Author>
           <Author email="georgii@tum.de">Joachim Georgii</Author>
           <Author email="westermann@tum.de">Rudiger Westermann</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://wwwcg.in.tum.de/Research/Publications/CompMechanics">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Deformable Objects, Finite Element Methods, Multigrid,Christian Dick,Joachim Georgii,Rudiger Westermann</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1cbf9658-009b-4ee7-81fe-1f144a456225</GUID>
        <Name>Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid</Name>
        <ShortDescription>We introduce a real-time stereo matching technique based on a reformulation of Yoon and Kweon's adaptive support weights algorithm [1]. Our implementation uses the bilateral grid to achieve a speedup of 200x compared to a straightforward full-kernel GPU implementation, which in turn is 20x faster than the original CPU implementation, thus making it the fastest technique on the Middlebury website. Published at the European Conference on Computer Vision (ECCV) 2010.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1236_167313_DCBGrid-teaser_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1236_167313_DCBGrid-teaser_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Cambridge and Microsoft Research Cambridge</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>06/24/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>200</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="christian.richardt@cl.cam.ac.uk">Christian Richardt</Author>
           <Author email="">Douglas Orr</Author>
           <Author email="">Ian Davies</Author>
           <Author email="">Antonio Criminisi</Author>
           <Author email="">Neil A. Dodgson</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cl.cam.ac.uk/research/rainbow/projects/dcbgrid/supplement/DCBGrid-skydiving-comparison.avi">Multimedia</ContentType>
           <ContentType url="http://www.cl.cam.ac.uk/research/rainbow/projects/dcbgrid/">Paper</ContentType>
           <ContentType url="http://www.cl.cam.ac.uk/research/rainbow/projects/dcbgrid/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
           <ApplicationType>Computer Vision</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Christian Richardt,Douglas Orr,Ian Davies, Antonio Criminisi, Neil A. Dodgson,christian.richardt@cl.cam.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bbaba1fd-20be-44f9-9b64-fea2cd12fce2</GUID>
        <Name>Realtime Tracking With a Pan-Tilt Camera</Name>
        <ShortDescription>The human eye is amazingly adept at tracking moving objects. The process is so natural to humans that it happens without any conscious effort. While this remarkable ability depends in part on the human brain's immense processing power, the fast response of the extraocular muscles and the eyeball's light weight are also vital. Even a small point-and-shoot camera mounted on a servo is typically too heavy and slow to move with the agility of the human eye. How, then, can we give a computer the ability to track movement quickly and responsively?</ShortDescription>
        <URL>http://umassgv.blogspot.com/2010/07/realtime-tracking-with-pan-tilt-camera.html</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1235_123697_tracking_overview_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1235_123697_tracking_overview_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Massachusetts, Amherst</OrganizationName>
        <OrganizationURL>http://www.cs.umass.edu</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>07/06/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="blfoster@cs.umass.edu">Blake Foster</Author>
           <Author email="ruiwang@cs.umass.edu">Rui Wang</Author>
           <Author email="elm@cs.umass.edu">Erik Learned-Miller</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=8luy8jP1UNs">Multimedia</ContentType>
           <ContentType url="http://umassgv.blogspot.com/2010/07/realtime-tracking-with-pan-tilt-camera.html">Paper</ContentType>
           <ContentType url="http://umassgv.blogspot.com/2010/07/realtime-tracking-with-pan-tilt-camera.html">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>tracking, camera, fpv, pan tilt, human eye, vision,Blake Foster,Rui Wang,Erik Learned-Miller,blfoster@cs.umass.edu,ruiwang@cs.umass.edu,elm@cs.umass.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>749bb40e-ff99-4a34-9093-c8ff4b7ab08d</GUID>
        <Name>Thrust Graph Library</Name>
        <ShortDescription>Thrust Graph Library provides graph containers, algorithms, and other concepts, like the Boost Graph Library. The library is based on Thrust, a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL).</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1234_53418_networks_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1234_53418_networks_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>National Institute of Advanced Industrial Science and Technology (AIST)</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>07/06/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="k.kojima@aist.go.jp">Kazuhiro Kojima</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/thrust-graph/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Graph Library,kazuhiro kojima,k.kojima@aist.go.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>aafa92f6-e4db-4ff5-bbf2-093cef1271ea</GUID>
        <Name>Modeling Rotor Wakes with a Hybrid OVERFLOW-Vortex Method on a GPU Cluster</Name>
        <ShortDescription>The vortex core shed from rotorcraft blades maintains coherency---and thus dynamic relevance---many blade turns after its creation. This presents a challenge to traditional Eulerian computational methods, as fine grids are required to suppress numerical diffusion which would weaken the vortex cores after a small number of revolutions. Vortex methods have been used in the past to overcome these problems, as they require computational elements only in vorticity-containing regions, but suffer from greater computational cost per element. </ShortDescription>
        <URL>http://markjstock.org/research/AIAA-2010-4553.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1233_86624_4bladed_720_web2_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1233_86624_4bladed_720_web2_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Applied Scientific Research, Inc.</OrganizationName>
        <OrganizationURL>http://www.applied-scientific.com/</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>06/29/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="mstock@applied-scientific.com">Mark J. Stock</Author>
           <Author email="">Adrin Gharakhani</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=LL0p2SJv7yc">Multimedia</ContentType>
           <ContentType url="http://markjstock.org/research/AIAA-2010-4553.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>cfd rotor helicopter vortex fluid,Mark J. Stock,Adrin Gharakhani,mstock@applied-scientific.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5eb5fa1a-f845-4b64-863f-48f16aae06bb</GUID>
        <Name>A GPU-accelerated Boundary Element Method and Vortex Particle Method</Name>
        <ShortDescription>Vortex particle methods, when combined with multipole-accelerated boundary element methods (BEM), become a complete tool for direct numerical simulation (DNS) of internal or external vortex-dominated flows. In previous work, we presented a method to accelerate the vorticity-velocity inversion at the heart of vortex particle methods by performing a multipole treecode N-body method on parallel graphics hardware. The resulting method achieved a 17-fold speedup over a dual-core CPU implementation.</ShortDescription>
        <URL>http://markjstock.org/research/AIAA-2010-5099.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1232_266408_spheres_cl_vort_crop_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1232_266408_spheres_cl_vort_crop_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Applied Scientific Research, Inc.</OrganizationName>
        <OrganizationURL>http://applied-scientific.com/</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>07/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>43</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="mstock@applied-scientific.com">Mark J. Stock</Author>
           <Author email="adrin@applied-scientific.com">Adrin Gharakhani</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://markjstock.org/research/AIAA-2010-5099.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>cfd vortex nbody bem fluid,Mark J. Stock,mstock@applied-scientific.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c7ec9d53-04d1-4b50-9a6a-0f92323dce34</GUID>
        <Name>Leukocyte Tracking: ImageJ Plugin</Name>
        <ShortDescription>This software is a plugin for the ImageJ image processing program. The plugin is designed to detect and track rolling leukocytes (white blood cells) through multiple frames of video. It can take advantage of a CUDA-capable GPU to dramatically reduce video processing time; with appropriate hardware, near real-time processing can be achieved.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1231_26570_leukocytes_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1231_26570_leukocytes_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>07/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>26</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="boyer@cs.virginia.edu">Michael Boyer</Author>
           <Author email="">David Tarjan</Author>
           <Author email="">Scott T. Acton</Author>
           <Author email="">Kevin Skadron</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cs.virginia.edu/~mwb7w/leukocyte/">Application</ContentType>
           <ContentType url="http://www.cs.virginia.edu/~mwb7w/leukocyte/">Multimedia</ContentType>
           <ContentType url="http://www.cs.virginia.edu/~mwb7w/leukocyte/">Paper</ContentType>
           <ContentType url="http://www.cs.virginia.edu/~mwb7w/leukocyte/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>leukocyte, blood cell, tracking, video,Michael Boyer,David Tarjan,Scott T. Acton, Kevin Skadron,boyer@cs.virginia.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d36b8ae0-3465-4b14-82f6-07712472e3ae</GUID>
        <Name>McStas CUDA optimization project</Name>
        <ShortDescription>Optimize the single crystal component of McStas neutron raytracer using CUDA</ShortDescription>
        <URL>http://www.mcstas.org/</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1229_6226_logo-left_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1229_6226_logo-left_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>eScience Center, University of Copenhagen</OrganizationName>
        <OrganizationURL>http://www.escience.ku.dk</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>01/29/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>125</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="jesper.dahlkild@gmail.com">Jesper Dahlkild</Author>
           <Author email="djurnoe@diku.dk">Martin Djurno</Author>
           <Author email="fk@finnkrog.com">Finn Krog</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.nvidia.com/content/cudazone/CUDABrowser/downloads/papers/McCuda.pdf">Paper</ContentType>
           <ContentType url="http://www.nvidia.com/content/cudazone/CUDABrowser/downloads/code/sources_and_testdata.zip">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Ray Tracing</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jesper Dahlkild,Martin Djurno,Finn Krog,jesper.dahlkild@gmail.com,djurnoe@diku.dk,fk@finnkrog.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c15f6620-26e7-4f91-b857-46380ff67782</GUID>
        <Name>Raytracing in participating media</Name>
        <ShortDescription>This work presents a CUDA-accelerated algorithm for visualization of photorealistic lighting effects which is based on Henrik Wann Jensen's method for global illumination in scenes with participating media.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1228_58732_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1228_58732_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Wroclaw University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>07/02/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="piotr.orzechowski@gmail.com">Piotr Orzechowski</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.xp-dev.com/wiki/52927/Homepage">Application</ContentType>
           <ContentType url="http://www.xp-dev.com/wiki/52927/Homepage">Multimedia</ContentType>
           <ContentType url="http://www.xp-dev.com/wiki/52927/Homepage">Paper</ContentType>
           <ContentType url="http://www.xp-dev.com/wiki/52927/Homepage">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Ray Tracing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>raytracing, participating media, photon mapping,Piotr Orzechowski,piotr.orzechowski@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d35e353a-09c4-47df-ba61-10984823be50</GUID>
        <Name>Multi-domain, Higher Order Level Set Scheme for 3D Image Segmentation on the GPU</Name>
        <ShortDescription>A streaming level set PDE solver that handles large volumes (sizes exceeding the available GPU memory), combined with a higher-order, multi-phase solver for smooth segmentation of the volume.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1227_545525_Slide2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1227_545525_Slide2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The Technical University of Denmark / The University of Texas at Austin</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>06/16/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="os@imm.dtu.dk">Ojaswa Sharma</Author>
           <Author email="zqyork@ices.utexas.edu">Qin Zhang</Author>
           <Author email="fa@imm.dtu.dk">Francois Anton</Author>
           <Author email="bajaj@cs.utexas.edu">Chandrajit Bajaj</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://cvcweb.ices.utexas.edu/ccv/projects/project.php?proID=9">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ojaswa Sharma,Qin Zhang,Francois Anton,Chandrajit Bajaj,os@imm.dtu.dk,zqyork@ices.utexas.edu,fa@imm.dtu.dk,bajaj@cs.utexas.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>082dbdd8-2eb2-4511-9072-d5eff768e420</GUID>
        <Name>A Simple Pseudo-Random Number Generator</Name>
        <ShortDescription>Implementation of uniformly and normally distributed pseudo-random number generators as device functions.</ShortDescription>
        <URL>http://people.virginia.edu/~mjt5v/pf/RNG/</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1226_144550627858cb6d44ceb02ba9434317_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1226_144550627858cb6d44ceb02ba9434317_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>06/08/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="mjt5v@virginia.edu">Michael Trotter</Author>
           <Author email="mag6x@virginia.edu">Matt Goodrum</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://people.virginia.edu/~mjt5v/pf/RNG/RNG.zip">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Programming Tools</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Michael Trotter,Matt Goodrum,mjt5v@virginia.edu,mag6x@virginia.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9e6bae6c-dc38-44dd-863a-441c9440bf9e</GUID>
        <Name>High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster</Name>
        <ShortDescription>We implement a high-order finite-element application that numerically simulates seismic wave propagation, resulting for instance from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry, on a large cluster of NVIDIA Tesla graphics cards using the CUDA programming environment and non-blocking message passing based on MPI. Contrary to many finite-element implementations, ours runs successfully in single precision, maximizing the performance of current-generation GPUs. We discuss the implementation and optimization of the code and compare it to an existing, highly optimized implementation in C and MPI on a classical cluster of CPU nodes. We use mesh coloring to efficiently handle summation operations over degrees of freedom on an unstructured mesh, and non-blocking MPI messages to overlap the communications across the network and the data transfer to and from the device via PCIe with calculations on the GPU. We perform a number of numerical tests to validate the single-precision CUDA and MPI implementation and assess its accuracy. We then analyze performance measurements; depending on how the problem is mapped to the reference CPU cluster, we obtain a speedup of 20x or 12x.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1225_20831_seismic_paper_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1225_20831_seismic_paper_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universite de Pau (France), Florida State University (US), TU Dortmund (Germany)</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>06/15/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dimitri.komatitsch@univ-pau.fr">Dimitri Komatitsch</Author>
           <Author email="">Gordon Erlebacher</Author>
           <Author email="">Dominik Goddeke</Author>
           <Author email="">David Michea</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://web.univ-pau.fr/~dkomati1/published_papers/JCP_multiGPUs_2010.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Oil &amp; Gas</ApplicationType>
           <ApplicationType>Clusters</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dimitri Komatitsch,Gordon Erlebacher,Dominik Goddeke,David Michea,dimitri.komatitsch@univ-pau.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5f003419-e86a-4d8b-9d19-997acd44898b</GUID>
        <Name>To GPU Synchronize or Not GPU Synchronize?</Name>
        <ShortDescription>The graphics processing unit (GPU) has evolved from being a fixed-function processor with programmable stages into a programmable processor with many fixed-function components that deliver massive parallelism. By modifying the GPU's stream processor to support general-purpose computation on the GPU (GPGPU), applications that perform massive vector operations can realize many orders of magnitude improvement in performance over a traditional processor, i.e., a CPU.</ShortDescription>
        <URL>https://research.cs.vt.edu/synergy/pubs/papers/feng-iscas2010-gpusync.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1224_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1224_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Virginia Tech</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>03/28/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wfeng@vt.edu">Wu-chun Feng</Author>
           <Author email="shucaig@vt.edu">Shucai Xiao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="https://research.cs.vt.edu/synergy/pubs/papers/feng-iscas2010-gpusync.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wu-chun Feng,Shucai Xiao,wfeng@vt.edu,shucaig@vt.edu</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>aed6944b-13eb-478c-874a-751a332dad9d</GUID>
        <Name>FATSEA: An Architectural Simulator for General Purpose Computing on GPUs</Name>
        <ShortDescription>We present FATSEA, a functional and performance evaluation simulator written in C++ to handle kernels written in the CUDA programming language and aimed at GPGPU computing. FATSEA takes Parallel Thread eXecution (PTX) code as input, a device-independent code format generated by the Nvidia CUDA compiler, to validate results and estimate performance on Nvidia platforms.</ShortDescription>
        <URL>http://ditec.um.es/~jlaragon/papers/FATSEA-RAPIDO-2010.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1223_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1223_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Murcia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>12/22/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">K. E. Ostby</Author>
           <Author email="">J. L. Aragon</Author>
           <Author email="">J. M. Garcia</Author>
           <Author email="">M. Ujaldon</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ditec.um.es/~jlaragon/papers/FATSEA-RAPIDO-2010.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>K. E. Ostby,J. L. Aragon,J. M. Garcia</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0c25d963-909c-4a1e-8f60-81bf8fd868cf</GUID>
        <Name>Evaluating the use of GPUs in Liver Image Segmentation and HMMER Database Searches</Name>
        <ShortDescription>In this paper we present the results of parallelizing two life sciences applications, Markov random field (MRF)-based liver segmentation and HMMER's Viterbi algorithm, using GPUs. We relate our experiences in porting both applications to the GPU as well as the techniques and optimizations that are most beneficial. The unique characteristics of both algorithms are demonstrated by implementations on an NVIDIA 8800 GTX Ultra using the CUDA programming environment. We test multiple enhancements in our GPU kernels in order to demonstrate the effectiveness of each strategy.</ShortDescription>
        <URL>http://cadi.buffalo.edu/papers/2009/2009_4.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1222_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1222_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University at Buffalo, SUNY</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>02/15/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="waltersj@buffalo.edu">John Paul Walters</Author>
           <Author email="vbalu2@buffalo.edu">Vidyananth Balu</Author>
           <Author email="kompalli@hp.com">Suryaprakash Kompalli</Author>
           <Author email="vipin@buffalo.edu">Vipin Chaudhary</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://cadi.buffalo.edu/papers/2009/2009_4.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>John Paul Walters,Vidyananth Balu,Suryaprakash Kompalli,waltersj@buffalo.edu,vbalu2@buffalo.edu,kompalli@hp.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e21e7880-81bf-46d6-9185-b56a93a0ad3b</GUID>
        <Name>GPUCT: A GPU-Accelerated CT Reconstruction System</Name>
        <ShortDescription>CT scanning is a medical imaging technique commonly used in hospitals, including the University of Virginia Hospital, to see inside the human body. Modern CT scanners can generate images of the body in three dimensions, a process called 3D reconstruction. This project illustrates the feasibility of using graphics hardware (GPUs) to process CT scans in a more efficient and inexpensive manner than current commercial reconstruction systems. Additionally, this research considers the ethical and social implications of an improved CT reconstruction system in terms of risks for hospitals and patients.</ShortDescription>
        <URL>http://www.cs.virginia.edu/~skadron/Papers/maier_thesis07.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1221_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1221_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>03/30/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>56</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Drew Maier</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cs.virginia.edu/~skadron/Papers/maier_thesis07.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Drew Maier</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>72ba00af-f128-438b-9e0f-379786953cce</GUID>
        <Name>FPGA-Based Hardware Acceleration of Lithographic Aerial Image Simulation</Name>
        <ShortDescription>Lithography simulation, an essential step in design for manufacturability (DFM), is still far from computationally efficient. Most leading companies use large clusters of server computers to achieve acceptable turn-around time. Thus coprocessor acceleration is very attractive for obtaining increased computational performance with a reduced power consumption. This article describes the implementation of a customized accelerator on FPGA using a polygon-based simulation model.  An application-specific memory partitioning scheme is designed to meet the bandwidth requirements for a large number of processing elements.</ShortDescription>
        <URL>http://cadlab.cs.ucla.edu/~cong/papers/TRETS-17.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1220_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1220_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California, Los Angeles</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>09/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>15</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jason Cong</Author>
           <Author email="">Yi Zou</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://cadlab.cs.ucla.edu/~cong/papers/TRETS-17.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jason Cong,Yi Zou</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ebfbcda7-6930-45cc-83d5-1414da4c1325</GUID>
        <Name>Visualising Spins and Clusters in Regular and Small-World Ising Models with GPUs</Name>
        <ShortDescription>Visualising computational simulation models of solid state physical systems is a hard problem for dense lattice models. Fly throughs and cutaways can aid viewer understanding of a simulated system. Interactive time model parameter updates and overlaying of measurements and graticules, cluster colour labelling and other visual highlighting cues can also enhance user intuition of the model's meaning. We present some graphical and simulation optimisation techniques and various graphical rendering and explanatory techniques for computational simulation models such as the Ising model in 2 and 3 dimensions. In addition to aiding understanding of conventional algorithms such as Metropolis Monte Carlo, we try to visualise cluster updates to the system using algorithms like that of Wolff.</ShortDescription>
        <URL>http://www.massey.ac.nz/~dpplayne/Papers/cstn-108.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1219_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1219_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Massey University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>03/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">A. Leist</Author>
           <Author email="">D. P. Playne</Author>
           <Author email="">K.A. Hawick</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.massey.ac.nz/~dpplayne/Papers/cstn-108.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>A. Leist,D. P. Playne,K.A. Hawick</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3c9c3b66-7ebb-461e-884c-c820abca8856</GUID>
        <Name>Stereo Depth with a Unified Architecture GPU</Name>
        <ShortDescription>This paper describes how the calculation of depth from stereo images was accelerated using a GPU. The Compute Unified Device Architecture (CUDA) from NVIDIA was employed in novel ways to compute depth using BT cost matching and the Semi-Global Matching algorithm. The challenges of mapping a sequential algorithm to a massively parallel thread environment and performance optimization techniques are considered.</ShortDescription>
        <URL>http://mplab.ucsd.edu/wp-content/uploads/CVPR2008/WorkShops/data/papers/143.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1218_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1218_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Florida Atlantic University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>05/04/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Joel Gibson </Author>
           <Author email="">Oge Marques</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://mplab.ucsd.edu/wp-content/uploads/CVPR2008/WorkShops/data/papers/143.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Joel Gibson ,Oge Marques</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4ea4d671-5de1-4355-8c9d-392fa35be551</GUID>
        <Name>3D Registration Based on Normalized Mutual Information: Performance of CPU vs. GPU Implementation</Name>
        <ShortDescription>Medical image registration is time-consuming but can be sped up by employing parallel processing on the GPU. Normalized mutual information (NMI) is a well-performing similarity measure for multi-modal registration. We present CUDA-based solutions for computing NMI on the GPU and compare the results obtained by rigidly registering multi-modal data sets with a CPU-based implementation. Our tests with RIRE data sets show a speed-up by a factor of 5 to 7 for our best GPU implementation.</ShortDescription>
        <URL>http://www.gris.informatik.tu-darmstadt.de/~swesarg/papers/1632.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1217_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1217_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Technische Universität Darmstadt</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>01/04/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Florian Jung</Author>
           <Author email="stefan.wesarg@gris.tu-darmstadt.de">Stefan Wesarg</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.gris.informatik.tu-darmstadt.de/~swesarg/papers/1632.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Florian Jung,Stefan Wesarg,stefan.wesarg@gris.tu-darmstadt.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>93ec4473-01fb-42ff-9e39-46248ff46941</GUID>
        <Name>fastHOG - a real-time GPU implementation of HOG</Name>
        <ShortDescription>We introduce a parallel implementation of the histogram of oriented gradients algorithm for object detection. Our implementation uses the GPU and the NVIDIA CUDA framework. We achieve speedups of over 67x from the standard sequential code, using a single video card. Furthermore it supports multiple video cards so speedups of 120x or more can be achieved. This allows us to achieve real-time performance, using the full HOG algorithm for the first time in the literature.</ShortDescription>
        <URL>http://www.robots.ox.ac.uk/ActiveVision/Papers/prisacariu_reid_tr2310_09/prisacariu_reid_tr2310_09.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1216_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1216_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Oxford</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>07/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>120</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="victor@robots.ox.ac.uk">Victor Adrian Prisacariu</Author>
           <Author email="ian@robots.ox.ac.uk">Ian Reid</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.robots.ox.ac.uk/ActiveVision/Papers/prisacariu_reid_tr2310_09/prisacariu_reid_tr2310_09.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Victor Adrian Prisacariu,Ian Reid,victor@robots.ox.ac.uk,ian@robots.ox.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>608510ce-a512-41c8-b32a-b55cb524284d</GUID>
        <Name>Detection and Tracking of Human Subjects</Name>
        <ShortDescription>The goal of the thesis project was to devise an algorithm to detect and track people in a static video.  Existing techniques are inadequate; instead a new approach based on background subtraction is used. The approach is successful with a static camera.  In background subtraction, the background of the video is calculated a priori and then subtracted from each frame of the video. This isolates the foreign objects, which are detected via two simple algorithms. Both algorithms are based on the subject's center of mass, but the first algorithm traces the path of the person around the video, making it very cluttered.</ShortDescription>
        <URL>http://www.cs.virginia.edu/~skadron/Papers/Grosvenor_Douglas_thesis.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1215_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1215_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>04/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Douglas Grosvenor</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cs.virginia.edu/~skadron/Papers/Grosvenor_Douglas_thesis.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Douglas Grosvenor</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>78013ee5-9a75-42af-93a8-4a5b68a330fa</GUID>
        <Name>Optimizing Sparse Matrix-Vector Multiplication on GPUs</Name>
        <ShortDescription>We are witnessing the emergence of graphics processing units (GPUs) as powerful massively parallel systems. Furthermore, the introduction of new APIs for general-purpose computations on GPUs, namely CUDA from NVIDIA, Stream SDK from AMD, and OpenCL, makes GPUs an attractive choice for high-performance numerical and scientific computing.</ShortDescription>
        <URL>http://domino.watson.ibm.com/library/CyberDig.nsf/papers/1D32F6D23B99F7898525752200618339/$File/rc24704.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1214_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1214_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>IBM Research Division</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>04/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="baskaran@ces.ohio-state.edu">Muthu Manikandan Baskaran</Author>
           <Author email="bordaw@us.ibm.com">Rajesh Bordawekar</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://domino.watson.ibm.com/library/CyberDig.nsf/papers/1D32F6D23B99F7898525752200618339/$File/rc24704.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Muthu Manikandan Baskaran,Rajesh Bordawekar,baskaran@ces.ohio-state.edu,bordaw@us.ibm.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>67ac5cbe-b91c-46b0-8a4e-9635e86dec49</GUID>
        <Name>Acceleration of Binomial Options Pricing via Parallelizing along time-axis on a GPU</Name>
        <ShortDescription>Since the introduction of organized trading of options for commodities and equities, computing fair prices for options has been an important problem in financial engineering. A variety of numerical methods, including Monte Carlo methods, binomial trees, and numerical solution of stochastic differential equations, are used to compute fair prices. Traders and brokerage firms constantly strive to achieve faster calculation of option prices because timely information can mean the difference between a deal struck or missed, which translates to substantial profit or loss. Hence, the latency to compute a fair option price plays an important role in short-term trading and arbitrage.</ShortDescription>
        <URL>http://saahpc.ncsa.illinois.edu/09/papers/Ganesan_paper.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1213_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1213_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Washington University in St. Louis</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>06/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="nganesan@wustl.edu">Narayan Ganesan</Author>
           <Author email="roger@wustl.edu">Roger D. Chamberlain</Author>
           <Author email="jbuhler@wustl.edu">Jeremy Buhler</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://saahpc.ncsa.illinois.edu/09/papers/Ganesan_paper.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Narayan Ganesan,Roger D. Chamberlain,Jeremy Buhler,nganesan@wustl.edu,roger@wustl.edu,jbuhler@wustl.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>52b3b800-2810-4e18-b631-e995c9a2ed48</GUID>
        <Name>GPU Accelerated Cardiac Electrophysiology</Name>
        <ShortDescription>Numerical simulations of cellular membranes are useful for both basic science and increasingly for clinical diagnostic and therapeutic applications. A common bottleneck in such simulations arises from solving large highly complex stiff systems of ordinary differential equations (ODEs) thousands of times for numerous collocation points (representing cells) throughout a three-dimensional volume. For some electrophysiology simulations, over 98% of the time is spent solving these systems of ODEs when run in serial on a single core.</ShortDescription>
        <URL>http://cseweb.ucsd.edu/groups/hpcl/scg/papers/2010/lionetti_ms_thesis.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1212_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1212_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California, San Diego</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>04/15/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>280</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Fred Lionetti</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://cseweb.ucsd.edu/groups/hpcl/scg/papers/2010/lionetti_ms_thesis.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Fred Lionetti</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7dbbc8bd-6029-4692-911e-5a4e725ef349</GUID>
        <Name>HARNESSING THE POWER OF IDLE GPUS FOR ACCELERATION OF BIOLOGICAL SEQUENCE ALIGNMENT</Name>
        <ShortDescription>This paper presents a parallel system capable of accelerating biological sequence alignment on the graphics processing unit (GPU) grid. The GPU grid in this paper is a desktop grid system that utilizes idle GPUs and CPUs in the office and home. Our parallel implementation employs a master-worker paradigm to accelerate an OpenGL-based algorithm that runs on a single GPU. We integrate this implementation into a screensaver-based grid system that detects idle resources on which the alignment code can run.</ShortDescription>
        <URL>http://www-hagi.ist.osaka-u.ac.jp/research/papers/200912_ino_ppl.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1211_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1211_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Osaka University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>08/24/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">FUMIHIKO INO</Author>
           <Author email="">YUKI KOTANI</Author>
           <Author email="">YUMA MUNEKAWA</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www-hagi.ist.osaka-u.ac.jp/research/papers/200912_ino_ppl.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>FUMIHIKO INO,YUKI KOTANI,YUMA MUNEKAWA</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>701ae5ed-4273-4c9a-8546-55b23244a5ca</GUID>
        <Name>Mixing Multi-Core CPUs and GPUs for Scientific Simulation Software</Name>
        <ShortDescription>Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some application paradigms. Software languages and systems such as NVIDIA's CUDA and the Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, however, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary.</ShortDescription>
        <URL>http://www.massey.ac.nz/~dpplayne/Papers/cstn-091.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1210_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1210_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Massey University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>09/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="k.a.hawick@massey.ac.nz">K. A. Hawick</Author>
           <Author email="a.leist@massey.ac.nz">A. Leist</Author>
           <Author email="d.p.playne@massey.ac.nz">D. P. Playne</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.massey.ac.nz/~dpplayne/Papers/cstn-091.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>K. A. Hawick,A. Leist,D. P. Playne,k.a.hawick@massey.ac.nz,a.leist@massey.ac.nz,d.p.playne@massey.ac.nz</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4a0b9cb0-a366-4a75-b2c4-e00c1af60a5c</GUID>
        <Name>Computing on GPUs</Name>
        <ShortDescription>The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been the porting of compute-intensive algorithms, e.g. ray-tracing algorithms, from CPU to GPU. Through the Compute Unified Device Architecture (CUDA [4]), GPUs can also be used to increase computing speed for High Performance Computing applications. In this paper, different parallelization strategies for different processor architectures are presented. They are compared, and first experiences using GPUs for a collection of numerical applications are given.</ShortDescription>
        <URL>http://zmi.dynamore.de/dynamore/documents/papers/euro2009/N-II-03.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1209_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1209_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>DYNAmore GmbH</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>05/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Dr. Uli Gohner</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://zmi.dynamore.de/dynamore/documents/papers/euro2009/N-II-03.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dr. Uli Gohner</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3e365933-a682-4caa-aa9c-b165f30d448d</GUID>
        <Name>High-Performance Physics Simulations Using Multi-Core CPUs and GPGPUs in a Volunteer Computing Context</Name>
        <ShortDescription>This paper presents two conceptually simple methods for parallelizing a Parallel Tempering Monte Carlo simulation in a distributed volunteer computing context, where computers belonging to the general public are used. The first method uses conventional multi-threading. The second method uses CUDA, a graphics card computing system.  Parallel Tempering is described, and challenges such as parallel random number generation and mapping of Monte Carlo chains to different threads are explained. While conventional multi-threading on CPUs is well-established, GPGPU programming techniques and technologies are still developing and present several challenges, such as the effective use of a relatively large number of threads.</ShortDescription>
        <URL>http://arxiv.org/ftp/arxiv/papers/1004/1004.0023.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1208_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1208_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>D-Wave Systems Inc.</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>03/31/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kkarimi@dwavesys.com">Kamran Karimi</Author>
           <Author email="ndickson@dwavesys.com">Neil G. Dickson</Author>
           <Author email="fhamze@dwavesys.com">Firas Hamze</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://arxiv.org/ftp/arxiv/papers/1004/1004.0023.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Kamran Karimi,Neil G. Dickson,Firas Hamze,kkarimi@dwavesys.com,ndickson@dwavesys.com,fhamze@dwavesys.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1e8f6f3a-a7bd-4267-9fc9-3680f9bc0449</GUID>
        <Name>Accelerating Large-scale Convolutional Neural Networks with Parallel Graphics Multiprocessors</Name>
        <ShortDescription>Training convolutional neural networks (CNNs) on large sets of high-resolution images is too computationally intense to be performed on commodity CPUs. Such architectures however achieve state-of-the-art results on low-resolution machine vision tasks such as the recognition of handwritten characters. We have adapted the inherent multi-level parallelism of CNNs for Nvidia's CUDA GPU architecture to accelerate the training by two orders of magnitude.</ShortDescription>
        <URL>http://www.ais.uni-bonn.de/papers/nips09ws_scherer_behnke.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1206_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1206_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>University of Bonn, Germany</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>12/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="scherer@ais.uni-bonn.de">Dominik Scherer</Author>
           <Author email="behnke@cs.uni-bonn.de">Sven Behnke</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ais.uni-bonn.de/papers/nips09ws_scherer_behnke.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dominik Scherer,Sven Behnke,scherer@ais.uni-bonn.de,behnke@cs.uni-bonn.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c0bb0403-84f7-42b9-8575-ec19bf6268d2</GUID>
        <Name>A Practical GPU Based KNN Algorithm</Name>
        <ShortDescription>The KNN algorithm is a widely applied method for classification in machine learning and pattern recognition. However, satisfactory performance cannot be achieved in many applications, because the KNN algorithm has a high computational complexity. Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have opened a new era of parallel computing that delivers tremendous computational horsepower in a single chip. In this paper, we describe a practical GPU-based K Nearest Neighbor (KNN) algorithm implemented in CUDA.</ShortDescription>
        <URL>http://www.academypublisher.com/proc/iscsct09/papers/iscsct09p151.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1205_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1205_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Soochow University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>12/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kqs.net@163.com">Quansheng Kuang</Author>
           <Author email="zhaol@suda.edu.cn">Lei Zhao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.academypublisher.com/proc/iscsct09/papers/iscsct09p151.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Quansheng Kuang,Lei Zhao,kqs.net@163.com,zhaol@suda.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f51cd9bf-7067-4281-9b34-95671175e688</GUID>
        <Name>Parallel Simulation of Equation-based Object-Oriented Models with Quantized State Systems on a GPU</Name>
        <ShortDescription>This work focuses on the use of parallel hardware to improve the simulation speed of equation-based object-oriented Modelica models. With this intention, a method has been developed that allows for the translation of a restricted class of Modelica models to parallel simulation code, targeted for the Nvidia Tesla architecture and based on the Quantized State Systems (QSS) simulation algorithm.</ShortDescription>
        <URL>http://www.modelica.org/events/modelica2009/Proceedings/memorystick/pages/papers/0032/0032.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1204_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1204_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>09/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Martina Maggio</Author>
           <Author email="">Kristian Stavaker</Author>
           <Author email="">Filippo Donida</Author>
           <Author email="">Francesco Casella</Author>
           <Author email="">Peter Fritzson</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.modelica.org/events/modelica2009/Proceedings/memorystick/pages/papers/0032/0032.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Martina Maggio,Kristian Stavaker,Filippo Donida</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0648ea65-f71f-4795-a1be-579a9aa03e90</GUID>
        <Name>An efficient GPU implementation for large scale individual-based simulation of collective behavior</Name>
        <ShortDescription>In this work we describe a GPU implementation for an individual-based model for fish schooling. In this model each fish aligns its position and orientation with an appropriate average of its neighbors' positions and orientations. This carries a very high computational cost in the so-called nearest neighbors search. By leveraging the GPU processing power and the new programming model called CUDA, we implement an efficient framework that makes it possible to simulate the collective motion of high-density individual groups.</ShortDescription>
        <URL>http://www.unibas.it/utenti/erra/Papers/HiBi09.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1203_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1203_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universita della Basilicata</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>10/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ugo.erra@unibas.it">Ugo Erra</Author>
           <Author email="ber.frola@gmail.com">Bernardino Frola</Author>
           <Author email="vitsca@dia.unisa.it">Vittorio Scarano</Author>
           <Author email="icouzin@princeton.edu">Iain Couzin</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.unibas.it/utenti/erra/Papers/HiBi09.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ugo Erra,Bernardino Frola,Vittorio Scarano,ugo.erra@unibas.it,ber.frola@gmail.com,vitsca@dia.unisa.it</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9dedab39-73d9-4c13-b5d2-2104819cec81</GUID>
        <Name>A Hybrid Analytical DRAM Performance Model</Name>
        <ShortDescription>As process technology scales, the number of transistors that can fit in a unit area has increased exponentially. Processor throughput, memory storage, and memory throughput have all been increasing at an exponential pace. As such, DRAM has become an ever-tightening bottleneck for applications with irregular memory access patterns. Computer architects in industry sometimes use ad hoc analytical modeling techniques in lieu of cycle-accurate performance simulation to identify critical design points.</ShortDescription>
        <URL>https://www.ece.ubc.ca/~aamodt/papers/gyuan.mobs2009.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1202_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1202_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of British Columbia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>05/19/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="gyuan@ece.ubc.ca">George L. Yuan</Author>
           <Author email="aamodt@ece.ubc.ca">Tor M. Aamodt</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="https://www.ece.ubc.ca/~aamodt/papers/gyuan.mobs2009.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>George L. Yuan,Tor M. Aamodt,gyuan@ece.ubc.ca,aamodt@ece.ubc.ca</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>df66c76c-4fda-468e-9eca-124cae57e3c4</GUID>
        <Name>Parallelisation of Fuzzy Inference on a Graphics Processor Unit Using the Compute Unified Device Architecture</Name>
        <ShortDescription>The inherently parallel nature of fuzzy inference is rarely exploited by fuzzy systems researchers. Hardware implementations, such as Field Programmable Gate Arrays (FPGAs), commonly use parallel architectures to achieve fast inference speeds. In this paper, we explore the use of Graphics Processor Units (GPUs) and NVIDIA's Compute Unified Device Architecture (CUDA) for fast inference speeds in a scalable and flexible Mamdani-type fuzzy inference system (FIS). Our goal is to provide computational intelligence researchers the skills necessary to exploit the low cost and high performance of GPUs with a minimum learning cost.</ShortDescription>
        <URL>http://www.cci.dmu.ac.uk/ukci2008/papers/Parallelisation-of-Fuzzy-Inference-on-a-Graphics-Processor-Unit-Using-the-Compute-Unified-Device-Architecture.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1201_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1201_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Missouri, Columbia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>05/01/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Derek Anderson</Author>
           <Author email="">Simon Coupland</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cci.dmu.ac.uk/ukci2008/papers/Parallelisation-of-Fuzzy-Inference-on-a-Graphics-Processor-Unit-Using-the-Compute-Unified-Device-Architecture.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Derek Anderson,Simon Coupland</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>76dcd879-098a-41f2-ab03-38c43d2a042e</GUID>
        <Name>GPU-Based Road Sign Detection Using Particle Swarm Optimization</Name>
        <ShortDescription>Road Sign Detection is a major goal of Advanced Driving Assistance Systems (ADAS). Since the dawn of this discipline, much work based on different techniques has been published which shows that traffic signs can be first detected and then classified in video sequences in real time. While detection is usually performed using classical computer vision techniques based on color and/or shape matching, most often classification is performed by neural networks. In this work we present a novel approach based on both sign shape and color which uses Particle Swarm Optimization (PSO) for detection.</ShortDescription>
        <URL>http://www.ce.unipr.it/~mussi/downloads/papers/mussiISDA09.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1200_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1200_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Parma</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>11/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="mussi@ce.unipr.it">Luca Mussi</Author>
           <Author email="cagnoni@ce.unipr.it">Stefano Cagnoni</Author>
           <Author email="fabio.daolio@unil.ch">Fabio Daolio</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ce.unipr.it/~mussi/downloads/papers/mussiISDA09.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Luca Mussi,Stefano Cagnoni,Fabio Daolio,mussi@ce.unipr.it,cagnoni@ce.unipr.it,fabio.daolio@unil.ch</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f0e5e186-d65b-4d6b-9f18-fb153cfcf39a</GUID>
        <Name>Large-Scale Parallel Multibody Dynamics with Frictional Contact</Name>
        <ShortDescription>In the context of simulating the frictional contact dynamics of large systems of rigid bodies, this paper reviews a novel method for solving large cone complementarity problems by means of a fixed-point iteration algorithm. The method is an extension of the Gauss-Seidel and Gauss-Jacobi methods with overrelaxation for symmetric convex linear complementarity problems.</ShortDescription>
        <URL>http://www.mcs.anl.gov/uploads/cels/papers/P1487.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1199_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1199_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Wisconsin-Madison</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>10/20/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="negrut@wisc.edu">Dan Negrut</Author>
           <Author email="tasora@ied.unipr.it">Alessandro Tasora</Author>
           <Author email="anitescu@mcs.anl.gov">Mihai Anitescu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.mcs.anl.gov/uploads/cels/papers/P1487.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dan Negrut,Alessandro Tasora,Mihai Anitescu,negrut@wisc.edu,tasora@ied.unipr.it,anitescu@mcs.anl.gov</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>abb411b1-54e9-4a4d-8f26-7acad6754856</GUID>
        <Name>A characterization and analysis of PTX kernels</Name>
        <ShortDescription>General purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effective approach for accelerating data- and compute-intensive applications. It has been driven by the introduction of C-based programming environments such as NVIDIA's CUDA [1], OpenCL [2], and Intel's Ct [3]. While significant effort has been focused on developing and evaluating applications and software tools, comparatively little has been devoted to the analysis and characterization of applications to assist future work in compiler optimizations, application re-structuring, and micro-architecture design.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IISWC.2009.5306801</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1198_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1198_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Georgia Institute of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>05/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Andrew Kerr</Author>
           <Author email="">Gregory Diamos</Author>
           <Author email="">Sudhakar Yalamanchili</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cercs.gatech.edu/tech-reports/tr2009/git-cercs-09-06.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Andrew Kerr,Gregory Diamos,Sudhakar Yalamanchili</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ed43c757-50fe-4151-a1c4-21184ce71dbd</GUID>
        <Name>General Purpose Computation on Graphics Processing Units (GPGPU) using CUDA</Name>
        <ShortDescription>Graphics processing units (GPUs) are special processors which traditionally were used to accelerate computer graphics by offloading work from the CPU. Today, GPUs are highly parallel many-core processors which enable general-purpose computation on graphics processing units (GPGPU). GPGPU has been a topic since 2002, but widespread interest did not develop until NVIDIA released the CUDA platform in 2007. Developers and researchers then started to use CUDA for parallel programming.</ShortDescription>
        <URL>http://www.wi.uni-muenster.de/pi/lehre/ws0910/pppa/papers/gpgpu.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1195_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1195_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Westfalische Wilhelms-Universitat</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>12/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Alexander Zibula</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.wi.uni-muenster.de/pi/lehre/ws0910/pppa/papers/gpgpu.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexander Zibula</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>993dd63c-de2a-49e1-81ba-ade7f1682b25</GUID>
        <Name>Simulation of one-layer shallow water systems on multicore and CUDA architectures</Name>
        <ShortDescription>The numerical solution of shallow water systems is useful for several applications related to geophysical flows, but the large size of the domains suggests the use of powerful accelerators to obtain numerical results in reasonable times. This paper addresses how to speed up the numerical solution of a first order well-balanced finite volume scheme for 2D one-layer shallow water systems by using modern Graphics Processing Units (GPUs) supporting the NVIDIA CUDA programming model.</ShortDescription>
        <URL>http://lsi.ugr.es/~jmantas/papers/supercomputing09.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1194_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1194_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universidad de Granada and Universidad de Malaga</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>03/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="marc@correo.ugr.es">Marc de la Asuncion</Author>
           <Author email="jmmantas@ugr.es">Jose M. Mantas</Author>
           <Author email="castro@anamat.cie.uma.es">Manuel J. Castro</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://lsi.ugr.es/~jmantas/papers/supercomputing09.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Marc de la Asuncion,Jose M. Mantas,Manuel J. Castro</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>78e3f6d6-b219-43a8-8c2d-f515465c3670</GUID>
        <Name>IDN_MFC</Name>
        <ShortDescription>Image denoising with bilateral filter algorithms</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1193_64638_Application_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1193_64638_Application_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Wroclaw University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>06/22/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wojciech.korycki@gmail.com">Wojciech Korycki</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.filefactory.com/file/b2301b6/n/BilateralFilter.rar">Application</ContentType>
           <ContentType url="http://www.nvidia.com/content/cudazone/CUDABrowser/downloads/papers/bilateral_filtering_en.pps">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bilateral Filter denoising,Wojciech Korycki,wojciech.korycki@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9a370a84-4d82-4133-b523-6f56cca33568</GUID>
        <Name>Hypercubic Storage Layout and</Name>
        <ShortDescription>Many simulations in the physical sciences are expressed in terms of rectilinear arrays of variables. It is attractive to develop such simulations for use in 1-, 2-, 3- or arbitrary physical dimensions and also in a manner that supports exploitation of data-parallelism on fast modern processing devices. We report on data layouts and transformation algorithms that support both conventional and data-parallel memory layouts.</ShortDescription>
        <URL>http://tur-www1.massey.ac.nz/~dpplayne/Papers/cstn-096.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1192_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1192_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Massey University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>23</ReleaseDay>
        <ReleaseDateDisplay>06/23/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">K. A. Hawick</Author>
           <Author email="">D. P. Playne</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://tur-www1.massey.ac.nz/~dpplayne/Papers/cstn-096.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>K. A. Hawick,D. P. Playne</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f33d13b3-5699-4fb2-b908-98d32866aa20</GUID>
        <Name>Analyzing CUDA Workloads Using a Detailed GPU Simulator</Name>
        <ShortDescription>Modern Graphics Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow's manycore processors, whether those are GPUs or otherwise. The combination of multiple, multithreaded, SIMD cores makes studying these GPUs useful in understanding tradeoffs among memory, data, and thread level parallelism. While modern GPUs offer orders of magnitude more raw computing power than contemporary CPUs, many important applications, even those with abundant data level parallelism, do not achieve peak performance.</ShortDescription>
        <URL>https://www.ece.ubc.ca/~aamodt/papers/gpgpusim.ispass09.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1191_GPUComputing bgimg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1191_GPUComputing bgimg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of British Columbia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>03/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="bakhoda@ece.ubc.ca">Ali Bakhoda</Author>
           <Author email="gyuan@ece.ubc.ca">George L. Yuan</Author>
           <Author email="wwlfung@ece.ubc.ca">Wilson W. L. Fung</Author>
           <Author email="henryw@ece.ubc.ca">Henry Wong</Author>
           <Author email="aamodt@ece.ubc.ca">Tor M. Aamodt</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="https://www.ece.ubc.ca/~aamodt/papers/gpgpusim.ispass09.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ali Bakhoda,George L. Yuan,Wilson W. L. Fung,Henry Wong,Tor M. Aamodt</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c1d20c0c-79f9-44f0-9331-291ccbeb0ee7</GUID>
        <Name>Phase Based Volume Registration Using CUDA</Name>
        <ShortDescription>We have implemented phase-based volume registration using CUDA, in contrast to all other GPU-based image registration implementations, which are based on the image intensity. Our registration algorithm is more robust for volumes that differ significantly in intensity. This work was presented at the IEEE ICASSP conference in Dallas in 2010.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1189_449881_phase_based_volume_registration_using_CUDA_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1189_449881_phase_based_volume_registration_using_CUDA_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Linkoping University</OrganizationName>
        <OrganizationURL>http://www.moviii.isy.liu.se</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>06/22/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>30</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="andek@imt.liu.se">Anders Eklund</Author>
           <Author email="matsa@imt.liu.se">Mats Andersson</Author>
           <Author email="knutte@imt.liu.se">Hans Knutsson</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.moviii.liu.se/files/icasspposter.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Image registration, local phase,Anders Eklund,Mats Andersson,Hans Knutsson,andek@imt.liu.se,matsa@imt.liu.se,knutte@imt.liu.se</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f9f4cdda-fab7-40fa-b8bc-43482f378a81</GUID>
        <Name>Towards a Software Transactional Memory for Graphics Processors</Name>
        <ShortDescription>The introduction of general purpose computing on many-core graphics processor systems, and the general shift in the industry towards parallelism, has created a demand for ease of parallelization. Software transactional memory (STM) simplifies development of concurrent code by allowing the programmer to mark sections of code to be executed concurrently and atomically in an optimistic manner. In contrast to locks, STMs are easy to compose and do not suffer from deadlocks. We have designed and implemented two STMs for graphics processors, one blocking and one non-blocking. The design issues involved in the development of these two STMs are described and explained in the paper together with experimental results comparing the performance of the two STMs. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1188_7612_cudazonestm_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1188_7612_cudazonestm_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Chalmers University of Technology</OrganizationName>
        <OrganizationURL>http://www.chalmers.se</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>04/14/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="cederman@chalmers.se">Daniel Cederman</Author>
           <Author email="tsigas@chalmers.se">Philippas Tsigas</Author>
           <Author email="tayyabch@ciitlahore.edu.pk">Muhammad Tayyab Chaudhry</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cse.chalmers.se/research/group/dcs/gpustm.html">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Programming Tools</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Daniel Cederman,Philippas Tsigas,cederman@chalmers.se,tsigas@chalmers.se</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a8c7fb3f-0fb8-40d2-8440-c9dedecf7051</GUID>
        <Name>nexiwave Speech Indexing</Name>
        <ShortDescription>nexiwave 2.0: GPU-assisted speech indexing</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1187_13933_nexilogo_betawith_snowflakes_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1187_13933_nexilogo_betawith_snowflakes_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>nexiwave.com</OrganizationName>
        <OrganizationURL>http://nexiwave.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>06/03/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>75</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="ben@nexiwave.com">Ben Jiang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://nexiwave.com">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
           <ApplicationType>Speech Indexing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Speech Indexing, GPU,Ben Jiang,ben@nexiwave.com</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>a42593ae-afe0-46ed-85d3-7a1ab25c93ac</GUID>
        <Name>Massive Bayesian Mixture Modelling</Name>
        <ShortDescription>This paper describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large data sets. An example context concerns common biological studies using high-throughput technologies generating many, very large data sets and requiring increasingly high-dimensional mixture models with large numbers of mixture components. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1186_179430_cfse_clusters_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1186_179430_cfse_clusters_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>UCLA and Duke University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>03/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>160</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="msuchard@ucla.edu">Marc A. Suchard</Author>
           <Author email="mw@stat.duke.edu">Quanli Wang</Author>
           <Author email="">Cliburn Chan</Author>
           <Author email="">Jacob Frelinger</Author>
           <Author email="">Andrew Cron</Author>
           <Author email="">Mike West</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.stat.duke.edu/research/software/west/gpu/software.html">Paper</ContentType>
           <ContentType url="http://www.stat.duke.edu/research/software/west/gpu/software.html">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Computational statistics,Marc A. Suchard,Quanli Wang,Cliburn Chan, Jacob Frelinger, Andrew Cron, Mike West,msuchard@ucla.edu,mw@stat.duke.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b5e57af8-695e-4550-9a42-a30be2716079</GUID>
        <Name>Accelerating Quadrature Methods for Option Valuation</Name>
        <ShortDescription>This paper presents an architecture for FPGA acceleration of quadrature methods used for pricing complex options, such as discrete barrier, Bermudan, and American options. The architecture can be optimized for speed and power consumption by exploiting pipelining and parallelism to produce efficient implementations in reconfigurable logic. An optimised implementation using Graphics Processing Units (GPUs) is also developed, to provide a performance and efficiency comparison with an FPGA accelerator.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/FCCM.2009.36</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1185_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1185_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Imperial College London</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Anson H. T. Tse</Author>
           <Author email="">David B. Thomas</Author>
           <Author email="">Wayne Luk</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/FCCM.2009.36">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Anson H. T. Tse,David B. Thomas,Wayne Luk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>934fb2f1-7ee6-4958-bb7d-b4ed38debaee</GUID>
        <Name>High Resolution Program Flow Visualization of Hardware Accelerated Hybrid Multi-core Applications</Name>
        <ShortDescription>The advent of multi-core processors has made parallel computing techniques mandatory on mainstream systems. With the recent rise of hardware accelerators, hybrid parallelism adds yet another dimension of complexity to the process of software development. This article presents a tool for graphical program flow analysis of hardware accelerated parallel programs.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CCGRID.2010.27</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1184_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1184_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>05/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Daniel Hackenberg</Author>
           <Author email="">Guido Juckeland</Author>
           <Author email="">Holger Brunst</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CCGRID.2010.27">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Daniel Hackenberg,Guido Juckeland,Holger Brunst</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d0ffd6f3-3337-4dbe-a796-0c7d19b1cd6e</GUID>
        <Name>An Analysis of GPU Parallel Computing</Name>
        <ShortDescription>Parallel systems are becoming ubiquitous in the world of computing, as evidenced by multi-core processors, the heterogeneous Cell Broadband Engine, and highly parallel graphics processing units (GPUs). All parallel systems share a requirement that parallel programming is necessary to leverage multiple cores. As a result of this trend, multi-core CPUs are no longer a clear winner, due to their peaked clock frequency and the programming effort involved in parallelizing code for multi-core architectures. Given such drawbacks, data-parallel applications might benefit from GPU-assisted computing.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/HPCMP-UGC.2009.59</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1183_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1183_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>U. S. Army Research Laboratory</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>06/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Song Jun Park</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/HPCMP-UGC.2009.59">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Song Jun Park</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>79cc1232-9c03-4eca-ac3f-dd9b3743fac0</GUID>
        <Name>Tiling for Performance Tuning on Different Models of GPUs</Name>
        <ShortDescription>The strategy of using CUDA-compatible GPUs as a parallel computation solution to improve program performance has become more and more widely accepted in the two years since the CUDA platform was released. Its benefit extends from the graphics domain to many other computationally intensive domains. Tiling, as the most general and important technique, is widely used for optimization in CUDA programs. However, new models of GPUs with better compute capabilities have since been released, along with new versions of the CUDA SDK.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ISISE.2009.60</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1182_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1182_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Hong Kong University of Science and Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>11/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Chang Xu</Author>
           <Author email="">Steven R. Kirk</Author>
           <Author email="">Samantha Jenkins</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ISISE.2009.60">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chang Xu,Steven R. Kirk,Samantha Jenkins</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e5e749ac-945e-42f2-a238-1209f8986eb2</GUID>
        <Name>Acceleration of Medical Image Registration Using Graphics Process Units in Computing Normalized Mutual Information</Name>
        <ShortDescription>This paper presents a computational performance analysis of an accelerated medical image registration using Graphics Processing Units (GPUs). In our previous work, a multi-resolution approach using normalized mutual information (NMI) has proven to be useful in medical image registration. In this paper, we propose an acceleration of the NMI procedure using GPU implementation because of the parallel processing capabilities. </ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICIG.2009.48</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1181_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1181_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Kent State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>09/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Wei-Hung Cheng</Author>
           <Author email="">Cheng-Chang Lu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICIG.2009.48">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wei-Hung Cheng,Cheng-Chang Lu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>705aaab7-bfe3-4433-ae2d-e1490bf77dbb</GUID>
        <Name>MAX-MIN Ant System on GPU with CUDA</Name>
        <ShortDescription>We propose a parallel MAX-MIN Ant System (MMAS) algorithm that is suitable for implementation on graphics processing units (GPUs). Multiple ant colonies, each with its own parameter settings, are offloaded in their entirety to the GPU and run in parallel. We have implemented this GPU-based MMAS with the compute unified device architecture (CUDA). Several performance optimizations for the GPU kernel program are introduced. Experimental results based on simulations for the traveling salesperson problem are presented to evaluate the proposed techniques.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICICIC.2009.255</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1180_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1180_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>12/07/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Hongtao Bai</Author>
           <Author email="">Dantong Ou Yang</Author>
           <Author email="">Ximing Li</Author>
           <Author email="">Lili He</Author>
           <Author email="">Haihong Yu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICICIC.2009.255">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Hongtao Bai,Dantong Ou Yang,Ximing Li</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f3d7658d-4572-4345-a1bd-3fe05ca6ce37</GUID>
        <Name>Scene Recognition Acceleration Using CUDA and OpenMP</Name>
        <ShortDescription>Scene recognition has become a prominent field in image processing, and many methods have been proposed in recent years. Among them, extracting the scene gist from global features has been shown to achieve higher retrieval accuracy than many other methods. However, the process of extracting the gist is heavily time-consuming and not suitable for real-time applications. In this paper, the CUDA architecture is deployed to accelerate this process.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICISE.2009.1045</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1179_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1179_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Dalian University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>12/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yuxin Wang</Author>
           <Author email="">Zhen Feng</Author>
           <Author email="">He Guo</Author>
           <Author email="">Changqin He</Author>
           <Author email="">Yuansheng Yang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICISE.2009.1045">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yuxin Wang,Zhen Feng,He Guo</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b717867c-c959-4577-b6a5-afbc0e42fdae</GUID>
        <Name>A Stream Processor Cluster Architecture Model with the Hybrid Technology of MPI and CUDA</Name>
        <ShortDescription>Nowadays, the compute capability of traditional cluster systems cannot keep up with the computing needs of practical applications, and aspects such as energy, space, and technology have become a huge problem. However, as parallel computing equipment, the stream processor (SP) offers high floating-point performance. NVIDIA GPUs are typical stream processor devices, and CUDA technology makes developing parallel programs on GPUs more flexible. In this paper, we make use of the hybrid parallel computing programming environment (HPCPE) with MPI and CUDA technology to build a simple CPU + GPU-based stream processor cluster system.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICISE.2009.171</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1177_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1177_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Shanghai for Science and Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>12/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Qing-kui Chen</Author>
           <Author email="">Jia-kang Zhang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICISE.2009.171">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Qing-kui Chen,Jia-kang Zhang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5da86d39-2d35-4792-a0b5-e400e3383959</GUID>
        <Name>Formal Description and Optimization Based High-Performance Computing on CUDA</Name>
        <ShortDescription>In recent years, with the development of GPUs, general-purpose computation on graphics processors has become a new field. Aiming at GPU processing, this paper provides a formal description of the data-parallel model, a detailed description of the CUDA programming model, and the principles of optimization. A comparative experiment shows that CUDA has strong parallel processing capability and provides new methods and ideas for GPGPU.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICISE.2009.602</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1176_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1176_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Hong Kong University of Science and Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>12/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Bo Li</Author>
           <Author email="">Huacheng Zhao</Author>
           <Author email="">JingJing Liang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICISE.2009.602">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bo Li,Huacheng Zhao,JingJing Liang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2287f161-8378-4190-ae1f-bd428d9ca3c3</GUID>
        <Name>Password Recovery for RAR Files Using CUDA</Name>
        <ShortDescription>Driven by the insatiable demand for real-time graphics, especially from the computer games market, the Graphics Processing Unit (GPU) has become a major source of computing power in recent years, as GPU performance is surpassing that of contemporary CPUs. This paper presents our study on how to efficiently recover the passwords for encrypted RAR files.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/DASC.2009.123</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1175_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1175_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>12/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Guang Hu</Author>
           <Author email="">Jianhua Ma</Author>
           <Author email="">Benxiong Huang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/DASC.2009.123">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Guang Hu,Jianhua Ma,Benxiong Huang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2b933813-11b7-4f81-8e63-9ce67eba045f</GUID>
        <Name>Password Recovery for RAR Files Using CUDA</Name>
        <ShortDescription>Driven by the insatiable demand for real-time graphics, especially from the computer games market, the Graphics Processing Unit (GPU) has become a major source of computing power in recent years, as GPU performance is surpassing that of contemporary CPUs. This paper presents our study on how to efficiently recover the passwords for encrypted RAR files. Our research focuses on AES key generation, which is the most time-consuming stage in the whole RAR encryption/decryption process.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/DASC.2009.123</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1174_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1174_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>12/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Guang Hu</Author>
           <Author email="">Jianhua Ma</Author>
           <Author email="">Benxiong Huang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/DASC.2009.123">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Guang Hu,Jianhua Ma,Benxiong Huang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f928f65a-86c9-4a31-8299-3e40f02d03fa</GUID>
        <Name>GPU-Assisted Computation of Centroidal Voronoi Tessellation</Name>
        <ShortDescription>Centroidal Voronoi tessellations (CVT) are widely used in computational science and engineering. The most commonly used method is Lloyd's method, and the L-BFGS method has recently been shown to be faster than Lloyd's method for computing the CVT. However, these methods run on the CPU and are still too slow for many practical applications. We present techniques to implement these methods on the GPU for computing the CVT on 2D planes and on surfaces, and demonstrate significant speedup of these GPU-based methods over their CPU counterparts.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2010.53</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1173_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1173_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Texas at Dallas</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>03/16/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Guodong Rong</Author>
           <Author email="">Yang Liu</Author>
           <Author email="">Wenping Wang</Author>
           <Author email="">Xiaotian Yin</Author>
           <Author email="">David Gu</Author>
           <Author email="">Xiaohu Guo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2010.53">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Guodong Rong,Yang Liu,Wenping Wang,Xiaotian Yin,David Gu,Xiaohu Guo</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7e3424c0-b0ed-4476-8830-eb60da8a80c7</GUID>
        <Name>Designing Efficient Many-Core Parallel Algorithms for All-Pairs Shortest-Paths Using CUDA</Name>
        <ShortDescription>Finding the all-pairs shortest-paths on a large graph is a fundamental problem in many practical applications such as bioinformatics, internet node traffic and network routing. In this paper, we present the designs of two efficient parallel algorithms for many-core GPUs using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. By using the global scope of the GPU's global memory, coalescing the global memory reads and writes, and avoiding on-chip shared memory bank conflicts, we are able to achieve a large performance benefit with a speed-up of 2,500x on a desktop computer in comparison with a single core program.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ITNG.2010.230</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1172_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1172_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Lamar University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Quoc-Nam Tran</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ITNG.2010.230">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Quoc-Nam Tran</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0315bfde-f758-4fc6-8cf5-85aac810ca12</GUID>
        <Name>Record Setting Software Implementation of DES Using CUDA</Name>
        <ShortDescription>The increase in computational power of off-the-shelf hardware offers more and more advantageous tradeoffs among efficiency, cost and availability, thus enhancing the feasibility of cryptanalytic attacks aiming to lower the security of widely used cryptosystems. In this paper we illustrate a GPU-based software implementation of the most efficient variant of the Data Encryption Standard (DES), showing the performance of a software breaker which effectively exploits the multi-core Nvidia GT200 graphics architecture.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ITNG.2010.43</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1171_logo_CS Digital Library_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1171_logo_CS Digital Library_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Giovanni Agosta</Author>
           <Author email="">Alessandro Barenghi</Author>
           <Author email="">Fabrizio De Santis</Author>
           <Author email="">Gerardo Pelosi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ITNG.2010.43">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Giovanni Agosta,Alessandro Barenghi,Fabrizio De Santis</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7df8d14f-4c52-460a-8881-ad932fd45292</GUID>
        <Name>Eye-Full Tower: A GPU-based variable multibaseline omnidirectional stereovision system with automatic baseline selection for outdoor mobile robot navigation</Name>
        <ShortDescription>Recent years have seen a gradual increase in the number of researchers and projects involved in developing omnidirectional vision systems for various applications. The primary factors contributing to this trend are the availability of inexpensive, high-resolution vision sensors, robust and fast computers, and the advantages of such systems over perspective vision systems. In this paper, a novel variable multibaseline omnidirectional stereovision system is presented.</ShortDescription>
        <URL>http://portal.acm.org/citation.cfm?id=1805342.1805504&amp;coll=Portal&amp;dl=GUIDE&amp;CFID=92176503&amp;CFTOKEN=39358289</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1170_logo_acm_portal2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1170_logo_acm_portal2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Monash University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>06/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Wen Lik Dennis Lui</Author>
           <Author email="">Ray Jarvis</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1805342.1805504&amp;coll=Portal&amp;dl=GUIDE&amp;CFID=92176503&amp;CFTOKEN=39358289">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wen Lik Dennis Lui,Ray Jarvis</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>56e6ce9f-ae50-427d-8d3f-23b1e24c6683</GUID>
        <Name>Optimized high speed pixel sorting and its application in watershed based image segmentation</Name>
        <ShortDescription>Efficient sorting of image pixels based on their grayscale value is traditionally implemented using an algorithm based on distribution or counting sort methods. We show that an elegant alternative can be used which outperforms the traditional method both in terms of processing speed and main memory access. We discuss both theoretically analyzed and real-life performance results, and demonstrate the improvements that can be obtained when our algorithm is combined with a well-known watershed image segmentation method.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1169_logo_acm_portal2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1169_logo_acm_portal2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>The National Institute for Criminalistics and Criminology (NICC)</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>07/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Patrick De Smet</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1752253.1752389&amp;coll=Portal&amp;dl=GUIDE&amp;CFID=92176503&amp;CFTOKEN=39358289">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Patrick De Smet</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>729f484e-7759-401b-a6ef-78c85f290bd6</GUID>
        <Name>GPU-accelerated molecular dynamics simulation for study of liquid crystalline flows</Name>
        <ShortDescription>We have developed a GPU-based molecular dynamics simulation for the study of flows of fluids with anisotropic molecules such as liquid crystals. An application of the simulation to the study of macroscopic flow (backflow) generation by molecular reorientation in a nematic liquid crystal under the application of an electric field is presented. The computations of intermolecular force and torque are parallelized on the GPU using the cell-list method, and an efficient algorithm to update the cell lists is proposed. Some important issues in implementing computations that involve a large number of arithmetic operations and a large amount of data on the GPU, which has limited high-speed memory resources, are addressed extensively.</ShortDescription>
        <URL>http://portal.acm.org/citation.cfm?id=1808372.1808870&amp;coll=Portal&amp;dl=GUIDE&amp;CFID=92176503&amp;CFTOKEN=39358289</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1168_logo_acm_portal2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1168_logo_acm_portal2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Kochi University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Alfeus Sunarso</Author>
           <Author email="">Tomohiro Tsuji</Author>
           <Author email="">Shigeomi Chono</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1808372.1808870&amp;coll=Portal&amp;dl=GUIDE&amp;CFID=92176503&amp;CFTOKEN=39358289">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alfeus Sunarso,Tomohiro Tsuji,Shigeomi Chono</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2b5f83d2-9afa-42a5-8582-8a9b3ae48841</GUID>
        <Name>Parallel implementation of wavelet-based image denoising on programmable PC-grade graphics hardware</Name>
        <ShortDescription>The discrete wavelet transform (DWT) has been extensively used for image compression and denoising in the areas of image processing and computer vision. However, the intensive computation of the DWT, due to its inherent multilevel data decomposition and reconstruction operations, creates a bottleneck that drastically reduces its performance and limits its use in real-time applications on large digital images and/or high-definition videos. Although various software-based acceleration solutions, such as the lifting scheme, have been devised and generally achieve higher performance, the purely software-accelerated DWT still struggles to cope with the demands of real-time and interactive applications. With the growing capacity and popularity of graphics hardware, personal computers (PCs) nowadays are often equipped with programmable graphics processing units (GPUs) for graphics acceleration. The GPU offers a cost-effective parallel data processing mechanism for operations on large amounts of data, even for applications beyond graphics. This practice is commonly referred to as general-purpose computing on GPU (GPGPU).</ShortDescription>
        <URL>http://portal.acm.org/citation.cfm?id=1786816.1787181&amp;coll=Portal&amp;dl=GUIDE&amp;CFID=92176503&amp;CFTOKEN=39358289</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1167_logo_acm_portal2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1167_logo_acm_portal2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Huddersfield</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yang Su</Author>
           <Author email="">Zhijie Xu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1786816.1787181&amp;coll=Portal&amp;dl=GUIDE&amp;CFID=92176503&amp;CFTOKEN=39358289">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yang Su,Zhijie Xu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a4486be2-fdef-4db0-89b0-879b296f6681</GUID>
        <Name>GPU Computing for Atmospheric Modeling: Experience with a small kernel and implications for a full model</Name>
        <ShortDescription>Much success has been achieved using GPUs to accelerate existing applications that are highly data parallel, or that are dominated by small, intense computational kernels. What are the prospects for porting existing large scientific models that do not fit this mold? We take an expensive routine from the CAM atmosphere model, and port it to a GPU using CUDA. We use the experience gained as a guide in thinking about porting the full application to an accelerator based system. We consider the best path forward for getting large scientific models running on accelerator based systems, and identify cases where porting may be feasible, and where a complete redesign may be the best option.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1166_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1166_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>National Center for Atmospheric Research</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>02/04/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Rory Kelly</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5406490&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Rory Kelly</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ce8e0150-e62b-4a0c-890b-0442b2e058a6</GUID>
        <Name>Design and Performance Evaluation of Image Processing Algorithms on GPUs</Name>
        <ShortDescription>In this paper, we construe key factors in the design and evaluation of image processing algorithms on the massively parallel GPU (graphics processing unit) using the CUDA (compute unified device architecture) programming model. A set of metrics, customized for image processing, is proposed to quantitatively evaluate algorithm characteristics. In addition, we show that a range of image processing algorithms map readily to CUDA using multiview stereo matching, linear feature extraction, JPEG2000 image encoding, and non-photorealistic rendering (NPR) as our example applications.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5477417&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1165_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1165_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Inha University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>06/03/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">In Kyu Park</Author>
           <Author email="">Nitin Singhal</Author>
           <Author email="">Man Hee Lee</Author>
           <Author email="">Sungdae Cho</Author>
           <Author email="">Chris Kim</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5477417&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>In Kyu Park,Nitin Singhal,Man Hee Lee</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>750d6910-dfb0-4eee-9c8e-cec1320d7f09</GUID>
        <Name>CUDA-Based Linear Solvers for Stable Fluids</Name>
        <ShortDescription>In the field of computer graphics, physically based fluid simulations (i.e., simulations that solve the equations that govern fluid behaviour) are performed using, among others, Stam's stable fluids method. This method requires the solution of two sparse linear systems that can be solved using an iterative solver (e.g., Jacobi, Gauss-Seidel, conjugate gradient, etc.). Focusing on real-time 3D applications, we present parallel GPU-based (CUDA) implementations of the Jacobi, Gauss-Seidel, and conjugate gradient solvers and analyze their performance.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1164_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1164_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>04/21/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Goncalo Amador</Author>
           <Author email="">Abel Gomes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5480268&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Goncalo Amador,Abel Gomes</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d43dc906-8a5f-4bb4-bcb2-006f2d9be085</GUID>
        <Name>Implementation of Variable Preconditioned GCR with mixed precision on GPU using CUDA</Name>
        <ShortDescription>The Variable Preconditioned GCR (VPGCR) with mixed precision on a Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA) is numerically investigated. The convergence theorem of VPGCR guarantees that the residual equation for the preconditioning procedure can be solved in single precision. The computational results show that VPGCR with mixed precision on the GPU performs significantly better than on the Central Processing Unit (CPU): it is 22.53 times faster than the CPU version.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1163_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1163_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Tokyo University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>05/09/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Soichiro Ikuno</Author>
           <Author email="">Norihisa Fujita</Author>
           <Author email="">Susumu Yamamoto</Author>
           <Author email="">Susumu Nakata</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5481534&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Soichiro Ikuno,Norihisa Fujita,Susumu Yamamoto</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>55d72cf8-150e-4281-be46-1a00cd588e1e</GUID>
        <Name>A CUDA-Based Implementation of Stable Fluids in 3D with Internal and Moving Boundaries</Name>
        <ShortDescription>Fluid simulation has been an active research field in computer graphics for the last 30 years. Stam's stable fluids method, among others, is used for solving the equations that govern fluids (i.e. Navier-Stokes equations). An implementation of stable fluids in 3D using NVIDIA Compute Unified Device Architecture (CUDA) is provided in this paper. This CUDA-based implementation also features the accurate physical treatment of internal (i.e. static boundaries inside the simulation domain) and moving boundaries. The performance gains of the presented implementation over a sequential CPU-based implementation, and points for further improvement, are also addressed.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1161_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1161_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName></OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>23</ReleaseDay>
        <ReleaseDateDisplay>03/23/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Goncalo Amador</Author>
           <Author email="">Abel Gomes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5476624&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Goncalo Amador,Abel Gomes</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d41f4ce3-8c86-4dd4-835b-8954c9caef44</GUID>
        <Name>Hybrid Core Acceleration of UWB SIRE Radar Signal Processing</Name>
        <ShortDescription>To move High Performance Computing (HPC) closer to forward operating environments and missions, the Army Research Laboratory is developing approaches using hybrid, asymmetric core computing. By blending capabilities found in Graphics Processing Units (GPUs) and traditional von Neumann multi-core Central Processing Units (CPUs), approaches are being developed and optimized to provide at or near real-time processing speeds for research project applications. Algorithms are designed to partition work to resources best designed to handle the processing load. The use of commodity resources allows the design to be flexible throughout the life-cycle without the costly and time-consuming delays associated with Application Specific Integrated Circuit (ASIC) development. This paradigm allows for rapid technology transfer to end users.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5477419&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1160_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1160_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>U.S. Army Research Laboratory</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>06/03/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Song Jun Park</Author>
           <Author email="">James Ross</Author>
           <Author email="">Dale Shires</Author>
           <Author email="">David Richie</Author>
           <Author email="">Brian Henz</Author>
           <Author email="">Lam Nguyen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5477419&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Song Jun Park,James Ross,Dale Shires</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>78482823-4944-4b0a-ab95-15e1aad00454</GUID>
        <Name>Optimal loop unrolling for GPGPU programs</Name>
        <ShortDescription>Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. With the advent of general purpose programming models such as NVIDIA's CUDA and the new standard OpenCL, general purpose programming using GPUs (GPGPU) has become very popular. However, the GPU architecture and programming model have brought with them many new challenges and opportunities for compiler optimizations. One such classical optimization is loop unrolling. Current GPU compilers perform limited loop unrolling. In this paper, we attempt to understand the impact of loop unrolling on GPGPU programs.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470423&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1159_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1159_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The Ohio State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Giridhar Murthy Sreenivasa</Author>
           <Author email="">Mahesh Ravishankar</Author>
           <Author email="">Muthu Manikandan Baskaran</Author>
           <Author email="">P. Sadayappan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470423&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Giridhar Murthy Sreenivasa,Mahesh Ravishankar,Muthu Manikandan Baskaran</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>86fc0781-21d5-4b31-9fb0-7061d02a703b</GUID>
        <Name>Using CUDA enabled FDTD simulations to solve multi-gigahertz EMI challenges</Name>
        <ShortDescription>Thanks to the application of GPU-CUDA acceleration technology to EM simulation tools, more and more complicated EMI challenges can be efficiently investigated and solved very early in the design process. This paper presents a novel methodology to predict EMI emission due to memory SSO noise from a real, commercial graphics card by means of a commercially available CUDA accelerated full-wave FDTD simulator. It is shown that thanks to the CUDA acceleration one can estimate the influence of on-board decoupling capacitors on the EMI emission within hours.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1158_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1158_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>KHBO Flanders Mechatronics Engineering Centre</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>04/12/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Davy Pissoort</Author>
           <Author email="">Chen Wang</Author>
           <Author email="">Hany Fahmy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5475519&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Davy Pissoort,Chen Wang,Hany Fahmy</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>860722bd-ba53-4391-9e5c-7197a5574713</GUID>
        <Name>Dynamic load balancing on single- and multi-GPU systems</Name>
        <ShortDescription>The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular and unbalanced workloads. The problem is exacerbated when trying to effectively exploit multiple GPUs concurrently, which are commonly available in many modern systems. In this paper, we propose a task-based dynamic load-balancing solution for single- and multi-GPU systems. The solution allows load balancing at a finer granularity than what is supported in current GPU programming APIs, such as NVIDIA's CUDA.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470413&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1157_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1157_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Delaware</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Long Chen</Author>
           <Author email="">Oreste Villa</Author>
           <Author email="">Sriram Krishnamoorthy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470413&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Long Chen,Oreste Villa,Sriram Krishnamoorthy</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8bc36492-7c2b-4e62-ba6d-a9664ee84f10</GUID>
        <Name>Automatic Generation of Multi-Core Chemical Kernels</Name>
        <ShortDescription>This work presents KPPA (the Kinetics PreProcessor: Accelerated), a general analysis and code generation tool that achieves significantly reduced time-to-solution for chemical kinetics kernels on three multi-core platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine, and Intel Quad-Core Xeon CPUs. A comparative performance analysis of chemical kernels from WRF-Chem and the Community Multiscale Air Quality Model (CMAQ) is presented for each platform in double and single precision on coarse and fine grids.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5473221&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1156_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1156_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Virginia Polytechnic Institute and State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>27</ReleaseDay>
        <ReleaseDateDisplay>05/27/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">John Linford</Author>
           <Author email="">John Michalakes</Author>
           <Author email="">Manish Vachharajani</Author>
           <Author email="">Adrian Sandu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5473221&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>J. Linford,J. Michalakes,M. Vachharajani</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f0ed0b4c-5d78-4283-b786-d977d462b699</GUID>
        <Name>Speculative execution on multi-GPU systems</Name>
        <ShortDescription>The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to accelerators in order to utilize them effectively. An accelerator is a processor with a different ISA and micro-architecture than the main CPU. These static partitioning schemes are effective when targeting a system with only a single accelerator.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470427&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1155_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1155_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Georgia Institute of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Gregory Diamos</Author>
           <Author email="">Sudhakar Yalamanchili</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470427&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Gregory Diamos,Sudhakar Yalamanchili</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>829884ae-a849-4ab7-a5db-1ebb4290798a</GUID>
        <Name>AUTO-GC: Automatic translation of data mining applications to GPU clusters</Name>
        <ShortDescription>Because of the very favorable price-to-performance ratio of GPUs, a popular parallel programming configuration today is a cluster of GPUs. However, extracting performance on such a configuration would typically require programming in both MPI and CUDA, thus requiring a high degree of expertise and effort. It is clearly desirable to be able to support higher-level programming of this emerging high-performance computing platform.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1154_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1154_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The Ohio State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Wenjing Ma</Author>
           <Author email="">Gagan Agrawal</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470883&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wenjing Ma,Gagan Agrawal</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4401d416-56a2-4a55-88a0-a8ccbb66c75d</GUID>
        <Name>Pricing of cross-currency interest rate derivatives on Graphics Processing Units</Name>
        <ShortDescription>We present a Graphics Processing Unit (GPU) parallelization of the computation of the price of cross-currency interest rate derivatives via a Partial Differential Equation (PDE) approach. In particular, we focus on the GPU-based parallel computation of the price of long-dated foreign exchange interest rate hybrids, namely Power Reverse Dual Currency (PRDC) swaps with Bermudan cancelable features. We consider a three-factor pricing model with foreign exchange skew which results in a time-dependent parabolic PDE in three spatial dimensions. Finite difference methods on uniform grids are used for the spatial discretization of the PDE, and the Alternating Direction Implicit (ADI) technique is employed for the time discretization.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470708&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1153_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1153_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Toronto</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Duy Minh Dang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470708&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Duy Minh Dang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fb5436fc-e3e8-4c43-b70a-2dc6ba8e4f18</GUID>
        <Name>Study on GPU-accelerated extraction of interconnects parasitic using CUDA and MPI</Name>
        <ShortDescription>Parallel computation is application-oriented, particularly for the GPU (Graphics Processing Unit) with its inherent parallelism. This paper presents the architecture of a GPU cluster based on MPI (Message Passing Interface) and CUDA (Compute Unified Device Architecture). Results show that the speedup improves markedly, but that the gain tapers off in large-scale GPU clusters. The parallel algorithm focuses mainly on partitioning sparse matrix-vector multiplication (SpVM) tasks across GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1151_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1151_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Chinese Academy of Sciences</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>05/09/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Xiaoyu Xu</Author>
           <Author email="">Guoqiang Liu</Author>
           <Author email="">Hui Qu</Author>
           <Author email="">Wei Xu</Author>
           <Author email="">Yang Zhang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5481435&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Xiaoyu Xu,Guoqiang Liu,Hui Qu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>eb22bb0c-f56a-47bc-8b54-6bc2fa978435</GUID>
        <Name>Performance study of mapping irregular computations on GPUs</Name>
        <ShortDescription>Recently, Graphical Processing Units (GPUs) have become increasingly capable and well-suited to general-purpose applications. As a result of the GPU's high degree of parallelism and computational power, there has been a great deal of interest directed toward the platform for parallel application development. Much of the focus, however, has been on very regular applications that exhibit a high degree of data parallelism, as these applications map well to the GPU. Irregular applications, such as the Breadth First Search discussed in this paper, have not been as extensively studied and are more difficult to implement efficiently on the GPU. We present an implementation of the Breadth First Search algorithm as well as one of a Matrix Parenthesization algorithm.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470770&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1150_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1150_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Manitoba</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Steven Solomon</Author>
           <Author email="">Parimala Thulasiraman</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470770&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Steven Solomon,Parimala Thulasiraman</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2aa51865-7e62-41be-8adf-a461a0ae58d7</GUID>
        <Name>Design and implementation of MPEG audio layer III decoder using graphics processing units</Name>
        <ShortDescription>This paper describes a new implementation of the MPEG audio layer III (MP3) decoder. The proposed architecture is based on a graphics processing unit (GPU) using the CUDA environment, which lets it take advantage of the parallel computing power of modern GPUs. The implemented system employs a multi-threaded model and memory optimization to process MP3 decoding in parallel, so minimizing the computational overhead is essential. Experimental results on a GTX260+ graphics card showed that the proposed architecture is over five times faster than a traditional CPU-based MP3 library.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1148_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1148_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Chinese Academy of Sciences</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>04/09/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Chen Xiaoliang</Author>
           <Author email="">Zheng Chengshi</Author>
           <Author email="">Ma Longhua</Author>
           <Author email="">Cheng Xiaobin</Author>
           <Author email="">Li Xiaodong</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5476071&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chen Xiaoliang,Zheng Chengshi,Ma Longhua</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0e47069d-420f-4d76-9565-d04f4341f8d2</GUID>
        <Name>Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms</Name>
        <ShortDescription>The maximum flow problem is a fundamental graph theory problem with many important applications. Max-flow algorithms based on the push-relabel method are known to have better complexity bounds and faster practical execution speed than others. However, existing push-relabel algorithms are designed for uniprocessors or parallel processors that support locking primitives, thus making it very difficult to apply the push-relabel technique to CUDA-based GPUs. In this paper, we present a first generic parallel push-relabel algorithm for CUDA devices. We model the parallelization efficiency of the algorithm, which reveals that, for a given input graph, the level of parallelism varies during the execution of the algorithm.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470401&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1147_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1147_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Georgia Institute of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Zhengyu He</Author>
           <Author email="">Bo Hong</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470401&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zhengyu He,Bo Hong</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a23a06cf-4a9e-46f4-9171-718369793c99</GUID>
        <Name>An auto-tuning framework for parallel multicore stencil computations</Name>
        <ShortDescription>Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addition, the large variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a library. This work presents a stencil auto-tuning framework that significantly advances programmer productivity by automatically converting a straightforward sequential Fortran 95 stencil expression into tuned parallel implementations in Fortran, C, or CUDA, thus allowing performance portability across diverse computer architectures, including the AMD Barcelona, Intel Nehalem, Sun Victoria Falls, and the latest NVIDIA GPUs.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470421&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1146_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1146_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>CRD/NERSC, Lawrence Berkeley National Laboratory</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Shoaib Kamil</Author>
           <Author email="">Cy Chan</Author>
           <Author email="">Leonid Oliker</Author>
           <Author email="">John Shalf</Author>
           <Author email="">Samuel Williams</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470421&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shoaib Kamil,Cy Chan,Leonid Oliker</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5448d0e9-eebf-4e90-8c29-59f7f4c224c1</GUID>
        <Name>Comparing Hardware Accelerators in Scientific Applications: A Case Study</Name>
        <ShortDescription>Multi-core processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application's performance and programmability on a variety of platforms including CUDA with Nvidia GPUs, Brook+ with ATI graphics accelerators, OpenCL running on both multi-core and graphics processors, C++ running on multi-core processors, and a VHDL implementation running on a Xilinx FPGA.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5482576&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1145_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1145_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Tennessee</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>06/06/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">R. Weber</Author>
           <Author email="">A. Gothandaraman</Author>
           <Author email="">R. Hinde</Author>
           <Author email="">G. Peterson</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5482576&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>R. Weber,A. Gothandaraman,R. Hinde</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1aa1156a-6f55-4e59-93bb-4cf5c1a6b6a2</GUID>
        <Name>Demystifying GPU Microarchitecture Through Microbenchmarking</Name>
        <ShortDescription>Graphics processors (GPUs) offer the promise of more than an order of magnitude speedup over conventional processors for certain non-graphics computations. Because the GPU is often presented as a C-like abstraction (e.g., Nvidia's CUDA), little is known about the characteristics of the GPU's architecture beyond what the manufacturer has documented. This work develops a microbenchmark suite and measures the CUDA-visible architectural characteristics of the Nvidia GT200 (GTX280) GPU. Various undisclosed characteristics of the processing elements and the memory hierarchies are measured.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5452013&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1144_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1144_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Toronto</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>03/28/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">H. Wong</Author>
           <Author email="">M. Papadopoulou</Author>
           <Author email="">M. Sadooghi-Alvandi</Author>
           <Author email="">A. Moshovos</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5452013&amp;queryText%3Dcuda+2010%26openedRefinements%3D*%26sortType%3Ddesc_Publication+Year%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>H. Wong,M. Papadopoulou,M. Sadooghi-Alvandi</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>56a2b66b-15bc-46e7-bce1-2f1b937dfe11</GUID>
        <Name>SelfAudience</Name>
        <ShortDescription>Audience measurement: real-time video analysis for counting people, face detection, and tracking.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1143_428449_selfadvert_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1143_428449_selfadvert_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>SelfAdvert</OrganizationName>
        <OrganizationURL>http://www.selfadvert.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>05/15/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>300</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sales@selfadvert.com">SelfAdvert</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.selfadvert.com/downloads.php">Application</ContentType>
           <ContentType url="http://www.youtube.com/watch?v=8dQVaMVJWDE">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>freeware audience measurement,SelfAdvert,sales@selfadvert.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>41b685bb-4e0a-49ba-af2f-b938f11bae36</GUID>
        <Name>Cellular Automata Evolver</Name>
        <ShortDescription>Evolver of 1D cellular automata rules, plus inference tools built with state-of-the-art technology.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1142_caev2_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1142_caev2_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Cellular Automata Evolver</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>06/02/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="a.denis1@yahoo.com">Denis Antiga</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://content.wuala.com/contents/CAEVolve/index.html">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cellular Automata,Denis Antiga,a.denis1@yahoo.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a71cdec6-989a-487e-bfe7-36278925ca5d</GUID>
        <Name>Statistical constraints on binary black hole inspiral dynamics</Name>
        <ShortDescription>We perform a statistical analysis of the binary black hole problem in the post-Newtonian approximation by systematically sampling and evolving the parameter space of initial configurations for quasi-circular inspirals. Through a principal component analysis of spin and orbital angular momentum variables we systematically look for uncorrelated quantities and find three of them which are highly conserved in a statistical sense, both as functions of time and with respect to variations in initial spin orientations.</ShortDescription>
        <URL>http://arxiv.org/abs/1005.5560</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1141_bh_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1141_bh_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Maryland</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>05/30/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="tiglio@umd.edu">Chad Galley</Author>
           <Author email="">Frank Herrmann</Author>
           <Author email="">John Silberholz</Author>
           <Author email="">Manuel Tiglio</Author>
           <Author email="">Gustavo Guerberoff</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://arXiv.org/pdf/1005.5560">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chad Galley,Frank Herrmann,John Silberholz,Manuel Tiglio,Gustavo Guerberoff,tiglio@umd.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f5c5c329-3a57-4d10-a40d-475b6d59423c</GUID>
        <Name>Object-oriented stream programming using aspects</Name>
        <ShortDescription>High-performance parallel programs that efficiently utilize heterogeneous CPU+GPU accelerator systems require tuned coordination among multiple program units. However, using current programming frameworks such as CUDA leads to tangled source code that combines code for the core computation with that for device and computational kernel management, data transfers between memory spaces, and various optimizations. In this paper, we propose a programming system based on the principles of Aspect-Oriented Programming, to un-clutter the code and to improve programmability of these heterogeneous parallel systems. Specifically, we use standard C++ to describe the core computations and aspects to encapsulate all other support parts.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470472&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26pageNumber%3D2%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1140_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1140_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Rutgers University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Mingliang Wang</Author>
           <Author email="">Manish Parashar</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470472&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26pageNumber%3D2%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mingliang Wang,Manish Parashar</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>60a13ea4-55ee-4aa3-9fbe-2d1ee29bca6c</GUID>
        <Name>The GPU Computing Era</Name>
        <ShortDescription>GPU computing is at a tipping point, becoming more widely used in demanding consumer applications and high-performance computing. This article describes the rapid evolution of GPU architectures (from graphics processors to massively parallel many-core multiprocessors), recent developments in GPU computing architectures, and how the enthusiastic adoption of CPU+GPU coprocessing is accelerating parallel applications.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1139_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1139_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NVIDIA</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>03/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">J. Nickolls</Author>
           <Author email="">W. J. Dally</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5446251&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>J. Nickolls,W. J. Dally</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d5846aba-4896-45b5-b4d7-371b91ef56e5</GUID>
        <Name>Fast implementation of Wyner-Ziv Video codec using GPGPU</Name>
        <ShortDescription>In this paper, we report a fast implementation of a Wyner-Ziv video decoder using general-purpose computing on graphics processing units (GPGPU). Despite its many advantages, Wyner-Ziv video coding suffers from very high decoding complexity. Since Slepian-Wolf decoding with rate-adaptive LDPC accumulate codes accounts for more than 90% of the entire Wyner-Ziv video decoding complexity, we focus on a fast implementation of the Slepian-Wolf decoder using CUDA (Compute Unified Device Architecture), a GPGPU architecture developed by NVIDIA. Our implementation is shown to be 4 to 5 times (QCIF size) or 15 to 20 times (CIF size) faster than conventional Slepian-Wolf decoding.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1138_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1138_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Sungkyunkwan University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>03/24/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Ryanggeun Oh</Author>
           <Author email="">Jongbing Park</Author>
           <Author email="">Byeungwoo Jeon</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5463150&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ryanggeun Oh,Jongbing Park,Byeungwoo Jeon</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b37872a6-2751-4cb1-9d68-64b433ae6da1</GUID>
        <Name>Efficient parallel algorithms for maximum-density segment problem</Name>
        <ShortDescription>One of the fundamental problems involving DNA sequences is to find high-density segments of certain widths, for example, regions rich in guanine and cytosine (GC). Formally, given a sequence, each element of which has a value and a width, the maximum-density segment problem asks for the segment with the maximum density while satisfying minimum and possibly maximum width constraints. While several linear-time sequential algorithms have emerged recently owing to the problem's primitive-like utility, to our knowledge, no nontrivial parallel algorithm has yet been proposed for this topical problem.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470390&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1136_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1136_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Georgia State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Xue Wang</Author>
           <Author email="">Fasheng Qiu</Author>
           <Author email="">Sushil K. Prasad</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470390&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Xue Wang,Fasheng Qiu,Sushil K. Prasad</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ad4bb0eb-1147-4ed3-ae05-d7d49cb8d9b4</GUID>
        <Name>Fast binding site mapping using GPUs and CUDA</Name>
        <ShortDescription>Binding site mapping refers to the computational prediction of the regions on a protein surface that are likely to bind a small molecule with high affinity. The process involves flexibly docking a variety of small molecule probes and finding a consensus site that binds most of those probes. Due to the computational complexity of flexible docking, the process is often split into two steps: the first performs rigid docking between the protein and the probe; the second models the side chain flexibility by energy-minimizing the (few thousand) top scoring protein-probe complexes generated by the first step. Both these steps are computationally very expensive, requiring many hours of runtime per probe on a serial CPU.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470895&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1134_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1134_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Boston University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Bharat Sukhwani</Author>
           <Author email="">Martin C. Herbordt</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470895&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bharat Sukhwani,Martin C. Herbordt</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4fb1d9f2-99b7-4237-b4f0-46e4bf9cf25a</GUID>
        <Name>Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead</Name>
        <ShortDescription>Sorting is a well-investigated topic in computer science, and many efficient sorting algorithms for CPUs and GPUs have been developed. GPUs offer no swapping or paging to provide more virtual memory than is physically available, so sorting sequences that exceed GPU memory on the GPU raises the problem of external sorting.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1133_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1133_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Christian-Albrechts-University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Hagen Peters</Author>
           <Author email="">Ole Schulz-Hildebrandt</Author>
           <Author email="">Norbert Luttenberger</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470833&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Hagen Peters,Ole Schulz-Hildebrandt,Norbert Luttenberger</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5668b6fb-9541-43d3-b426-da3fbc93395c</GUID>
        <Name>A tile-based parallel Viterbi algorithm for biological sequence alignment on GPU with CUDA</Name>
        <ShortDescription>The Viterbi algorithm is the compute-intensive kernel in Hidden Markov Model (HMM) based sequence alignment applications. In this paper, we investigate extending several parallel methods, such as the wave-front and streaming methods for the Smith-Waterman algorithm, to achieve a significant speed-up on a GPU. The wave-front method can take advantage of the computing power of the GPU but it cannot handle long sequences because of the physical GPU memory limit. On the other hand, the streaming method can process long sequences but with increased overhead due to the increased data transmission between CPU and GPU. To further improve the performance on GPU, we propose a new tile-based parallel algorithm. We take advantage of the homological segments to divide long sequences into many short pieces and each piece pair (tile) can be fully held in the GPU's memory.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470903&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1132_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1132_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Tsinghua University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Zhihui Du</Author>
           <Author email="">Zhaoming Yin</Author>
           <Author email="">David A. Bader</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470903&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zhihui Du,Zhaoming Yin,David A. Bader</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a2642850-909e-475e-bd95-fcb458609914</GUID>
        <Name>Designing scalable many-core parallel algorithms for min graphs using CUDA</Name>
        <ShortDescription>Removing redundant edges on a large graph is a fundamental problem in many practical applications such as verification of real-time systems and network routing. In this paper, we present the designs of scalable and efficient parallel algorithms for multiple many-core GPU devices using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. By using the global scope of the GPU's global memory, coalescing the global memory reads and writes, and avoiding on-chip shared memory bank conflicts, we are able to achieve a large performance benefit with a speed-up of 2,500x on a desktop computer in comparison with a single core CPU program. We report our experiments on large graphs with up to 29K vertices using multiple GPU devices.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1131_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1131_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Lamar University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Quoc-Nam Tran</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470712&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Quoc-Nam Tran</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ebaed060-0f46-49b2-b9b3-5bfbb7e21ff5</GUID>
        <Name>Implementing the Himeno benchmark with CUDA on GPU clusters</Name>
        <ShortDescription>This paper describes the use of CUDA to accelerate the Himeno benchmark on clusters with GPUs. The implementation is designed to optimize memory bandwidth utilization. Our approach achieves over 83% of the theoretical peak bandwidth on an NVIDIA Tesla C1060 GPU and performs at over 50 GFlops. A multi-GPU implementation that utilizes MPI alongside CUDA streams to overlap GPU execution with data transfers allows linear scaling and performs at over 800 GFlops on a cluster with 16 GPUs. The paper presents the optimizations required to achieve this level of performance.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1130_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1130_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NVIDIA</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Everett H. Phillips</Author>
           <Author email="">Massimiliano Fatica</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470394&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Everett H. Phillips,Massimiliano Fatica</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>adad926b-73e9-492a-a3d5-69301fb1d791</GUID>
        <Name>CUDA-based AES parallelization with fine-tuned GPU memory utilization</Name>
        <ShortDescription>Current Graphics Processing Units (GPUs) offer great potential for speeding up computationally intensive data-parallel applications over traditional parallelization approaches, since GPUs contain many more hardware threads than the computational cores available to common CPU threads. NVIDIA developed a generic GPU programming platform, CUDA, which allows programmers to utilize the GPU through the C programming language and parallelize applications in a way similar to the traditional multithreading approach. However, not all applications are suitable for this new platform.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470766&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1129_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1129_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Arkansas State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Chonglei Mei</Author>
           <Author email="">Hai Jiang</Author>
           <Author email="">Jeff Jenness</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470766&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chonglei Mei,Hai Jiang,Jeff Jenness</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>59dd2453-f984-4ab7-9a4c-49d2350b0f09</GUID>
        <Name>Optimization of linked list prefix computations on multithreaded GPUs using CUDA</Name>
        <ShortDescription>We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures involve in general highly irregular fine grain memory accesses that are typical of many computations on linked lists, trees, and graphs. While the current generation of GPUs provides substantial computational power and extremely high bandwidth memory accesses, they may appear at first to be primarily geared toward streamed, highly data parallel computations. In this paper, we introduce an optimized multithreaded GPU algorithm for prefix computations through a randomization process that reduces the problem to a large number of fine-grain computations.</ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470455&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1128_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1128_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Maryland</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Zheng Wei</Author>
           <Author email="">Joseph JaJa</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470455&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zheng Wei,Joseph JaJa</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9ecfd491-9553-4775-b59e-87718a3593fc</GUID>
        <Name>Parallel computing with CUDA</Name>
        <ShortDescription>NVIDIA's CUDA architecture provides a powerful platform for writing highly parallel programs. By providing simple abstractions for hierarchical thread organization, memories, and synchronization, the CUDA programming model allows programmers to write scalable programs without the burden of learning a multitude of new programming constructs. </ShortDescription>
        <URL>http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470378&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1127_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1127_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NVIDIA</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>04/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Michael Garland</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&amp;arnumber=5470378&amp;queryText%3Dcuda%26searchWithin%3D2010%26openedRefinements%3D*%26searchField%3DSearch+All">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Michael Garland</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7473646a-8f1b-4e8b-b578-cabe90a66678</GUID>
        <Name>Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs</Name>
        <ShortDescription>In this paper we describe techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Programs developed for manycore processors typically express finer thread-level parallelism than is appropriate for multicore platforms. We describe options for implementing fine-grained threading in software, and find that reasonable restrictions on the synchronization model enable significant optimizations and performance improvements over a baseline approach. We evaluate these techniques in a production-level compiler and runtime for the CUDA programming model targeting modern CPUs.</ShortDescription>
        <URL>http://portal.acm.org/citation.cfm?id=1772954.1772971&amp;coll=Portal&amp;dl=ACM&amp;CFID=91959390&amp;CFTOKEN=70859630</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1126_logo_acm_portal2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1126_logo_acm_portal2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>NVIDIA Corporation</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">John A Stratton</Author>
           <Author email="">Vinod Grover</Author>
           <Author email="">Jaydeep Marathe</Author>
           <Author email="">Bastiaan Aarts</Author>
           <Author email="">Mike Murphy</Author>
           <Author email="">Ziang Hu</Author>
           <Author email="">Wen-mei W. Hwu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1772954.1772971&amp;coll=Portal&amp;dl=ACM&amp;CFID=91959390&amp;CFTOKEN=70859630">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>John A Stratton,Vinod Grover,Jaydeep Marathe</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7702a523-e58c-4e1e-8e05-207e1430c47c</GUID>
        <Name>Non-blocking programming on multi-core graphics processors: extended abstract</Name>
        <ShortDescription>This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures like the CUDA graphics processors. We first design three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models in terms of their consensus numbers. These tight results show that the coalesced memory access mechanisms can facilitate strong synchronization between the threads of multicore processors, without the need of synchronization primitives other than reads and writes.</ShortDescription>
        <URL>http://portal.acm.org/citation.cfm?id=1556444.1556448&amp;coll=Portal&amp;dl=ACM&amp;CFID=91959390&amp;CFTOKEN=70859630</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1125_logo_acm_portal2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1125_logo_acm_portal2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Tromsø</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>06/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Phuong Hoai Ha</Author>
           <Author email="">Philippas Tsigas</Author>
           <Author email="">Otto J. Anshus</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1556444.1556448&amp;coll=Portal&amp;dl=ACM&amp;CFID=91959390&amp;CFTOKEN=70859630">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Phuong Hoai Ha,Philippas Tsigas,Otto J. Anshus</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>018fa5a9-ae5d-498c-87ba-c505061b01c5</GUID>
        <Name>Application-guided tool development for architecturally diverse computation</Name>
        <ShortDescription>Architecturally diverse computation exploits non-traditional computing platforms (e.g., field-programmable gate arrays, graphics processors, heterogeneous chip multiprocessors) to execute user applications. We have designed the Auto-Pipe tool set with the goal of easing the task of developing applications for architecturally diverse systems. Prior to and during the course of Auto-Pipe's design, we have developed a number of real, substantial applications, and the lessons learned during the development of these applications have had a direct bearing on the capabilities of Auto-Pipe. In this paper, we describe the relationship between our application development experience and Auto-Pipe. In short, how have applications guided the tools' evolution and development?</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1124_logo_acm_portal2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1124_logo_acm_portal2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Washington University in St. Louis</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>03/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">R. D. Chamberlain</Author>
           <Author email="">J. Buhler</Author>
           <Author email="">M. Franklin</Author>
           <Author email="">J. H. Buckley</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1774088.1774191&amp;coll=Portal&amp;dl=GUIDE&amp;CFID=88119154&amp;CFTOKEN=11832401">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>R. D. Chamberlain,J. Buhler,M. Franklin</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7c408604-7b5b-4079-8a47-1aeb09371dde</GUID>
        <Name>NeuroSolutions CUDA Add-on</Name>
        <ShortDescription>The NeuroSolutions CUDA Add-on implements high performance parallel computing of Neural Networks using Levenberg-Marquardt, one of the most powerful forms of back-propagation learning available. Neural Networks are a form of artificial intelligence (AI) that have proved to be effective in solving a wide range of data mining and data modeling problems, including credit card fraud detection, cancer diagnosis and financial forecasting, to name a few. As problems become more and more complex, so does the demand for processing power. By parallelizing advanced learning algorithms on a GPU (Graphics Processing Unit), NeuroSolutions can achieve up to 50 times greater performance than that of processing on a traditional CPU (Central Processing Unit). A free evaluation version of NeuroSolutions is available for download on our website.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1123_v6-nscuda-large_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1123_v6-nscuda-large_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NeuroDimension, Inc.</OrganizationName>
        <OrganizationURL>http://www.nd.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>13</ReleaseDay>
        <ReleaseDateDisplay>05/13/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="info@nd.com">Gary Lynn</Author>
           <Author email="">Brian Kachnowski</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.nd.com/neurosolutions/download.html">Application</ContentType>
           <ContentType url="http://www.nd.com/neurosolutions/download.html">Presentation</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Finance</ApplicationType>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Oil &amp; Gas</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
           <ApplicationType>Neural Networks</ApplicationType>
           <ApplicationType>Data Mining</ApplicationType>
           <ApplicationType>Machine Learning</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>neural network, Levenberg-Marquardt, CUDA, Multilayer Perceptron, GPU, parallel processing,Gary Lynn,Brian Kachnowski,info@nd.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e1b2a932-da54-4115-9913-ef21d09b12cb</GUID>
        <Name>Bayesian Real-Time Perception Algorithms on GPU</Name>
        <ShortDescription>Real-time implementation of a Bayesian framework for robotic multisensory perception using the Compute Unified Device Architecture (CUDA).</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1122_1bayesoccupancyfilter_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1122_1bayesoccupancyf
