<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="applications.xsl"?>
  <Applications>	

     <Application>
        <GUID>e0663fcb-55fe-4b9b-886b-0c7305d789df</GUID>
        <Name>Exploring utilisation of GPU for database applications</Name>
        <ShortDescription>This study explores possible applications of GPU technology for accelerating database access. We use an n-gram based approximate text search engine as a test bed for GPU-based acceleration algorithms. Two solutions, a hybrid CPU/GPU algorithm and a pure GPU algorithm for query processing, are studied and compared with the baseline CPU algorithm as well as with optimized versions of the CPU algorithm. The hybrid algorithm performs poorly on most queries, and only modest acceleration is achievable for long queries with a high error level. On the other hand, speedups of up to 18 times were achieved with the pure GPU algorithm. Application of GPU acceleration to more general database problems is discussed.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1012_chemsearch_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1012_chemsearch_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw</OrganizationName>
        <OrganizationURL>http://www.icm.edu.pl</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>03/09/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>18</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="S.Walkowiak@icm.edu.pl">S Walkowiak et al.</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://bioinfo.icm.edu.pl/algorithm/source/Nsearch/files/Walkowiak_etal_ICCS_2010_final.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Databases</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>S. Walkowiak,K. Wawruch,L. Ligowski,S.Walkowiak@icm.edu.pl,L.Ligowski@icm.edu.pl,W.Rudnicki@icm.edu.pl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>09210fbd-d990-4fd2-a864-aeb2dc8291eb</GUID>
        <Name>Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed Precision Multigrid</Name>
        <ShortDescription>We have previously suggested mixed precision iterative solvers specifically tailored to the iterative solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular single precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new mixed precision solvers that run entirely on the GPU: We demonstrate that mixed precision schemes constitute a significant performance gain over native double precision. Moreover, we present a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employ this scheme as a line relaxation smoother in our GPU-based multigrid solver. With an alternating direction implicit variant of this advanced smoother we can extend the applicability of the GPU multigrid solvers to very ill-conditioned systems arising from the discretization on anisotropic meshes, that previously had to be solved on the CPU. The resulting mixed precision schemes are always faster than double precision alone, and outperform tuned CPU solvers consistently by almost an order of magnitude. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1011_TPDS_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1011_TPDS_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Dortmund and Max Planck Institut Informatik</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>03/02/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dominik.goeddeke@math.tu-dortmund.de">Dominik Göddeke</Author>
           <Author email="">Robert Strzodka</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.mathematik.tu-dortmund.de/~goeddeke/pubs/index.html#Goeddeke_2010_CRT">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dominik Göddeke,Robert Strzodka,dominik.goeddeke@math.tu-dortmund.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a22255de-199f-41da-b3c7-703a9714f29f</GUID>
        <Name>Lattice-Boltzmann Simulation of the Shallow-Water Equations with Fluid-Structure Interaction on Multi- and Manycore Processors</Name>
        <ShortDescription>We present an efficient method for the simulation of laminar fluid flows with free surfaces including their interaction with moving rigid bodies, based on the two-dimensional shallow water equations and the Lattice-Boltzmann method. Our implementation targets multiple fundamentally different architectures such as commodity multicore CPUs with SSE, GPUs, the Cell BE and clusters. We show that our code scales well on an MPI-based cluster; that an eightfold speedup can be achieved using modern GPUs in contrast to multithreaded CPU code and, finally, that it is possible to solve fluid-structure interaction scenarios with high resolution at interactive rates.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1010_mcc-paper_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1010_mcc-paper_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Dortmund</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>02/24/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>8</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="markus.geveler@math.tu-dortmund.de">Markus Geveler</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.mathematik.tu-dortmund.de/~goeddeke/pubs/index.html#Geveler_2010_LBS">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Markus Geveler,markus.geveler@math.tu-dortmund.de</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>095c041f-aa2c-472f-b0b4-eaa692951dc5</GUID>
        <Name>Fast Image Blurring with CUDA</Name>
        <ShortDescription>High-performance, good-quality image blurring using the stack blur algorithm provided by http://incubator.quasimondo.com</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1008_device_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1008_device_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>http://home.so-net.net.tw/lioucy/</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>09/10/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>300</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="lioucr@yahoo.ca">ChaoJui</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.codeproject.com/KB/graphics/blurringwithcuda.aspx">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>ChaoJui,lioucr@yahoo.ca</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>476d306d-6a77-4749-8210-8b7b19ebd420</GUID>
        <Name>Fast Human Detection with Cascaded Ensembles</Name>
        <ShortDescription>A real-time implementation of the Histograms of Oriented Gradients algorithm with cascaded classifiers.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1007_cover_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1007_cover_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>MIT</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>02/26/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>13</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="berkin@mit.edu">Berkin Bilgic</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://web.mit.edu/berkin/Public/berkin_thesis.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Berkin Bilgic,berkin@mit.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ba8b2f03-6170-4b26-99c3-86b4e47546d1</GUID>
        <Name>Cuda-Renderer 2009 - A Multi-Volume Polyhedral Renderer</Name>
        <ShortDescription>We present a new algorithm for hardware-accelerated ray casting of multiple volumes. Our approach supports a large number of volumes, complex translucent and concave polyhedral objects as well as CSG intersections of volumes and geometry in any combination. It is implemented as a software renderer in CUDA without any fixed function portions, which allows full control over the use of memory bandwidth. High depth complexity, which is problematic for conventional approaches based on depth peeling, can be successfully handled. As far as we know, our approach is the first framework for multi-volume rendering which provides interactive frame rates when concurrently rendering more than 50 arbitrarily overlapping volumes on current graphics hardware.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1006_Thumbnail_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1006_Thumbnail_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Graz University of Technology</OrganizationName>
        <OrganizationURL>http://www.icg.tugraz.at</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>12/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kainz@icg.tugraz.at">Bernhard Kainz</Author>
           <Author email="">Markus Grabner</Author>
           <Author email="">Alexander Bornik</Author>
           <Author email="">Stefan Hauswiesner</Author>
           <Author email="">Judith Muehl</Author>
           <Author email="">Dieter Schmalstieg</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=2Ym0CoJ0pVk">Multimedia</ContentType>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1618498&amp;dl=ACM&amp;coll=portal&amp;CFID=484952565&amp;CFTOKEN=484952565">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bernhard Kainz,Markus Grabner,Alexander Bornik,kainz@icg.tugraz.at</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>aa417b5b-e0cc-446a-9fca-a93e14d4868b</GUID>
        <Name>Accelerating SQL Database Operations on a GPU with CUDA</Name>
        <ShortDescription>A reimplementation of portions of the SQLite database to execute on a GPU, part of the GPGPU-3 workshop.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1005_volcano_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1005_volcano_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia (LAVA Lab)</OrganizationName>
        <OrganizationURL>http://www.cs.virginia.edu/~skadron/pub_list.html</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>03/14/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>70</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="pbb7c@virginia.edu">Peter Bakkum</Author>
           <Author email="skadron@virginia.edu">Kevin Skadron</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Data Mining</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Peter Bakkum,pbb7c@virginia.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c6b19852-39b9-4460-8777-47047330ce20</GUID>
        <Name>Gramm-software package for molecular dynamics on graphical processing units </Name>
        <ShortDescription>This work describes a software package and algorithms for molecular dynamics on NVIDIA G80, G84, and G92 GPUs. All potentials needed for the MM2 and AMBER force fields are implemented, and combinations of different potentials are allowed. A performance comparison of different MD algorithms on GPU and CPU is presented. All software is available from www.gpamm.mntech.ru.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1003_cover-medium2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1003_cover-medium2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Russian Academy of Sciences</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>01/21/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">D. S. Tarasov</Author>
           <Author email="">E. D. Izotova</Author>
           <Author email="">D. A. Alisheva</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/98624210w828r30g/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>D. S. Tarasov,E. D. Izotova,D. A. Alisheva</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>de2ccfcc-9cf0-4076-b92e-969e42607064</GUID>
        <Name>Leveraging Computation Sharing and Parallel Processing in Location-Based Services</Name>
        <ShortDescription>A variety of research exists for the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. In this paper, we introduce an efficient and scalable system for monitoring continuous queries by leveraging the parallel processing capability of the Graphics Processing Unit.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CSE.2009.437</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1002_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1002_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 International Conference on Computational Science and Engineering</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>08/31/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jonathan Cazalas</Author>
           <Author email="">Kien Hua</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CSE.2009.437">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jonathan Cazalas,Kien Hua</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5914c55d-b33d-4834-91f5-52968c66c450</GUID>
        <Name>Accelerating Lattice Boltzmann Fluid Flow Simulations Using Graphics Processors</Name>
        <ShortDescription>Lattice Boltzmann Methods (LBM) are used for the computational simulation of Newtonian fluid dynamics. LBM-based simulations are readily parallelizable; they have been implemented on general-purpose processors, field-programmable gate arrays (FPGAs), and graphics processing units (GPUs). Of the three methods, the GPU implementations achieved the highest simulation performance per chip. With memory bandwidth of up to 141 GB/s and a theoretical maximum floating point performance of over 600 GFLOPS, CUDA-ready GPUs from NVIDIA provide an attractive platform for a wide range of scientific simulations, including LBM.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICPP.2009.38</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1001_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1001_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 International Conference on Parallel Processing</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>09/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Peter Bailey</Author>
           <Author email="">Joe Myre</Author>
           <Author email="">Stuart D.C. Walsh</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICPP.2009.38">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Peter Bailey,Joe Myre,Stuart D.C. Walsh</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9c638c6c-3e27-4a8c-b3ea-e6f75ca52f8d</GUID>
        <Name>Theoretical and Empirical Analysis of a GPU Based Parallel Bayesian Optimization Algorithm</Name>
        <ShortDescription>General Purpose computing on Graphics Processing Units (GPGPUs) is a huge paradigm shift in parallel computing that promises a dramatic increase in performance. But GPGPUs also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges and design choices involved in parallelizing the Bayesian Optimization Algorithm (BOA) to solve complex combinatorial optimization problems on NVIDIA commodity graphics hardware using the Compute Unified Device Architecture (CUDA).</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2009.32</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/1000_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/1000_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 International Conference on Parallel and Distributed Computing, Applications and Technologies</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Asim Munawar</Author>
           <Author email="">Mohamed Wahib</Author>
           <Author email="">Masaharu Munetomo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2009.32">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Asim Munawar,Mohamed Wahib,Masaharu Munetomo</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9c689f15-653f-4c90-b64c-1140bae9d5df</GUID>
        <Name>Applying Modern Soft- and Hardware Technologies for Computational Steering Approaches in Computational Fluid Dynamics</Name>
        <ShortDescription>In this article we present an educational simulation tool, FlowSim 2007 CUDA edition, a computational steering application for interactive 2D flow simulation based on the Lattice Boltzmann Method. The application combines a comfortable user interface and a convenient development platform on the one hand with a high-performance flow solver on the other.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CW.2007.53</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/999_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/999_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2007 International Conference on Cyberworlds</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>10/26/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jan Linxweiler</Author>
           <Author email="">Jonas Tölke</Author>
           <Author email="">Manfred Krafczyk</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CW.2007.53">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jan Linxweiler,Jonas Tölke,Manfred Krafczyk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>722324b0-4ea9-4cc8-896d-190e61c0da21</GUID>
        <Name>High-Speed Implementations of Block Cipher ARIA Using Graphics Processing Units</Name>
        <ShortDescription>The power of graphics processing units (GPUs) has been increasing more rapidly than that of CPUs. It is not surprising that many software libraries have been developed which enable us to use the power of the GPU for general computations, especially in parallel data processing. In this paper, we propose implementations of ARIA, the standard block cipher of Korea, using the OpenGL and CUDA libraries on the GPU.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/MUE.2008.94</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/998_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/998_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 International Conference on Multimedia and Ubiquitous Engineering</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>04/26/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yongjin Yeom</Author>
           <Author email="">Yongkuk Cho</Author>
           <Author email="">Moti Yung</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/MUE.2008.94">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yongjin Yeom,Yongkuk Cho,Moti Yung</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8fc3f95e-0465-479a-a4d9-6876d7b5e3b3</GUID>
        <Name>Accelerating Compute-Intensive Applications with GPUs and FPGAs</Name>
        <ShortDescription>Accelerators are special purpose processors designed to speed up compute-intensive sections of applications. Two extreme endpoints in the spectrum of possible accelerators are FPGAs and GPUs, which can often achieve better performance than CPUs on certain workloads. FPGAs are highly customizable, while GPUs provide massive parallel execution resources and high memory bandwidth.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/SASP.2008.4570793</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/997_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/997_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>06/09/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Shuai Che</Author>
           <Author email="">Jie Li</Author>
           <Author email="">Jeremy W. Sheaffer</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/SASP.2008.4570793">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shuai Che,Jie Li,Jeremy W. Sheaffer</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>458661f9-8252-4e51-b2bd-d53126b80571</GUID>
        <Name>High-Speed Private Information Retrieval Computation on GPU</Name>
        <ShortDescription>A Private Information Retrieval (PIR) scheme is a protocol in which a user retrieves one record out of n from a replicated database while hiding from the database which record has been retrieved, as long as the different replicas do not collude. An especially interesting sub-field of research, called single-database PIR, deals with schemes that allow a user to privately retrieve an element of a non-replicated database. In these schemes, user privacy is related to the intractability of a mathematical problem, instead of being based on the assumption that different replicas exist and do not collude against their users.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/SECURWARE.2008.55</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/996_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/996_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 Second International Conference on Emerging Security Information, Systems and Technologies</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>08/31/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Carlos Aguilar Melchor</Author>
           <Author email="">Benoit Crespin</Author>
           <Author email="">Philippe Gaborit</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/SECURWARE.2008.55">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Carlos Aguilar Melchor,Benoit Crespin,Philippe Gaborit</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>23325740-fa2b-4e7b-a7ee-b607da12ee54</GUID>
        <Name>Compute Unified Device Architecture Application Suitability</Name>
        <ShortDescription>Graphics processing units (GPUs) can provide excellent speedups on some, but not all, general-purpose workloads. Using a set of computational GPU kernels as examples, the authors show how to adapt kernels to utilize the architectural features of a GeForce 8800 GPU and what finally limits the achievable performance.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/995_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/995_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>06/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Wen-Mei Hwu</Author>
           <Author email="">Christopher Rodrigues</Author>
           <Author email="">Shane Ryoo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.48">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wen-Mei Hwu,Christopher Rodrigues,Shane Ryoo</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0e7f8bf5-4959-4ece-9756-519dda1fe8b6</GUID>
        <Name>Parallel Approaches for SWAMP Sequence Alignment</Name>
        <ShortDescription>This document is a summary and overview of several approaches to implement the local sequence alignment algorithms known as SWAMP and SWAMP+ on commercially available hardware. Using a Smith-Waterman style of alignment, these parallel algorithms have several innovative extensions that take advantage of the ASC associative computing model while maintaining speed and accuracy and producing a richer set of results in an automated way that is not currently available. We consider four different hardware architectures for the realization of the ASC model. These are the ClearSpeed CSX processor, NVIDIA GPGPU graphics processors, IBM Cell Processors, and FPGAs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/994_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/994_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Case Western University, Cleveland, Ohio</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>06/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Shannon Steinfadt</Author>
           <Author email="">Kevin Schaffer</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/OCCBIO.2009.12">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shannon Steinfadt,Kevin Schaffer</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9d864e53-07ca-423a-ab86-088d470c12a2</GUID>
        <Name>Accelerating Algebraic Reconstruction Using CUDA-Enabled GPU</Name>
        <ShortDescription>In this paper, we apply the Compute Unified Device Architecture (CUDA) to 3D cone-beam CT reconstruction using the Simultaneous Algebraic Reconstruction Technique (SART). With hardware acceleration, the computationally complex SART can run at a speed comparable to the commonly used Filtered Back-Projection and provide an even better quality volume with fewer samples. The main contributions include two novel techniques to accelerate the reconstruction.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CGIV.2009.18</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/993_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/993_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 Sixth International Conference on Computer Graphics, Imaging and Visualization</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>08/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yuqiang Lu</Author>
           <Author email="">Weiming Wang</Author>
           <Author email="">Shifu Chen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CGIV.2009.18">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yuqiang Lu,Weiming Wang,Shifu Chen</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9eb3b453-b0e1-42bc-8040-626f61e09879</GUID>
        <Name>Profiling General Purpose GPU Applications</Name>
        <ShortDescription>We are witnessing an increasing adoption of GPUs for performing general-purpose computation, which is usually known as GPGPU. The main challenge in developing such applications is that they often do not fit the model required by graphics processing devices, limiting the scope of applications that may benefit from the computing power provided by GPUs. Even when the application fits the GPU model, obtaining optimal resource usage is a complex task.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/SBAC-PAD.2009.26</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/992_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/992_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 21st International Symposium on Computer Architecture and High Performance Computing</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>10/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Bruno Rocha Coutinho</Author>
           <Author email="">George Luiz Medeiros Teodoro</Author>
           <Author email="">Rafael Sachetto Oliveira</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/SBAC-PAD.2009.26">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bruno Rocha Coutinho,George Luiz Medeiros Teodoro,Rafael Sachetto Oliveira</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f186e9e8-e3bd-41da-a917-0868ecbc7fdc</GUID>
        <Name>Improving Performance of Matrix Multiplication and FFT on GPU</Name>
        <ShortDescription>In this paper we discuss our experiences in improving the performance of two key algorithms using CUDA: the single-precision matrix-matrix multiplication subprogram (SGEMM of BLAS) and the single-precision FFT. The former is computation-intensive, while the latter is memory bandwidth or communication-intensive. A peak performance of 393 Gflops is achieved on the NVIDIA GeForce GTX280 for the former, about 5% faster than the CUBLAS 2.0 library. Better FFT performance results are obtained for a range of dimensions. Some common principles are discussed for the design and implementation of many-core algorithms.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/991_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/991_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 15th International Conference on Parallel and Distributed Systems</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Xiang Cui</Author>
           <Author email="">Yifeng Chen</Author>
           <Author email="">Hong Mei</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Xiang Cui,Yifeng Chen,Hong Mei</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>dedb0ea8-da35-401f-a28a-50bf45cb4f96</GUID>
        <Name>Coprocessor Computing with FPGA and GPU</Name>
        <ShortDescription>Specialized secondary processing units, such as field programmable gate arrays (FPGAs) and graphics processing units (GPUs), attempt to tackle time-consuming applications with high computational requirements. In order to achieve acceleration, FPGAs allow a customizable architecture and NVIDIA GPUs offer up to 16 cores with 128 stream processors.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/DoD.HPCMP.UGC.2008.69</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/990_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/990_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 DoD HPCMP Users Group Conference</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>07/17/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Song Jun Park</Author>
           <Author email="">Dale R. Shires</Author>
           <Author email="">Brian J. Henz</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/DoD.HPCMP.UGC.2008.69">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Song Jun Park,Dale R. Shires,Brian J. Henz</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a594c3dc-754e-4f17-a212-b1968a962069</GUID>
        <Name>GPU as a General Purpose Computing Resource</Name>
        <ShortDescription>In the last few years, GPUs (Graphics Processing Units) have developed rapidly. Their ever-increasing computing power and decreasing cost have attracted attention from both industry and academia. In addition to graphics applications, researchers are interested in using them for general-purpose computing. Recently, NVIDIA released a new computing architecture, CUDA (Compute Unified Device Architecture), for its GeForce 8 series, Quadro FX, and Tesla GPU products.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2008.38</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/989_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/989_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>12/04/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Qihang Huang</Author>
           <Author email="">Zhiyi Huang</Author>
           <Author email="">Paul Werstein</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2008.38">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Qihang Huang,Zhiyi Huang,Paul Werstein</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ce07c8c1-bb41-4cc2-b04f-ee02a2980f68</GUID>
        <Name>Accelerating Partitional Algorithms for Flow Cytometry on GPUs</Name>
        <ShortDescription>Like many modern techniques for scientific analysis, flow cytometry produces massive amounts of data that must be analyzed and clustered intelligently to be useful. Current manual binning techniques are cumbersome and limited in both the quality and quantity of analysis produced. To address the quality of results, a new framework applying two different sets of clustering algorithms and inference methods is implemented.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ISPA.2009.29</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/988_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/988_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 IEEE International Symposium on Parallel and Distributed Processing with Applications</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>08/12/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jeremy Espenshade</Author>
           <Author email="">Andrew Pangborn</Author>
           <Author email="">Gregor von Laszewski</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ISPA.2009.29">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jeremy Espenshade,Andrew Pangborn,Gregor von Laszewski</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>34d48171-d36d-4bd4-8cf5-25e71f00c0ee</GUID>
        <Name>kD-Tree Traversal Implementations for Ray Tracing on Massive Multiprocessors: A Comparative Study</Name>
        <ShortDescription>Current GPU computational power enables the execution of complex and parallel algorithms, such as ray tracing techniques supported by kD-trees for 3D scene rendering in real time. This work describes in detail the study and implementation of five different kD-tree traversal algorithms using the NVIDIA Compute Unified Device Architecture (CUDA) parallel framework, in order to point out their pros and cons regarding how well they adapt to the chosen architecture.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/SBAC-PAD.2009.25</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/987_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/987_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 21st International Symposium on Computer Architecture and High Performance Computing</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>10/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Artur L. dos Santos</Author>
           <Author email="">Joao Marcelo X.N. Teixeira</Author>
           <Author email="">Thiago S.M.C. de Farias</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/SBAC-PAD.2009.25">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Artur L. dos Santos,Joao Marcelo X.N. Teixeira,Thiago S.M.C. de Farias</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ed07648a-45fa-4ca3-b053-e20c0411184a</GUID>
        <Name>Multi-core acceleration of chemical kinetics for simulation and prediction</Name>
        <ShortDescription>This work implements a computationally expensive chemical kinetics kernel from a large-scale community atmospheric model on three multi-core platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine, and Intel Quad-Core Xeon CPUs. A comparative performance analysis for each platform in double and single precision on coarse and fine grids is presented. Platform-specific design and optimization is discussed in a mechanism-agnostic way, permitting the optimization of many chemical mechanisms. </ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1145/1654059.1654067</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/986_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/986_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Virginia Polytechnic Institute and State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>11/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">John C. Linford</Author>
           <Author email="">John Michalakes</Author>
           <Author email="">Manish Vachharajani</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1145/1654059.1654067">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>John C. Linford,John Michalakes,Manish Vachharajani</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7d14d54f-5d86-4c0d-aed3-d3c48af7bcd6</GUID>
        <Name>Using Graphics Processors for High-Performance Computation and Visualization of Plasma Turbulence</Name>
        <ShortDescription>Direct numerical simulation (DNS) of turbulence is computationally intensive and typically relies on some form of parallel processing. Spectral kernels used for spatial discretization are a common computational bottleneck on distributed memory architectures. One way to increase DNS algorithms' efficiency is to parallelize spectral kernels using tightly coupled single-program, multiple-data (SPMD) multiprocessor units with minimal interprocessor communication latency. </ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.42</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/985_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/985_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Maryland</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">George Stantchev</Author>
           <Author email="">Derek Juba</Author>
           <Author email="">William Dorland</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.42">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>George Stantchev,Derek Juba,William Dorland</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c8e73d49-b21f-4fdb-9e45-387be9600fe0</GUID>
        <Name>Accelerating Phase Correlation Functions Using GPU and FPGA</Name>
        <ShortDescription>In this paper, we present a comparative study of implementations of the phase correlation function using GPUs, ASICs, and FPGAs. The Phase Only Correlation (POC) method demonstrates high robustness and subpixel accuracy in pattern matching and image registration. However, its computational speed is a disadvantage because of the calculation of the 2D FFT and related operations.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/AHS.2009.53</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/984_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/984_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 NASA/ESA Conference on Adaptive Hardware and Systems</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Kentaro Matsuo</Author>
           <Author email="">Tsuyoshi Hamada</Author>
           <Author email="">Masayuki Miyoshi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/AHS.2009.53">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Kentaro Matsuo,Tsuyoshi Hamada,Masayuki Miyoshi</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9d8a949b-df07-474b-987c-0f649a0c3750</GUID>
        <Name>Financial Derivatives Modeling Using GPU's</Name>
        <ShortDescription>The architecture of the latest Graphics Processing Units (GPUs) has surpassed the previous application-specific stream architecture. This has led to an architecture consisting of a number of uniform programmable units integrated on the same chip, which facilitate general-purpose computing beyond graphics processing.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/EmbeddedCom-ScalCom.2009.85</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/983_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/983_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 International Conference on Scalable Computing and Communications</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>27</ReleaseDay>
        <ReleaseDateDisplay>09/27/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Myungho Lee</Author>
           <Author email="">Chin Hong Chun</Author>
           <Author email="">Sugwon Hong</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/EmbeddedCom-ScalCom.2009.85">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Myungho Lee,Chin Hong Chun,Sugwon Hong</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a9fb7f0f-051b-4f80-940c-0d35b077453f</GUID>
        <Name>Fast k nearest neighbor search using GPU</Name>
        <ShortDescription>Statistical measures from information theory, such as entropy and the Kullback-Leibler divergence, provide useful bases for image and video processing tasks such as image retrieval and video object tracking. Accurate estimation of these measures requires adapting to the local sample density, especially if the data are high-dimensional.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CVPRW.2008.4563100</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/982_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/982_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Université de Nice-Sophia Antipolis/CNRS Laboratoire I3S, France</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>06/28/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Vincent Garcia</Author>
           <Author email="">Eric Debreuve</Author>
           <Author email="">Michel Barlaud</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CVPRW.2008.4563100">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vincent Garcia,Eric Debreuve,Michel Barlaud</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3b04ed36-f9f7-4cdd-bae3-6f168d7a28f4</GUID>
        <Name>Accelerating Simulations of Light Scattering Based on Finite-Difference Time-Domain Method with General Purpose GPUs</Name>
        <ShortDescription>Simulations of light scattering from nano-structured surface areas require a substantial amount of computing time. The emergence of General Purpose Graphics Processing Units (GPGPUs) as affordable PC SIMD arithmetic coprocessors brings the necessary computing power to modern desktop PCs. In this paper we examine how the computation time of the Finite-Difference Time-Domain (FDTD) method, a classic numerical method for computing a solution to Maxwell's equations, can be reduced by leveraging the massively parallel architecture of GPGPU cards.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/981_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/981_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 11th IEEE International Conference on Computational Science and Engineering</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>07/28/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">A. Balevic</Author>
           <Author email="">L. Rockstroh</Author>
           <Author email="">A. Tausendfreund</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CSE.2008.16">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>A. Balevic,L. Rockstroh,A. Tausendfreund</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b249b69b-fbc0-48a9-b9a1-6dfc8e766fee</GUID>
        <Name>Exploring the multiple-GPU design space</Name>
        <ShortDescription>Graphics Processing Units (GPUs) have been growing in popularity due to their impressive processing capabilities, and with general purpose programming languages such as NVIDIA's CUDA interface, are becoming the platform of choice in the scientific computing community. Previous studies that used GPUs focused on obtaining significant performance gains from execution on a single GPU. These studies employed low-level, architecture-specific tuning in order to achieve sizeable benefits over multicore CPU execution. In this paper, we consider the benefits of running on multiple (parallel) GPUs to provide further orders of performance speedup.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161068</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/980_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/980_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Northeastern University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>05/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Dana Schaa</Author>
           <Author email="">David Kaeli</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161068">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dana Schaa,David Kaeli</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>558ecded-dc18-475f-9e3d-ebda86332f8f</GUID>
        <Name>The Virtual Marathon: Parallel Computing Supports Crowd Simulations</Name>
        <ShortDescription>To be realistic, an urban model must include appropriate numbers of pedestrians, vehicles, and other dynamic entities. Using a parallel-computing architecture, researchers simulated a marathon with more than a million participants. To simulate participant behavior, they used fuzzy logic on a GPU to perform millions of inferences in real time.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/979_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/979_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>IEEE Computer Graphics and Applications</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Erdal Yilmaz</Author>
           <Author email="">Veysi Isler</Author>
           <Author email="">Yasemin Yardimci Cetin</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/MCG.2009.77">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Erdal Yilmaz,Veysi Isler,Yasemin Yardimci Cetin</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e09be7a1-fad3-47c5-8189-cb111c0818df</GUID>
        <Name>A Parallel Gibbs Sampling Algorithm for Motif Finding on GPU</Name>
        <ShortDescription>A motif is an overrepresented pattern in biological sequences, and motif finding is an important problem in bioinformatics. Because motif finding has high computational complexity, more and more computational capability is required as the available biological data, such as gene transcription data, grow rapidly. Among the many motif finding algorithms, Gibbs sampling is an effective method for finding long motifs. In this paper we present an improved Gibbs sampling method on graphics processing units (GPUs) to accelerate motif finding. Experimental data show that, compared to traditional programs on CPU, our program running on GPU provides an effective and low-cost solution for the motif finding problem, especially for long motifs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/978_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/978_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 IEEE International Symposium on Parallel and Distributed Processing with Applications</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>08/12/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Linbin Yu</Author>
           <Author email="">Yun Xu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ISPA.2009.88">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Linbin Yu,Yun Xu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c1239952-bc70-4a62-b8aa-da685c20d2ea</GUID>
        <Name>Cellular Level Agent Based Modelling on the Graphics Processing Unit</Name>
        <ShortDescription>Cellular level agent based modelling is reliant on either sequential processing environments or expensive and largely unavailable PC grids. The GPU offers an alternative architecture for such systems; however, the steep learning curve associated with the GPU's data-parallel architecture has previously limited the uptake of this emerging technology.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/HiBi.2009.12</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/977_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/977_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 International Workshop on High Performance Computational Systems Biology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>10/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Paul Richmond</Author>
           <Author email="">Simon Coakley</Author>
           <Author email="">Daniela Romano</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/HiBi.2009.12">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Paul Richmond,Simon Coakley,Daniela Romano</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>843ce581-fa99-4372-99d8-6ef7b20ec10e</GUID>
        <Name>A microdriver architecture for error correcting codes inside the Linux kernel</Name>
        <ShortDescription>Coding tasks, such as encryption of data or the generation of failure-tolerant codes, are among the most computationally expensive tasks inside the Linux kernel. Their integration into the kernel enables the user to access these functionalities transparently: encrypted hard disks can be used in the same way as unencrypted ones. Nevertheless, Linux as a monolithic kernel is not prepared to support these expensive tasks by accessing modern hardware accelerators, like graphics processing units (GPUs), as the corresponding accelerator libraries, like the CUDA API for NVIDIA GPUs, only offer user-space APIs.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1145/1654059.1654095</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/976_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/976_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Paderborn, Germany</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>11/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">A. Brinkmann</Author>
           <Author email="">D. Eschweiler</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1145/1654059.1654095">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>A. Brinkmann,D. Eschweiler</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d6ce8db6-de74-452f-8737-b2efa14c3d63</GUID>
        <Name>A Program Behavior Study of Block Cryptography Algorithms on GPGPU</Name>
        <ShortDescription>Recently, many studies have mapped cryptography algorithms onto graphics processors (GPUs) and achieved large performance gains. This paper does not focus on the performance of a specific program tuned with all kinds of algorithmic optimization methods, but on the intrinsic reasons for this performance improvement, which lie in GPU architectural features.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/FCST.2009.13</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/975_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/975_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 Fourth International Conference on Frontier of Computer Science and Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>12/19/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Gu Liu</Author>
           <Author email="">Hong An</Author>
           <Author email="">Wenting Han</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/FCST.2009.13">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Gu Liu,Hong An,Wenting Han</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d9dd760c-7469-42a8-905c-7144ff3d043d</GUID>
        <Name>Count Sort for GPU Computing</Name>
        <ShortDescription>Counting sort is a simple, stable and efficient sorting algorithm with linear running time, and a fundamental building block for many applications. This paper describes the design issues of a data-parallel implementation of the count sort algorithm on a commodity multiprocessor GPU using the Compute Unified Device Architecture (CUDA) platform, both from NVIDIA Corporation.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.30</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/974_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/974_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 15th International Conference on Parallel and Distributed Systems</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Weidong Sun</Author>
           <Author email="">Zongmin Ma</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.30">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Weidong Sun,Zongmin Ma</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3f7ee5e2-88e9-43da-9bc5-1e269f9fddc9</GUID>
        <Name>GPU-accelerated, gradient-free MI deformable registration for atlas-based MR brain image segmentation</Name>
        <ShortDescription>Brain structure segmentation is an important task in many neuroscience and clinical applications. In this paper, we introduce a novel MI-based dense deformable registration method and apply it to the automatic segmentation of detailed brain structures. Together with a multiple atlas fusion strategy, very accurate segmentation results were obtained, as compared with other reported methods in the literature.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CVPR.2009.5204043</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/973_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/973_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Maryland Heights, MO</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>06/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Xiao Han</Author>
           <Author email="">L.S. Hibbard</Author>
           <Author email="">V. Willcut</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CVPR.2009.5204043">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Xiao Han,L.S. Hibbard,V. Willcut</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>47e489fe-8243-425b-a88c-27a5d18b0f6a</GUID>
        <Name>Solving Computational Problems with GPU Computing</Name>
        <ShortDescription>Modern GPUs are massively parallel microprocessors that can deliver very high performance for the parallel computations common in science and engineering.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/972_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/972_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Computing in Science and Engineering</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>10/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jonathan Cohen</Author>
           <Author email="">Michael Garland</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.144">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jonathan Cohen,Michael Garland</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8a1ca4bd-1895-40b6-98b7-33bdddca994d</GUID>
        <Name>The Synchronization Power of Coalesced Memory Accesses</Name>
        <ShortDescription>Multicore architectures have established themselves as the new generation of computer architectures. As part of the one core to many cores evolution, memory access mechanisms have advanced rapidly. Several new memory access mechanisms have been implemented in many modern commodity multicore architectures. By specifying how processing cores access shared memory, memory access mechanisms directly influence the synchronization capabilities of multicore architectures. Therefore, it is crucial to investigate the synchronization power of these new memory access mechanisms. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/971_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/971_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Chalmers University of Technology, Gothenburg</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Phuong Hoai Ha</Author>
           <Author email="">Philippas Tsigas</Author>
           <Author email="">Otto J. Anshus</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/TPDS.2009.134">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Phuong Hoai Ha,Philippas Tsigas,Otto J. Anshus</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fbaceb8f-3e46-4070-ac07-fe2e8d5e4608</GUID>
        <Name>Fast Disk Encryption through GPGPU Acceleration</Name>
        <ShortDescription>We present the design and performance analysis of a GPU-optimized implementation of a disk encryption application employing the XTS mode of operation applied together with the Twofish algorithm within the well-known TrueCrypt suite. We show how to correctly tune the design parameters, including data allocation, thread packing, and parallelization strategy. Overall, our implementation of TrueCrypt running on an NVIDIA GTX260 GPU outperforms the baseline implementation running on a four-core CPU by 67%.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/970_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/970_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 International Conference on Parallel and Distributed Computing, Applications and Technologies</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Giovanni Agosta</Author>
           <Author email="">Alessandro Barenghi</Author>
           <Author email="">Fabrizio De Santis</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2009.72">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Giovanni Agosta,Alessandro Barenghi,Fabrizio De Santis</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4471c01f-7076-4798-acd4-af519ff3ae9e</GUID>
        <Name>Optical Flow Computation on Compute Unified Device Architecture</Name>
        <ShortDescription>In this study, the implementation of an image processing technique on Compute Unified Device Architecture (CUDA) is discussed. CUDA is a new hardware and software architecture developed by NVIDIA Corporation for general-purpose computation on graphics processing units.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICIAP.2007.97</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/969_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/969_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Yamaguchi University, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>09/14/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yoshiki Mizukami</Author>
           <Author email="">Katsumi Tadamura</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICIAP.2007.97">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yoshiki Mizukami,Katsumi Tadamura</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e3d51228-c2ed-41f4-bdfe-11daa5563861</GUID>
        <Name>Linear optimization on modern GPUs</Name>
        <ShortDescription>Optimization algorithms are becoming increasingly important in many areas, such as finance and engineering. Typically, real problems involve several hundred variables and are subject to as many constraints. Several methods have been developed to reduce the theoretical time complexity. Nevertheless, when problems exceed reasonable sizes they become very computationally intensive.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161106</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/968_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/968_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Norwegian University of Science and Technology, Norway</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>05/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Daniele G. Spampinato</Author>
           <Author email="">Anne C. Elster</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161106">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Daniele G. Spampinato,Anne C. Elster</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8252cbfd-7350-4004-acfb-096dfea1d9e2</GUID>
        <Name>Mapping High-Fidelity Volume Rendering for Medical Imaging to CPU, GPU and Many-Core Architectures</Name>
        <ShortDescription>Medical volumetric imaging requires high fidelity, high performance rendering algorithms. We motivate and analyze new volumetric rendering algorithms that are suited to modern parallel processing architectures. First, we describe the three major categories of volume rendering algorithms and confirm through an imaging scientist-guided evaluation that ray-casting is the most acceptable.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2009.164</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/967_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/967_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Intel Corporation</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>11/15/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Mikhail Smelyanskiy</Author>
           <Author email="">David Holmes</Author>
           <Author email="">Jatin Chhugani</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2009.164">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mikhail Smelyanskiy,David Holmes,Jatin Chhugani</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bf5ce6bc-a2c0-4262-961b-d0fe4504edc1</GUID>
        <Name>GPU-accelerated, gradient-free MI deformable registration for atlas-based MR brain image segmentation</Name>
        <ShortDescription>Brain structure segmentation is an important task in many neuroscience and clinical applications. In this paper, we introduce a novel MI-based dense deformable registration method and apply it to the automatic segmentation of detailed brain structures.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CVPR.2009.5204043</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/966_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/966_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Maryland Heights</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>06/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Xiao Han</Author>
           <Author email="">L.S. Hibbard</Author>
           <Author email="">V. Willcut</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CVPR.2009.5204043">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Xiao Han,L.S. Hibbard,V. Willcut</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>97e427af-035a-48b2-944a-44029f2b874e</GUID>
        <Name>Efficient band approximation of Gram matrices for large scale kernel methods on GPUs</Name>
        <ShortDescription>Kernel-based methods require O(N^2) time and space complexities to compute and store non-sparse Gram matrices, which is prohibitively expensive for large scale problems. We introduce a novel method to approximate a Gram matrix with a band matrix. Our method relies on the locality preserving properties of space filling curves, and the special structure of Gram matrices. Our approach has several important merits.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1145/1654059.1654091</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/965_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/965_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>11/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Mohamed Hussein</Author>
           <Author email="">Wael Abd-Almageed</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1145/1654059.1654091">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mohamed Hussein,Wael Abd-Almageed</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0286b4ba-3c62-4513-b654-eb17ca5eb44f</GUID>
        <Name>CUDA-Based Jacobi's Iterative Method</Name>
        <ShortDescription>Solving linear equations is a common problem in the fields of science and engineering. Accelerating the solving process is of great significance. Modern GPUs are high performance many-core processors fit for large scale parallel computing. They provide a novel way of accelerating the solving process. A GPU-based parallel Jacobi's iterative solver for dense linear equations is presented in this paper.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IFCSTA.2009.68</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/964_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/964_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 International Forum on Computer Science-Technology and Applications</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>27</ReleaseDay>
        <ReleaseDateDisplay>12/27/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Zhihui Zhang</Author>
           <Author email="">Qinghai Miao</Author>
           <Author email="">Ying Wang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IFCSTA.2009.68">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zhihui Zhang,Qinghai Miao,Ying Wang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>aa5f9c87-b466-42cd-bbbf-e6d69a883462</GUID>
        <Name>Voice Command Recognition with Dynamic Time Warping (DTW) using GPU with CUDA</Name>
        <ShortDescription>Recently, we have been witnessing a huge evolution in the development of high performance computing platforms. Among these platforms, the GPU (Graphics Processing Unit), stimulated by a game industry constantly demanding more graphical processing power, has evolved from a simple graphics card into a general-purpose parallel data processing device.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/SBAC-PAD.2007.21</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/963_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/963_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>19th International Symposium on Computer Architecture and High Performance Computing</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>27</ReleaseDay>
        <ReleaseDateDisplay>10/27/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Gustavo Poli</Author>
           <Author email="">Joao F. Mari</Author>
           <Author email="">Jose Hiroki Saito</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/SBAC-PAD.2007.21">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Gustavo Poli,Joao F. Mari,Jose Hiroki Saito</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>611c95a7-12d2-4a62-9ae9-3416cd2b2bc7</GUID>
        <Name>Speeding up Mutual Information Computation Using NVIDIA CUDA Hardware</Name>
        <ShortDescription>We present an efficient method for mutual information (MI) computation between images (2D or 3D) for NVIDIA's 'compute unified device architecture' (CUDA) compatible devices. Efficient parallelization of MI is particularly challenging on a 'graphics processing unit' (GPU) due to the need for histogram-based calculation of joint and marginal probability mass functions (pmfs) with a large number of bins.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/DICTA.2007.177</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/962_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/962_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>09/05/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Ramtin Shams</Author>
           <Author email="">Nick Barnes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/DICTA.2007.177">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ramtin Shams,Nick Barnes</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6f659aa3-3402-474c-860e-06af9f94e3f8</GUID>
        <Name>NVIDIA Tesla: A Unified Graphics and Computing Architecture</Name>
        <ShortDescription>To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture. Its scalable parallel array of processors is massively multithreaded and programmable in C or via graphics APIs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/961_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/961_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>NVIDIA Corp.</OrganizationName>
        <OrganizationURL>http://www.nvidia.com/cuda</OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Erik Lindholm</Author>
           <Author email="">John Nickolls</Author>
           <Author email="">Stuart Oberman</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/MM.2008.31">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Erik Lindholm,John Nickolls,Stuart Oberman</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>20130b8f-bc91-4d6c-890e-3a84b88cc98b</GUID>
        <Name>Fast Deformable Registration on the GPU: A CUDA Implementation of Demons</Name>
        <ShortDescription>In the medical imaging field, we need fast deformable registration methods especially in intra-operative settings characterized by their time-critical applications. Image registration studies which are based on Graphics Processing Units (GPUs) provide fast implementations. However, only a small number of these GPU-based studies concentrate on deformable registration.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICCSA.2008.22</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/960_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/960_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 International Conference on Computational Sciences and Its Applications</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>07/03/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Pinar Muyan-Ozcelik</Author>
           <Author email="">John D. Owens</Author>
           <Author email="">Junyi Xia</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICCSA.2008.22">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Pinar Muyan-Ozcelik,John D. Owens,Junyi Xia</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4fb16757-9777-49a6-a716-41a02e3229fa</GUID>
        <Name>Efficient visual hull computation for real-time 3D reconstruction using CUDA</Name>
        <ShortDescription>In this paper we present two efficient GPU-based visual hull computation algorithms. We compare them in terms of performance using image sets of varying size and different voxel resolutions. In addition, we present a real-time 3D reconstruction system which uses the proposed GPU-based reconstruction method to achieve real-time performance (30 fps) using 16 cameras and 4 PCs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/959_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/959_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>06/28/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Alexander Ladikos</Author>
           <Author email="">Selim Benhimane</Author>
           <Author email="">Nassir Navab</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CVPRW.2008.4563098">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexander Ladikos,Selim Benhimane,Nassir Navab</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c1a69133-fcfc-4190-83fc-498481f6a97f</GUID>
        <Name>CUDA cuts: Fast graph cuts on the GPU</Name>
        <ShortDescription>Graph cuts have become a powerful and popular optimization tool for energies defined over an MRF and have found applications in image segmentation, stereo vision, image restoration, etc.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CVPRW.2008.4563095</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/958_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/958_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>06/28/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Vibhav Vineet</Author>
           <Author email="">P. J. Narayanan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CVPRW.2008.4563095">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vibhav Vineet,P. J. Narayanan</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>dcbe76e1-5927-42de-b543-0d1e2fca0a57</GUID>
        <Name>GPU Acceleration of 2D-DWT Image Compression in MATLAB with CUDA</Name>
        <ShortDescription>This article presents the details of accelerating 2D wavelet-based medical data (image) compression in MATLAB with CUDA. Diagnostic materials (mostly as a certain type of image) are increasingly acquired in a digital format. The common need to manipulate huge amounts of data daily has therefore raised the issue of compressing that data within a very short amount of time.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/EMS.2008.43</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/957_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/957_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 Second UKSIM European Symposium on Computer Modeling and Simulation</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>09/10/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Vaclav Simek</Author>
           <Author email="">Ram Rakesh Asn</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/EMS.2008.43">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vaclav Simek,Ram Rakesh Asn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>dee4e626-3b28-4f97-b21f-51e7af3cd36a</GUID>
        <Name>Parallel Computing Experiences with CUDA</Name>
        <ShortDescription>The CUDA programming model provides a straightforward means of describing inherently parallel computations, and NVIDIA's Tesla GPU architecture delivers high computational throughput on massively parallel problems. This article surveys experiences gained in applying CUDA to a diverse set of problems and the parallel speedups over sequential codes running on traditional CPU architectures attained by executing key computations on the GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/956_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/956_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>NVIDIA Corp.</OrganizationName>
        <OrganizationURL>http://www.nvidia.com/cuda</OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Michael Garland</Author>
           <Author email="">Scott Le Grand</Author>
           <Author email="">John Nickolls</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/MM.2008.57">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Michael Garland,Scott Le Grand,John Nickolls</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f281cdb4-83a4-4ec4-afd5-099f220b7e08</GUID>
        <Name>Processing Neocognitron of Face Recognition on High Performance Environment Based on GPU with CUDA Architecture</Name>
        <ShortDescription>This work presents an implementation of Neocognitron Neural Network, using a high performance computing architecture based on GPU (Graphics Processing Unit). Neocognitron is an artificial neural network, proposed by Fukushima and collaborators, constituted of several hierarchical stages of neuron layers, organized in two-dimensional matrices called cellular planes.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/SBAC-PAD.2008.25</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/955_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/955_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 20th International Symposium on Computer Architecture and High Performance Computing</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>11/01/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Gustavo Poli</Author>
           <Author email="">Jose Hiroki Saito</Author>
           <Author email="">Joao F. Mari</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/SBAC-PAD.2008.25">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Gustavo Poli,Jose Hiroki Saito,Joao F. Mari</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>33b8d91d-369c-4bcf-a02d-dea9fc848e19</GUID>
        <Name>Low-cost, high-speed computer vision using NVIDIA's CUDA architecture</Name>
        <ShortDescription>In this paper, we introduce real-time image processing techniques using modern programmable Graphics Processing Units (GPUs). GPUs are SIMD (Single Instruction, Multiple Data) devices that are inherently data-parallel.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/AIPR.2008.4906458</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/954_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/954_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Virginia Polytechnic Institute and State University, Blacksburg</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>10/17/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Seung In Park</Author>
           <Author email="">Sean P. Ponce</Author>
           <Author email="">Jing Huang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/AIPR.2008.4906458">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Seung In Park,Sean P. Ponce,Jing Huang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>49ac66d4-2dd1-44ae-9d83-71016916b1ee</GUID>
        <Name>A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality</Name>
        <ShortDescription>We present a development environment for distributed GPU computing targeted for multi-GPU systems, as well as graphics clusters. Our system is based on CUDA and logically extends its parallel programming model for graphics processors to higher levels of parallelism, namely, the PCI bus and network interconnects.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2008.188</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/953_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/953_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Visualisierungsinstitut der Universität Stuttgart</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Christoph Muller</Author>
           <Author email="">Steffen Frey</Author>
           <Author email="">Magnus Strengert</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2008.188">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Christoph Muller,Steffen Frey,Magnus Strengert</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d93c7dc3-6f07-4ce2-b097-1dc5245273c0</GUID>
        <Name>Parallel Image Processing Based on CUDA</Name>
        <ShortDescription>CUDA (Compute Unified Device Architecture) is a novel technology for general-purpose computing on the GPU, which lets users develop general-purpose GPU (Graphics Processing Unit) programs easily. This paper analyzes the distinct features of CUDA GPUs and summarizes the general programming model of CUDA.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CSSE.2008.1448</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/952_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/952_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 International Conference on Computer Science and Software Engineering</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>12/15/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Zhiyi Yang</Author>
           <Author email="">Yating Zhu</Author>
           <Author email="">Yong Pu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CSSE.2008.1448">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zhiyi Yang,Yating Zhu,Yong Pu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>54a140b4-1765-4eb8-82e7-08162de212f0</GUID>
        <Name>Neural Network Implementation Using CUDA and OpenMP</Name>
        <ShortDescription>Many algorithms for image processing and pattern recognition have recently been implemented on the GPU (graphics processing unit) for faster computation. However, implementation on the GPU encounters two problems. First, the programmer must master the fundamentals of graphics shading languages, which requires prior knowledge of computer graphics.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/DICTA.2008.82</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/951_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/951_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2008 Digital Image Computing: Techniques and Applications</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>12/03/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Honghoon Jang</Author>
           <Author email="">Anjin Park</Author>
           <Author email="">Keechul Jung</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/DICTA.2008.82">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Honghoon Jang,Anjin Park,Keechul Jung</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>06485c3c-860d-46c7-b2af-b1e0fc9dd903</GUID>
        <Name>A Parallel Implementation of the 2D Wavelet Transform Using CUDA</Name>
        <ShortDescription>One multicore platform is currently attracting enormous attention due to its tremendous potential in terms of sustained performance: the NVIDIA Tesla boards. These cards, intended for general-purpose computing on graphics processing units (GPGPU), are used as data-parallel computing devices.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/PDP.2009.40</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/949_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/949_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 Parallel, Distributed and Network-based Processing</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>02/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Joaquin Franco</Author>
           <Author email="">Gregorio Bernabe</Author>
           <Author email="">Juan Fernandez</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/PDP.2009.40">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Joaquin Franco,Gregorio Bernabe,Juan Fernandez</Keyword> 
        </Keywords>
     </Application>


     <Application>
        <GUID>03ba8152-8414-4f72-a3da-fc67b85f62a8</GUID>
        <Name>Towards Accelerated Computation of Atmospheric Equations Using CUDA</Name>
        <ShortDescription>The main objective of this paper is to outline possible ways to achieve a substantial acceleration in the calculation of the advection-diffusion equation (A-DE), which is commonly used to describe pollutant behavior in the atmosphere.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/UKSIM.2009.25</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/948_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/948_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>UKSim 2009: 11th International Conference on Computer Modelling and Simulation</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>03/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Vaclav Simek</Author>
           <Author email="">Radim Dvorak</Author>
           <Author email="">Frantisek Zboril</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/UKSIM.2009.25">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vaclav Simek,Radim Dvorak,Frantisek Zboril</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>27bfd115-787f-47af-bc51-e3978fe90dc2</GUID>
        <Name>K-Means on Commodity GPUs with CUDA</Name>
        <ShortDescription>The K-means algorithm is one of the most famous unsupervised clustering algorithms. Many theoretical improvements to the performance of the original algorithm have been put forward, but almost all of them are based on Single Instruction Single Data (SISD) architecture processors (CPUs), which partly ignores the inherently parallel nature of the algorithm.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/CSIE.2009.491</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/947_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/947_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 WRI World Congress on Computer Science and Information Engineering</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>04/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Bai Hong-tao</Author>
           <Author email="">He Li-li</Author>
           <Author email="">Ouyang Dan-tong</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CSIE.2009.491">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bai Hong-tao,He Li-li,Ouyang Dan-tong</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8d3833b2-a532-4cd2-8ab3-5d8d6d572f46</GUID>
        <Name>Accelerating K-Means on the Graphics Processor via CUDA</Name>
        <ShortDescription>In this paper an optimized k-means implementation on the graphics processing unit (GPU) is presented. NVIDIA's Compute Unified Device Architecture (CUDA), available from the G80 GPU family onwards, is used as the programming environment.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/INTENSIVE.2009.19</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/946_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/946_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 First International Conference on Intensive Applications and Services</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>04/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Mario Zechner</Author>
           <Author email="">Michael Granitzer</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/INTENSIVE.2009.19">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mario Zechner,Michael Granitzer</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e458a4f1-5693-4f34-beb8-9f46e2d0e158</GUID>
        <Name>Hierarchical Agglomerative Clustering Using Graphics Processor with Compute Unified Device Architecture</Name>
        <ShortDescription>We explore the use of today's high-end graphics processing units on desktops to perform hierarchical agglomerative clustering with NVIDIA's Compute Unified Device Architecture (CUDA). Although the advancement in graphics cards has made the gaming industry flourish, there is a lot more to be gained in the field of scientific computing, high performance computing and their applications.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICSPS.2009.167</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/945_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/945_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>2009 International Conference on Signal Processing Systems</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>05/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">S.A. Arul Shalom</Author>
           <Author email="">Manoranjan Dash</Author>
           <Author email="">Minh Tue</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICSPS.2009.167">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>S.A. Arul Shalom,Manoranjan Dash,Minh Tue</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>cf4cef10-6cad-4a80-94eb-fca17c2968c6</GUID>
        <Name>Compute Pairwise Manhattan Distance and Pearson Correlation Coefficient of Data Points with GPU</Name>
        <ShortDescription>Graphics processing units (GPUs) are powerful computational devices tailored towards the needs of the 3-D gaming industry for high-performance, real-time graphics engines. Nvidia Corporation released a new generation of GPUs designed for general-purpose computing in 2006, and it released a GPU programming language called CUDA in 2007.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/SNPD.2009.34</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/944_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/944_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Catholic University of Daegu, Korea</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>05/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Dar-Jen Chang</Author>
           <Author email="">Ahmed H. Desoky</Author>
           <Author email="">Ming Ouyang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/SNPD.2009.34">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dar-Jen Chang,Ahmed H. Desoky,Ming Ouyang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>55947cbf-ca00-49aa-818d-b17d817541c9</GUID>
        <Name>Designing efficient sorting algorithms for manycore GPUs</Name>
        <ShortDescription>We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort and our merge sort is the fastest comparison-based sort reported in the literature.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161005</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/943_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/943_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California, Berkeley, USA</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>05/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Nadathur Satish</Author>
           <Author email="">Mark Harris</Author>
           <Author email="">Michael Garland</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161005">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Nadathur Satish,Mark Harris,Michael Garland</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bde241dd-c89e-46ca-a561-593e39e30290</GUID>
        <Name>Design of a parallel AES for graphics hardware using the CUDA framework</Name>
        <ShortDescription>Web servers often need to manage encrypted transfers of data. The encryption activity is computationally intensive, and exposes a significant degree of parallelism. At the same time, cheap multicore processors are readily available on graphics hardware, and toolchains for development of general purpose programs are being released by the vendors.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161242</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/942_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/942_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Politecnico di Milano, Italy</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>10/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Andrea Di Biagio</Author>
           <Author email="">Alessandro Barenghi</Author>
           <Author email="">Giovanni Agosta</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161242">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Andrea Di Biagio,Alessandro Barenghi,Giovanni Agosta</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bbe0ed13-d4a2-48c7-b7ef-8ea7c50ff41e</GUID>
        <Name>vCUDA: GPU accelerated high performance computing in virtual machines</Name>
        <ShortDescription>This paper describes vCUDA, a GPGPU (General Purpose Graphics Processing Unit) computing solution for virtual machines. vCUDA allows applications executing within virtual machines (VMs) to leverage hardware acceleration, which can be beneficial to the performance of a class of high performance computing (HPC) applications.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161020</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/941_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/941_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Hunan University, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>05/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Lin Shi</Author>
           <Author email="">Hao Chen</Author>
           <Author email="">Jianhua Sun</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161020">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Lin Shi,Hao Chen,Jianhua Sun</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f61532c5-2f66-40d8-8f6c-04b50f5bbefd</GUID>
        <Name>Accelerating error correction in high-throughput short-read DNA sequencing data with CUDA</Name>
        <ShortDescription>Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, produced reads are significantly shorter and more error-prone compared to the traditional Sanger shotgun sequencing method.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5160924</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/940_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/940_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nanyang Technological University, Singapore</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>05/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Haixiang Shi</Author>
           <Author email="">Bertil Schmidt</Author>
           <Author email="">Weiguo Liu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5160924">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Haixiang Shi,Bertil Schmidt,Weiguo Liu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>292b253a-57be-4f0b-9426-3d326854885c</GUID>
        <Name>Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors</Name>
        <ShortDescription>The availability of easily programmable manycore CPUs and GPUs has motivated investigations into how to best exploit their tremendous computational power for scientific computing. Here we demonstrate how a systems biology application (detection and tracking of white blood cells in video microscopy) can be accelerated by 200x using a CUDA-capable GPU. Because the algorithms and implementation challenges are common to a wide range of applications, we discuss general techniques that allow programmers to make efficient use of a manycore GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/938_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/938_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>05/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>200</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Michael Boyer</Author>
           <Author email="">David Tarjan</Author>
           <Author email="">Scott T. Acton</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5160984">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Michael Boyer,David Tarjan,Scott T. Acton</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>158694c3-4565-45ec-9051-9c1dfd7120d0</GUID>
        <Name>An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases</Name>
        <ShortDescription>The Smith-Waterman algorithm for sequence alignment is one of the main tools of bioinformatics. It is used for sequence similarity searches and alignment of similar sequences.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5160931</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/937_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/937_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Warsaw, Poland</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>05/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Lukasz Ligowski</Author>
           <Author email="">Witold Rudnicki</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5160931">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Lukasz Ligowski,Witold Rudnicki</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e58ea7e5-f7df-481c-8b10-3b3eccd9e977</GUID>
        <Name>CuPP - A framework for easy CUDA integration</Name>
        <ShortDescription>This paper reports on CuPP, our newly developed C++ framework designed to ease integration of NVIDIA's GPGPU system CUDA into existing C++ applications. CuPP provides interfaces to recurring tasks that are easier to use than the standard CUDA interfaces. In this paper we concentrate on memory management and related data structures.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5160937</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/936_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/936_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universitat Kassel, Germany</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>05/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jens Breitbart</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5160937">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jens Breitbart</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>16035543-5127-42c0-a33b-e3bebe2fcc62</GUID>
        <Name>Singular value decomposition on GPU using CUDA</Name>
        <ShortDescription>Linear algebra algorithms are fundamental to many computing applications. Modern GPUs are suited for many general purpose processing tasks and have emerged as inexpensive high performance co-processors due to their tremendous computing power. In this paper, we present the implementation of singular value decomposition (SVD) of a dense matrix on GPU using the CUDA programming model.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161058</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/935_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/935_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>International Institute of Information Technology, India</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>05/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Sheetal Lahabar</Author>
           <Author email="">P. J. Narayanan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5161058">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sheetal Lahabar,P. J. Narayanan</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b854753b-3791-4a9c-9b9a-4fe6700b4aa1</GUID>
        <Name>Parallel reconstruction of neighbor-joining trees for large multiple sequence alignments using CUDA</Name>
        <ShortDescription>Computing large multiple protein sequence alignments using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. ClustalW uses a three-stage processing pipeline:</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5160923</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/934_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/934_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nanyang Technological University, Singapore</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>05/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yongchao Liu</Author>
           <Author email="">Bertil Schmidt</Author>
           <Author email="">Douglas L. Maskell</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IPDPS.2009.5160923">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yongchao Liu,Bertil Schmidt,Douglas L. Maskell</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>cb7682ca-4aae-4703-ae29-e99695f85d91</GUID>
        <Name>Ocean3DTechnology</Name>
        <ShortDescription>Simulation of oceanic surfaces; physics calculations for objects in a water environment.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/933_ocean_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/933_ocean_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Ocean3DInteractive</OrganizationName>
        <OrganizationURL>http://www.ocean3dinteractive.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>04/15/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Mykola Ozerchuk</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=zzsCGdbSUgU">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Game Physics</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mykola Ozerchuk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>23e01f03-877d-433d-8b31-74754d82b8d9</GUID>
        <Name>FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs</Name>
        <ShortDescription>As growing power dissipation and thermal effects disrupted the rising clock frequency trend and threatened to annul Moore's law, the computing industry has switched its route to higher performance through parallel processing. The rise of multi-core systems in all domains of computing has opened the door to heterogeneous multi-processors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs and FPGAs are becoming very popular in PC-based heterogeneous systems for speeding up compute intensive kernels of scientific, imaging and simulation applications.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/SASP.2009.5226333</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/932_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/932_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>27</ReleaseDay>
        <ReleaseDateDisplay>07/27/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Alexandros Papakonstantinou</Author>
           <Author email="">Karthik Gururaj</Author>
           <Author email="">John A. Stratton</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/SASP.2009.5226333">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexandros Papakonstantinou,Karthik Gururaj,John A. Stratton</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5894f1ec-6a2f-4df5-ac17-a8a77ada7394</GUID>
        <Name>MSA-CUDA: Multiple Sequence Alignment on Graphics Processing Units with CUDA</Name>
        <ShortDescription>Progressive alignment is a widely used approach for computing multiple sequence alignments (MSAs). However, aligning several hundred or thousand sequences with popular progressive alignment tools such as ClustalW requires hours or even days on state-of-the-art workstations. This paper presents MSA-CUDA, a parallel MSA program, which parallelizes all three stages of the ClustalW processing pipeline using CUDA and achieves significant speedups compared to the sequential ClustalW for a variety of large protein sequence datasets.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ASAP.2009.14</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/931_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/931_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>07/07/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>36</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yongchao Liu</Author>
           <Author email="">Bertil Schmidt</Author>
           <Author email="">Douglas L. Maskell</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ASAP.2009.14">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yongchao Liu,Bertil Schmidt,Douglas L. Maskell</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0e4de9b3-9658-4d42-b197-6ef57ab2d2ee</GUID>
        <Name>Getting Started with GPU Programming</Name>
        <ShortDescription>This tutorial describes a step-by-step procedure for programming a Macintosh Nvidia GPU. General scientific programmers with some C knowledge can get started in parallel processing application development with relative ease.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/930_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/930_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>American University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Michael A. Gray</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.119">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Michael A. Gray</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9777acef-c2ae-4810-a2bb-809471ddc369</GUID>
        <Name>An Empirically Optimized Radix Sort for GPU</Name>
        <ShortDescription>Graphics Processing Units (GPUs) that support general-purpose programming are promising platforms for high performance computing. However, the fundamental architectural differences between GPUs and CPUs, the complexity of the GPU platform, and the diversity of GPU specifications have made the generation of highly efficient code for GPUs increasingly difficult. Manual code generation is time consuming and the result tends to be difficult to debug and maintain.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ISPA.2009.89</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/929_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/929_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 IEEE International Symposium on Parallel and Distributed Processing with Applications</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>08/10/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Bonan Huang</Author>
           <Author email="">Jinlan Gao</Author>
           <Author email="">Xiaoming Li</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ISPA.2009.89">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bonan Huang,Jinlan Gao,Xiaoming Li</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8a13d819-67c1-47d5-994b-7a995ba156b8</GUID>
        <Name>Accelerating Genome-Wide Association Studies Using CUDA Compatible Graphics Processing Units</Name>
        <ShortDescription>Recent advances in highly parallel, multithreaded, manycore Graphics Processing Units (GPUs) have enabled massively parallel implementations of many applications in bioinformatics. In this paper, we describe a parallel implementation of genome-wide association studies (GWAS) using Compute Unified Device Architecture (CUDA). Using a single NVIDIA GTX 280 graphics card, we achieve speedups of about 15 times over an Intel Xeon E5420. We also implement a highly scalable, massively parallel GWAS system using the Message Passing Interface (MPI) and show that a single GTX 280 can deliver performance similar to that of a 16-node cluster. We further apply the GPU program to two real genome-wide case-control data sets. The results show that the GPU program is 17.7 times as fast as the CPU version for an Age-related Macular Degeneration (AMD) data set and 25.7 times as fast as the CPU version for a Parkinson's disease data set.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/928_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/928_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>08/03/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>25</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Rui Jiang</Author>
           <Author email="">Feng Zeng</Author>
           <Author email="">Wangshu Zhang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/IJCBS.2009.32">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Rui Jiang,Feng Zeng,Wangshu Zhang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ff55e59c-e068-4456-a289-f60c94909099</GUID>
        <Name>Power Efficient Large Matrices Multiplication by Load Scheduling on Multi-core and GPU Platform with CUDA</Name>
        <ShortDescription>Power efficiency is one of the most important issues in high performance computing (HPC), and it depends on both software and hardware. The power dissipation of a program depends on the algorithm design and on the power features of the computer components on which the program runs. In this work, we measure and model the power consumption of large matrix multiplication on a multi-core CPU and GPU platform. By combining the major physical power constraints of the hardware components with an analysis of program execution behavior, we reduce overall power consumption by using a multithreaded CPU to control two GPU devices computing synchronously in parallel. By implementing this method on a real system, we show that it can save 22% of energy and speed up kernel execution time by 71%, compared with solving the same large matrix multiplication using a single CPU and GPU combination.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/927_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/927_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>2009 International Conference on Computational Science and Engineering</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>08/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">DaQi Ren</Author>
           <Author email="">Reiji Suda</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CSE.2009.488">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>DaQi Ren,Reiji Suda</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>36b2c6dc-0c25-4e19-a1b1-4f01eb1ba9a3</GUID>
        <Name>Solving 0/1 Knapsack Problem for Light Communication SLA-Based Workflow Mapping Using CUDA</Name>
        <ShortDescription>Mapping and running jobs on suitable resources are the core tasks in Grid Computing. In the algorithm to map light communication Grid-based workflows within the SLA context, there is an operation for resolving the conflict period, which is exactly a 0/1 knapsack problem. When the size of the workflow is large, such as when mapping a group of workflows, solving this problem takes a long time and thus makes the whole mapping process long. In this paper, we describe a way to solve this problem by exploiting the parallel computing power of the Graphics Processing Unit (GPU) with Compute Unified Device Architecture (CUDA). The experiment shows that the approach is very efficient for very large problems.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/926_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/926_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>2009 International Conference on Computational Science and Engineering</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>08/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Dang Minh Quan</Author>
           <Author email="">Laurence T. Yang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/CSE.2009.263">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dang Minh Quan,Laurence T. Yang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5c12478a-a0ac-4a51-b496-dea3b86a936f</GUID>
        <Name>CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator</Name>
        <ShortDescription>Modern GPUs open a completely new field to optimize embarrassingly parallel algorithms. Implementing an algorithm on a GPU confronts the programmer with a new set of challenges for program optimization. Some of the most notable ones are isolating the part of the algorithm that can be optimized to run on the GPU; tuning the program for the GPU memory hierarchy whose organization and performance implications are radically different from those of general purpose CPUs; and optimizing programs at the instruction-level for the GPU.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICPPW.2009.78</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/925_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/925_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>2009 International Conference on Parallel Processing Workshops</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>09/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jakob Siegel</Author>
           <Author email="">Juergen Ributzka</Author>
           <Author email="">Xiaoming Li</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICPPW.2009.78">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jakob Siegel,Juergen Ributzka,Xiaoming Li</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4e0a3931-0a17-43fc-a2e9-068c23a4a0ea</GUID>
        <Name>String Matching on a Multicore GPU Using CUDA</Name>
        <ShortDescription>Graphics Processing Units (GPUs) have evolved over the past few years from dedicated graphics rendering devices to powerful parallel processors, outperforming traditional Central Processing Units (CPUs) in many areas of scientific computing. The use of GPUs as processing elements was very limited until recently, when the concept of General-Purpose computing on Graphics Processing Units (GPGPU) was introduced. GPGPU made it possible to exploit the processing power and the memory bandwidth of GPUs through APIs that hide the GPU hardware from programmers. This paper presents experimental results on the parallel processing of some well-known on-line string matching algorithms using one such GPU abstraction API, the Compute Unified Device Architecture (CUDA).</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/924_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/924_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Corfu, Greece</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>09/12/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Charalampos S. Kouzinopoulos</Author>
           <Author email="">Konstantinos G. Margaritis</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/PCI.2009.47">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Charalampos S. Kouzinopoulos,Konstantinos G. Margaritis</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fffa6694-d276-4fc7-938e-d1ae95148346</GUID>
        <Name>Isosurface Extraction and View-Dependent Filtering from Time-Varying Fields Using Persistent Time-Octree (PTOT)</Name>
        <ShortDescription>We develop a new algorithm for isosurface extraction and view-dependent filtering from large time-varying fields, by using a novel Persistent Time-Octree (PTOT) indexing structure.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2009.160</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/923_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/923_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Polytechnic Institute of New York University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>12/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Cong Wang</Author>
           <Author email="">Yi-Jen Chiang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2009.160">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cong Wang,Yi-Jen Chiang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ffccf360-34bc-43f7-8fd1-de125348ba45</GUID>
        <Name>Simulation of P Systems with Active Membranes on CUDA</Name>
        <ShortDescription>P systems, or membrane systems, provide a high-level computational modeling framework that combines the structural and dynamic aspects of biological systems in a relevant and understandable way. P systems are massively parallel, distributed, and non-deterministic systems. In this paper, we describe the implementation of a simulator for the class of recognizer P systems with active membranes using the GPU (Graphics Processing Unit). We compare the high performance parallel simulator for the GPU to the simulator developed for a single CPU (Central Processing Unit), and we show that the GPU is better suited than the CPU to simulating P systems due to its highly parallel nature.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/922_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/922_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>CoSBi, Trento, Italy</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>10/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jose Maria Cecilia Canales</Author>
           <Author email="">Jose Manuel Garcia Carrasco</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/HiBi.2009.13">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jose Maria Cecilia Canales,Jose Manuel Garcia Carrasco</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c4e34cb4-2b20-4b45-96ad-99b180dbcc47</GUID>
        <Name>Auto-tuning 3-D FFT library for CUDA GPUs</Name>
        <ShortDescription>Existing implementations of FFTs on GPUs are optimized for specific transform sizes, such as powers of two, and exhibit unstable and peaky performance, i.e., they do not perform as well at other sizes that appear in practice. Our new auto-tuning 3-D FFT on CUDA generates high performance CUDA kernels for FFTs of varying transform sizes, alleviating this problem.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1145/1654059.1654090</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/921_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/921_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Tokyo Institute of Technology and Japan Science and Technology Agency</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>11/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Akira Nukada</Author>
           <Author email="">Satoshi Matsuoka</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1145/1654059.1654090">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Akira Nukada,Satoshi Matsuoka</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>c94e9a92-86b2-46a4-bc60-9910216c5d48</GUID>
        <Name>CUDA Accelerated LTL Model Checking</Name>
        <ShortDescription>Recent technological developments have made various many-core hardware platforms available. For example, a SIMD-like hardware architecture became easily accessible to many users whose computers are equipped with modern NVIDIA GPU cards with CUDA technology. In this paper we redesign the maximal accepting predecessors algorithm [7] for LTL model checking in terms of matrix-vector products in order to accelerate LTL model checking on many-core GPU platforms. Our experiments demonstrate that using the NVIDIA CUDA technology results in a significant speedup of the verification process.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/919_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/919_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Shenzhen, Guangdong, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jiri Barnat</Author>
           <Author email="">Lubos Brim</Author>
           <Author email="">Milan Ceska</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.50">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jiri Barnat,Lubos Brim,Milan Ceska</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>810014ad-f17c-406d-aff6-737150d18fdd</GUID>
        <Name>RankBoost Acceleration on both NVIDIA CUDA and ATI Stream Platforms</Name>
        <ShortDescription>NVIDIA CUDA and ATI Stream are the two major general-purpose GPU (GPGPU) computing technologies. We implemented RankBoost, a web relevance ranking algorithm, on both NVIDIA CUDA and ATI Stream platforms to accelerate the algorithm and illustrate the differences between these two technologies.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.115</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/917_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/917_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Shenzhen, Guangdong, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Bo Wang</Author>
           <Author email="">Tianji Wu</Author>
           <Author email="">Feng Yan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.115">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bo Wang,Tianji Wu,Feng Yan</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>38e91cb6-7537-44d0-9a2f-3fb26b020e88</GUID>
        <Name>Optimal Data Distribution for Versatile Finite Impulse Response Filtering on Next-Generation Graphics Hardware Using CUDA</Name>
        <ShortDescription>In this paper, we investigate discrete finite impulse response (FIR) filtering of images, while harnessing the powerful computational resources of next-generation GPUs. These novel platforms exhibit a massively data-parallel architecture with an advanced SIMT execution model and thread management, enabling designers to better cope with the infamous memory wall, i.e., the growing gap between the cost of data communication and computational processing.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.79</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/916_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/916_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Shenzhen, Guangdong, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Patrik Goorts</Author>
           <Author email="">Sammy Rogmans</Author>
           <Author email="">Philippe Bekaert</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.79">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Patrik Goorts,Sammy Rogmans,Philippe Bekaert</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b6017842-9107-4a99-b3f3-b15cc13fe777</GUID>
        <Name>Parallel Lexicographic Names Construction with CUDA</Name>
        <ShortDescription>The suffix array is a simpler and more compact alternative to the suffix tree, and lexicographic name construction is the fundamental building block of the suffix array construction process. This paper describes the design issues of the first data-parallel implementation of the lexicographic name construction algorithm on a commodity multiprocessor GPU using the Compute Unified Device Architecture (CUDA) platform, both from NVIDIA Corporation.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.31</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/915_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/915_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Shenzhen, Guangdong, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Weidong Sun</Author>
           <Author email="">Zongmin Ma</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.31">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Weidong Sun,Zongmin Ma</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0b8f9a03-72c0-4eb4-bed8-7189f3048805</GUID>
        <Name>Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+</Name>
        <ShortDescription>Graphics Processing Units (GPUs), with many lightweight data-parallel cores, can provide substantial parallel computing power to accelerate several general-purpose applications. Both AMD and NVIDIA provide their own high-performance GPUs and software platforms.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.12</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/914_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/914_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Shenzhen, Guangdong, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Guibin Wang</Author>
           <Author email="">Tao Tang</Author>
           <Author email="">Xudong Fang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/ICPADS.2009.12">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Guibin Wang,Tao Tang,Xudong Fang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a76e6a95-dc8f-444b-af64-badd2fddee07</GUID>
        <Name>Accelerating Multi-scale Image Fusion Algorithms Using CUDA</Name>
        <ShortDescription>Recently, fusion speed has emerged as an important factor in image fusion, and a substantial amount of memory and computing power is required for high-speed fusion. This paper shows approaches to accelerate multi-scale image fusion on a GPU (Graphics Processing Unit) using CUDA (Compute Unified Device Architecture).</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/SoCPaR.2009.63</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/913_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/913_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Malacca, Malaysia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>12/14/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Seung-Hun Yoo</Author>
           <Author email="">Jin-Hyung Park</Author>
           <Author email="">Chang-Sung Jeong</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/SoCPaR.2009.63">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Seung-Hun Yoo,Jin-Hyung Park,Chang-Sung Jeong</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>62ed4115-e5b5-4c6d-947e-3cb75f5c66d5</GUID>
        <Name>An Improved Parallel Implementation of 3D DRIE Simulation on GPU</Name>
        <ShortDescription>Deep reactive ion etching (DRIE) is a new and powerful technique in Micro-Electro-Mechanical Systems (MEMS) fabrication. A 3D DRIE simulation can help researchers understand the time evolution of the Bosch process used in DRIE. Due to the high complexity of the algorithm used in the simulation, it is necessary to develop an algorithm that can accelerate the simulation.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/I-SPAN.2009.111</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/912_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/912_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Kaohsiung, Taiwan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>12/14/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Fan Zhang</Author>
           <Author email="">Gang Wang</Author>
           <Author email="">Xiaoguang Liu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/I-SPAN.2009.111">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Fan Zhang,Gang Wang,Xiaoguang Liu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5f454c72-f9aa-4a01-b530-1dc441677f43</GUID>
        <Name>CheCUDA: A Checkpoint/Restart Tool for CUDA Applications</Name>
        <ShortDescription>In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. As existing checkpoint/restart implementations do not support checkpointing the GPU status, CheCUDA hooks a subset of the basic CUDA driver API calls in order to record the status changes in main memory.</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2009.78</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/911_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/911_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Higashi Hiroshima, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Hiroyuki Takizawa</Author>
           <Author email="">Katsuto Sato</Author>
           <Author email="">Kazuhiko Komatsu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2009.78">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Hiroyuki Takizawa,Katsuto Sato,Kazuhiko Komatsu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d7276c60-49fb-4114-8743-89bca632df40</GUID>
        <Name>Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing</Name>
        <ShortDescription>Power dissipation is one of the most pressing limiting factors influencing the development of High Performance Computing (HPC). Toward power-efficient HPC on CPU-GPU hybrid platforms, we are investigating software methodologies to achieve optimized power utilization through algorithm design and programming techniques. In this paper we discuss power measurements of GPU</ShortDescription>
        <URL>http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2009.65</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/910_cs_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/910_cs_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Higashi Hiroshima, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Reiji Suda</Author>
           <Author email="">Da Qi Ren</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/PDCAT.2009.65">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Reiji Suda,Da Qi Ren</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>985ae77e-9e9e-4719-b960-cce5bb84051a</GUID>
        <Name>Fast Parallel Expectation Maximization for Gaussian Mixture Models on GPUs Using CUDA</Name>
        <ShortDescription>The expectation maximization (EM) algorithm is an iterative technique widely used in the fields of signal processing and data mining. We present a parallel implementation of EM for finding maximum likelihood estimates of the parameters of Gaussian mixture models, designed for the many-core architecture of Graphics Processing Units (GPUs).</ShortDescription>
        <URL>http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5166982</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/909_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/909_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>NVIDIA Corp.</OrganizationName>
        <OrganizationURL>http://www.nvidia.com/cuda</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>07/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Kumar, N</Author>
           <Author email="">Satoor, S</Author>
           <Author email="">Buck, I</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5166982">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Kumar, N,Satoor, S,Buck, I</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fcfe68de-0cd5-4ac2-b61d-4180daf7189f</GUID>
        <Name>Optical Flow Computation on Compute Unified Device Architecture</Name>
        <ShortDescription>In this study, the implementation of an image processing technique on compute unified device architecture (CUDA) is discussed. CUDA is a new hardware and software architecture developed by NVIDIA Corporation for general-purpose computation on graphics processing units. CUDA features an on-chip shared memory with very fast general read and write access, which enables threads in a block to share their data effectively.</ShortDescription>
        <URL>http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4362776</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/908_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/908_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Yamaguchi Univ</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>10/29/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Mizukami, Y</Author>
           <Author email="">Tadamura, K</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4362776">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mizukami, Y,Tadamura, K</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1a3dd903-83a7-40a0-ba34-0f84d4c7df59</GUID>
        <Name>Parallelizing Motion JPEG 2000 with CUDA</Name>
        <ShortDescription>Due to the rapid growth of Graphics Processing Unit (GPU) processing capability, using the GPU as a coprocessor to assist the CPU in processing massive data has become indispensable. NVIDIA's CUDA general-purpose graphics processing unit (GPGPU) architecture can greatly benefit computationally expensive programs written in the single instruction, multiple thread (SIMT) style.</ShortDescription>
        <URL>http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5380169</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/907_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/907_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>IEEE</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>01/15/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Datla, Sanketh</Author>
           <Author email="">Gidijala</Author>
           <Author email="">Naga Sathish</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5380169">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Datla, Sanketh,Gidijala,Naga Sathish</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>627b38d9-d47e-4bdf-bb1c-b66324035ed5</GUID>
        <Name>FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs</Name>
        <ShortDescription>As growing power dissipation and thermal effects disrupted the rising clock frequency trend and threatened to annul Moore's law, the computing industry has switched its route to higher performance through parallel processing. The rise of multicore systems in all domains of computing has opened the door to heterogeneous multiprocessors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs and FPGAs are becoming very popular in PC-based heterogeneous systems for speeding up compute-intensive kernels of scientific, imaging and simulation applications.</ShortDescription>
        <URL>http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5226333</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/906_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/906_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Univ. of Illinois</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>08/28/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Papakonstantinou, A</Author>
           <Author email="">Gururaj, K</Author>
           <Author email="">Stratton, J.A</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5226333">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Papakonstantinou, A,Gururaj, K,Stratton, J.A</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e8ece29c-cc39-4c42-a3fb-4f4d78a4a3bf</GUID>
        <Name>Reliability modeling of MEMS devices on CUDA based HPC setup</Name>
        <ShortDescription>In this paper, we review developments in CUDA and the implementation, on a CUDA setup, of the various distributions used in reliability modeling of MEMS-based devices.</ShortDescription>
        <URL>http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5340289</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/905_logo_xplore_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/905_logo_xplore_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Acropolis Inst. of Technol. &amp; Res., Indore, India</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>11/24/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Pathak, R</Author>
           <Author email="">Joshi, S</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5340289">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Pathak, R,Joshi, S</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fe250813-b016-4b71-b44d-80e8de5f4166</GUID>
        <Name>Survey on Parallel Programming Model</Name>
        <ShortDescription>Microprocessor design has been shifting to multi-core architectures. Therefore, it is expected that parallelism will play a significant role in future generations of applications. Over the years, a myriad of parallel programming models have been proposed. In choosing a parallel programming model, not only is the performance aspect important, but also the qualitative aspect of how well parallelism is abstracted for developers. A model with a good abstraction of parallelism leads to higher application-development productivity. In this paper, we propose seven criteria to qualitatively evaluate parallel programming models. Our focus is on how parallelism is abstracted and presented to application developers. As a case study, we use these criteria to investigate six well-known parallel programming models in the HPC community.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/904_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/904_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Sun Microsystems</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>10/11/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="henry.kasim@sun.com">Henry Kasim</Author>
           <Author email="verdi.march@sun.com">Verdi March</Author>
           <Author email="rita.zhang@sun.com">Rita Zhang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/u61n6x07u7j26x73/?p=fa5143e1608b49dc99f34e1ee2118042">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Henry Kasim,Verdi March,Rita Zhang,henry.kasim@sun.com,verdi.march@sun.com,rita.zhang@sun.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4736ba0b-002f-4ca5-b107-c70a1e05a004</GUID>
        <Name>A Variational Approach to Semiautomatic Generation of Digital Terrain Models</Name>
        <ShortDescription>We present a semiautomatic approach to generate high-quality digital terrain models (DTM) from digital surface models (DSM). A DTM is a model of the earth's surface, where all man-made objects and the vegetation have been removed. In order to achieve this, we use a variational energy minimization approach. The proposed energy functional incorporates Huber regularization to yield piecewise smooth surfaces and an L1 norm in the data fidelity term. Additionally, a minimum constraint is used in order to prevent the ground level from pulling up, while buildings and vegetation are pulled down. Being convex, the proposed formulation allows us to compute the globally optimal solution. Clearly, a fully automatic approach does not yield the desired result in all situations. Therefore, we additionally allow the user to affect the algorithm using different user interaction tools. Furthermore, we provide a real-time 3D visualization of the output of the algorithm which additionally helps the user to assess the final DTM. We present results of the proposed approach using several real data sets.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/903_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/903_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Graz University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>11/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Andreas Klaus</Author>
           <Author email="">Thomas Pock</Author>
           <Author email="">Markus Grabner</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/l0275n1r1u7mr310/?p=fa5143e1608b49dc99f34e1ee2118042">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Andreas Klaus,Thomas Pock,Markus Grabner</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>58b1aaf4-695f-453b-833a-0c77c40fcae7</GUID>
        <Name>Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs</Name>
        <ShortDescription>We discuss implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs. We outline an algorithm and various optimizations, and identify potential future improvements and challenging tasks. In comparison with a previously published implementation, our implementation is faster on matrices having many high fill-ratio blocks but slower on matrices with a low number of non-zero elements per row.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/902_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/902_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Institute for System Programming of RAS, Russia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="amonakov@ispras.ru">Alexander Monakov</Author>
           <Author email="arut@ispras.ru">Arutyun Avetisyan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/1152206750240587/?p=fa5143e1608b49dc99f34e1ee2118042">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexander Monakov,Arutyun Avetisyan,amonakov@ispras.ru,arut@ispras.ru</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ba6d3a3b-b774-495a-8c34-f7c46204b175</GUID>
        <Name>AtelierM++: a fast and accurate marbling system</Name>
        <ShortDescription>We present AtelierM++, a new interactive marbling image rendering system which allows artists to create marbling textures with real-time visual feedback on megapixel-sized images. Marbling is a method of aqueous surface design, which can produce patterns similar to marble or other stone, hence the name. The system is based on the physical model of the traditional marbling process. We simulate real marbling by solving the Navier-Stokes equations on the graphics processing unit. We employ a third-order accurate but fast Unsplit semi-Lagrangian Constrained Interpolation Profile method to reduce the numerical dissipation while retaining the stability. To simulate very sharp interface lines among different paints, a simple yet effective transformation function is applied to the paint concentrations. Several intuitive interfaces are implemented to provide flexible control for users. Extensive experimental results are shown to demonstrate both the effectiveness and efficiency of the proposed approach.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/901_cover-medium7_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/901_cover-medium7_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Zhejiang University, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>05/12/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="hanlizhao@gmail.com">Hanli Zhao</Author>
           <Author email="jin@cad.zju.edu.cn">Xiaogang Jin</Author>
           <Author email="lushufang@cad.zju.edu.cn">Shufang Lu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/84l21203152g1k88/?p=a1df4c3712b1475ebe1afdc79576e2f3">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Hanli Zhao,Xiaogang Jin,Shufang Lu,hanlizhao@gmail.com,jin@cad.zju.edu.cn,lushufang@cad.zju.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>13440b6a-5288-46fb-912a-0a7945d88544</GUID>
        <Name>Implementing P Systems Parallelism by Means of GPUs</Name>
        <ShortDescription>Software development for Membrane Computing is growing, yielding new applications. Nowadays, the efficiency of P system simulators has become a critical point when working with large instances. The newest generation of GPUs (Graphics Processing Units) provides a massively parallel framework for general-purpose computation. We present GPUs as an alternative for obtaining better performance in the simulation of P systems, and we illustrate this with a solution to the N-Queens problem.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/900_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/900_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Sevilla, Spain</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>01/20/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="chema@ditec.um.es">Jose M. Cecilia</Author>
           <Author email="jmgarcia@ditec.um.es">Jose M. Garcia</Author>
           <Author email="gines.guerrero@ditec.um.es">Gines D. Guerrero</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/p711691h1p64x7u3/?p=a1df4c3712b1475ebe1afdc79576e2f3">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jose M. Cecilia,Jose M. Garcia,Gines D. Guerrero,chema@ditec.um.es,jmgarcia@ditec.um.es,gines.guerrero@ditec.um.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b21f6d6e-aa27-42cf-b19a-8c8d970e2433</GUID>
        <Name>Real-Time Neighborhood Based Disparity Estimation Incorporating Temporal Evidence </Name>
        <ShortDescription>This paper presents a system for dense area-based disparity estimation from binocular rectified image sequences that integrates temporal evidence. The system uses dense 2D optical flow fields and temporally displaced disparity estimates to reason about the observed 3D scene flow. This scene flow is then exploited to strengthen temporally consistent observations in the disparity estimation. Moreover, a novel neighborhood assumption is presented that allows the algorithm to be implemented seamlessly on the GPU. It is shown that, by means of the presented approach, the sensitivity to noise and ambiguities observed with plain real-time disparity estimation can be reduced, even in fully dynamic scenarios with simultaneous movement of objects and cameras.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/899_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/899_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Kiel, Germany</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>06/29/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="bartczak@mip.informatik.uni-kiel.de">Bogumil Bartczak</Author>
           <Author email="djung@mip.informatik.uni-kiel.de">Daniel Jung</Author>
           <Author email="rk@mip.informatik.uni-kiel.de">Reinhard Koch</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/c72627726n500mp0/?p=a1df4c3712b1475ebe1afdc79576e2f3">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bogumil Bartczak,Daniel Jung,Reinhard Koch,bartczak@mip.informatik.uni-kiel.de,djung@mip.informatik.uni-kiel.de,rk@mip.informatik.uni-kiel.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>82775ff7-a0ec-4b29-8d21-f366cda8039a</GUID>
        <Name>Relighting Forest Ecosystems </Name>
        <ShortDescription>Real-time cinematic relighting of large, forest ecosystems remains a challenging problem, in that important global illumination effects, such as leaf transparency and inter-object light scattering, are difficult to capture, given tight timing constraints and scenes that typically contain hundreds of millions of primitives. A solution that is based on a lattice-Boltzmann method is suggested. Reflectance, transmittance, and absorptance parameters are taken from measurements of real plants and integrated into a parameterized, dynamic global illumination model. When the model is combined with fast shadow rays, traced on a GPU, near real-time cinematic relighting is achievable for forest scenes containing hundreds of millions of polygons.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/898_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/898_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Clemson University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>11/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="jesteel@cs.clemson.edu">Jay E. Steele</Author>
           <Author email="geist@cs.clemson.edu">Robert Geist</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/gv45761v86822248/?p=a1df4c3712b1475ebe1afdc79576e2f3">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jay E. Steele,Robert Geist,jesteel@cs.clemson.edu,geist@cs.clemson.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>179e73fd-8aa8-41ce-9778-7fc4aa5d5044</GUID>
        <Name>Acceleration of cardiac tissue simulation with graphic processing units</Name>
        <ShortDescription>In this technical note we show the promise of using graphic processing units (GPUs) to accelerate simulations of electrical wave propagation in cardiac tissue, one of the more demanding computational problems in cardiology. We have found that the computational speed of two-dimensional (2D) tissue simulations with a single commercially available GPU is about 30 times faster than with a single 2.0 GHz Advanced Micro Devices (AMD) Opteron processor. We have also simulated wave conduction in the three-dimensional (3D) anatomic heart with GPUs where we found the computational speed with a single GPU is 1.6 times slower than with a 32-central processing unit (CPU) Opteron cluster. However, a cluster with two or four GPUs is faster than the CPU-based cluster. These results demonstrate that a commodity personal computer is able to perform a whole heart simulation of electrical wave conduction within times that enable the investigators to interact more easily with their simulations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/897_prediction_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/897_prediction_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>David Geffen School of Medicine at UCLA, Los Angeles, CA</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>08/04/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dasato@mednet.ucla.edu">Daisuke Sato</Author>
           <Author email="agarfinkel@mednet.ucla.edu">Alan Garfinkel</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/e276635543061m61/?p=a1df4c3712b1475ebe1afdc79576e2f3">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Daisuke Sato,Alan Garfinkel,dasato@mednet.ucla.edu,agarfinkel@mednet.ucla.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>765b8ab8-2775-423e-8d68-3d5f4a6cc0b5</GUID>
        <Name>Real-Time Prediction of Brain Shift Using Nonlinear Finite Element Algorithms</Name>
        <ShortDescription>Patient-specific biomechanical models implemented using specialized nonlinear (i.e. taking into account material and geometric nonlinearities) finite element procedures were applied to predict the deformation field within the brain for five cases of craniotomy-induced brain shift. The procedures utilize the Total Lagrangian formulation with explicit time stepping. The loading was defined by prescribing deformations on the brain surface under the craniotomy. Application of the computed deformation fields to register the preoperative images with the intraoperative ones indicated that the models very accurately predict the intraoperative positions and deformations of the brain anatomical structures for limited information about the brain surface deformations. For each case, it took less than 40 s to compute the deformation field using a standard personal computer, and less than 4 s using a Graphics Processing Unit (GPU). The results suggest that nonlinear biomechanical models can be regarded as one possible method of complementing medical image processing techniques when conducting non-rigid registration within the real-time constraints of neurosurgery.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/896_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/896_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The University of Western Australia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>09/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="grandj@mech.uwa.edu.au">Grand Roman Joldes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/u054680542xw31v7/?p=75896e3ea4664666a6c91d7e3a7bab17">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Grand Roman Joldes,grandj@mech.uwa.edu.au</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6134f011-ed6e-4cf4-9f52-887c65646088</GUID>
        <Name>An Extension of the StarSs Programming Model for Platforms with Multiple GPUs</Name>
        <ShortDescription>While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear indications that, for a number of important applications, a better performance/power ratio can be attained using specialized hardware accelerators. These accelerators require specific SDK or programming languages which are not always easy to program. Thus, the impact of the new programming paradigms on the programmer's productivity will determine their success in the high-performance computing arena. In this paper we present GPU Superscalar (GPUSs), an extension of the Star Superscalar programming model that targets the parallelization of applications on platforms consisting of a general-purpose processor connected with multiple graphics processors. GPUSs deals with architecture heterogeneity and separate memory address spaces, while preserving simplicity and portability. Preliminary experimental results for a well-known operation in numerical linear algebra illustrate the correct adaptation of the runtime to a multi-GPU system, attaining notable performance results.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/895_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/895_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Consejo Superior de Investigaciones Cientificas, Spain</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>08/22/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="eduard.ayguade@bsc.es">Eduard Ayguade</Author>
           <Author email="rosa.m.badia@bsc.es">Rosa M. Badia</Author>
           <Author email="figual@icc.uji.es">Francisco D. Igual</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/u51jk1111k1067g2/?p=75896e3ea4664666a6c91d7e3a7bab17">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Task-level parallelism,heterogeneous systems,programming models,Eduard Ayguade,Rosa M. Badia,Francisco D. Igual,eduard.ayguade@bsc.es,rosa.m.badia@bsc.es,figual@icc.uji.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>480d6c65-d2c8-4d10-88e9-8f4cecdddd49</GUID>
        <Name>Fast Image Mapping of Endoscopic Image Mosaics with Three-Dimensional Ultrasound Image for Intrauterine Treatment of Twin-to-Twin Transfusion Syndrome </Name>
        <ShortDescription>This paper describes a fast image mapping system that integrates endoscopic image mosaics with three-dimensional (3-D) ultrasound images for assisting intrauterine treatment of twin-to-twin transfusion syndrome (TTTS) by laser photocoagulation. Endoscopic laser photocoagulation treatment has a good survival rate and a low complication rate for twins. However, the small field of view and lack of surrounding information make the identification of vessel anastomoses difficult. We have developed an extended placenta visualization system that fuses endoscopic image mosaics with a 3-D ultrasound-image placenta model. Fully automatic and fast calibration is used for endoscope calibration in fluid. The 3-D spatial positions of the endoscopic images and the ultrasound image are tracked by a 3-D position tracking device. The mosaiced endoscope images are registered to the surface of the 3-D ultrasound placenta model by using a fast GPU-based image rendering method. Experimental results show that the system may provide an improved and efficient way of planning and guidance in laser photocoagulation TTTS treatment.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/894_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/894_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The University of Tokyo</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>07/15/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="liao@bmpe.t.u-tokyo.ac.jp">Hongen Liao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/a1xnh317014643rp/?p=75896e3ea4664666a6c91d7e3a7bab17">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Hongen Liao,liao@bmpe.t.u-tokyo.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0a6541a1-93ac-4431-a6ff-45865398551e</GUID>
        <Name>Accelerated Discovery of Discrete M-Clusters/Outliers on the Raster Plane Using Graphical Processing Units</Name>
        <ShortDescription>This paper presents two discrete computational geometry algorithms designed for execution on Graphics Processing Units (GPUs). The algorithms are parallelized versions of sequential algorithms intended for application in geographical data mining. The first algorithm finds clusters of m points, called m-clusters, in the rasterized plane. The second algorithm complements the first by identifying outliers, those points which are not members of any m-clusters. The use of a raster representation of coordinates provides an ideal data stream environment for efficient GPU utilization. The parallel algorithms have low memory demands, and require only a limited amount of inter-process communication. Initial performance analysis indicates the algorithms are scalable, both in problem size and in the number of seeds, and significantly outperform commercial implementations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/893_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/893_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Grand Valley State University, MI / Univ. of Maine-Augusta, ME</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="trefftzc@gvsu.edu">Christian Trefftz</Author>
           <Author email="szakas@maine.edu">Joseph Szakas</Author>
           <Author email="majdanig@student.gvsu.edu">Igor Majdandzic</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/41564574nr1j1796/?p=75896e3ea4664666a6c91d7e3a7bab17">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>GPU algorithms,Geographical data mining,Christian Trefftz,Joseph Szakas,Igor Majdandzic,trefftzc@gvsu.edu,szakas@maine.edu,majdanig@student.gvsu.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>db591363-d303-433e-9ab7-f3e856c6a6b0</GUID>
        <Name>GP on SPMD parallel graphics hardware for mega Bioinformatics data mining</Name>
        <ShortDescription>We demonstrate a SIMD C++ genetic programming system on a single 128 node parallel NVIDIA GeForce 8800 GTX GPU under RapidMind's GPGPU Linux software by predicting ten year+ outcome of breast cancer from a dataset containing a million inputs. NCBI GEO GSE3494 contains hundreds of Affymetrix HG-U133A and HG-U133B GeneChip biopsies. Multiple GP runs each with a population of 5 million programs winnow useful variables from the chaff at more than 500 million GPops per second. Sources available via FTP.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/892_cover-medium6_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/892_cover-medium6_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Essex, Colchester</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>05/08/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wlangdon@essex.ac.uk">W. B. Langdon</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/8t011m1r675628q6/?p=75896e3ea4664666a6c91d7e3a7bab17">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>W. B. Langdon,wlangdon@essex.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ddc5f77d-8ad3-4adf-bf3b-d88271d702fe</GUID>
        <Name>A Real-Time Evolutionary Object Recognition System</Name>
        <ShortDescription>We have created a real-time evolutionary object recognition system. Genetic Programming is used to automatically search the space of possible computer vision programs guided through user interaction. The user selects the object to be extracted with the mouse pointer and follows it over multiple frames of a video sequence. Several different alternative algorithms are evaluated in the background for each input image. Real-time performance is achieved through the use of the GPU for image processing operations. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/891_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/891_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Eberhard-Karls-Universitat Tubingen</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>04/10/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="marc.ebner@wsii.uni-tuebingen.de">Marc Ebner</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/g50w45v0018m5ux6/?p=75896e3ea4664666a6c91d7e3a7bab17">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Marc Ebner,marc.ebner@wsii.uni-tuebingen.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4861ac13-3ca4-4659-9fb8-ae5704b94996</GUID>
        <Name>Concurrent CT Reconstruction and Visual Analysis Using Hybrid Multi-resolution Raycasting in a Cluster Environment </Name>
        <ShortDescription>GPU clusters nowadays combine enormous computational resources of GPUs and multi-core CPUs. This paper describes a distributed program architecture that leverages all resources of such a cluster to incrementally reconstruct, segment and render 3D cone beam computer tomography (CT) data with the objective to provide the user with results as quickly as possible at an early stage of the overall computation. As the reconstruction of high-resolution data sets requires a significant amount of time, our system first creates a low-resolution preview volume on the head node of the cluster, which is then incrementally supplemented by high-resolution blocks from the other cluster nodes using our multi-resolution renderer. It is further used for graphically choosing reconstruction priority and render modes of sub-volume blocks. The cluster nodes use their GPUs to reconstruct and render sub-volume blocks, while their multi-core CPUs are used to segment already available blocks. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/890_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/890_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Visualisierungsinstitut der Universitat Stuttgart</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>11/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Steffen Frey</Author>
           <Author email="">Christoph Muller</Author>
           <Author email="">Magnus Strengert</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/m263r6v4861811p4/?p=75896e3ea4664666a6c91d7e3a7bab17">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Steffen Frey,Christoph Muller,Magnus Strengert</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>67fcb101-5c74-49c7-abf2-b026feeea773</GUID>
        <Name>Modelling Anisotropic Viscoelasticity for Real-Time Soft Tissue Simulation</Name>
        <ShortDescription>Previously almost all biomechanically-based time-critical surgical simulation has ignored the well established features of tissue mechanical response of anisotropy and time-dependence. We address this issue by presenting an efficient solution procedure for anisotropic visco-hyperelastic constitutive models which allows use of these in nonlinear explicit dynamic finite element algorithms. We show that the procedure allows incorporation of both anisotropy and viscoelasticity for as little as 5.1% additional cost compared with the usual isotropic elastic models. When combined with high performance GPU execution the complete framework is suitable for time-critical simulation applications such as interactive surgical simulation and intraoperative image registration.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/889_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/889_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University College London, UK</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>09/10/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Zeike A. Taylor</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/n2680u26wm652538/?p=8f129f4acef046bb95062c1a4c64ba22">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zeike A. Taylor</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fdeb943e-2746-42f1-a70b-e92ca74592c7</GUID>
        <Name>Fast and Robust Face Tracking for Analyzing Multiparty Face-to-Face Meetings</Name>
        <ShortDescription>This paper presents a novel face tracker and verifies its effectiveness for analyzing group meetings. In meeting scene analysis, face direction is an important clue for assessing the visual attention of meeting participants. The face tracker, called STCTracker (Sparse Template Condensation Tracker), estimates face position and pose by matching face templates in the framework of a particle filter. STCTracker is robust against large head rotation, up to 60 degrees in the horizontal direction, with relatively small mean deviation error. Also, it can track multiple faces simultaneously in real-time by utilizing a modern GPU (Graphics Processing Unit), e.g. 6 faces at about 28 frames/second on a single PC. Also, it can automatically build 3-D face templates upon initialization of the tracker. This paper evaluates the tracking errors and verifies the effectiveness of STCTracker for meeting scene analysis, in terms of conversation structures, gaze directions, and the structure of cross-modal interactions involving head gestures and utterances. Experiments confirm that STCTracker can basically match the performance of the user-unfriendly magnetic-sensor-based motion capture system.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/888_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/888_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>NTT Communication Science Labs, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>09/20/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="otsuka@eye.brl.ntt.co.jp">Kazuhiro Otsuka</Author>
           <Author email="yamato@eye.brl.ntt.co.jp">Junji Yamato</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/t37006714783572j/?p=8f129f4acef046bb95062c1a4c64ba22">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Kazuhiro Otsuka,Junji Yamato,otsuka@eye.brl.ntt.co.jp,yamato@eye.brl.ntt.co.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d29fe864-0d87-456c-9127-ae0164499337</GUID>
        <Name>SUNVIZ: A Real-Time Visualization Environment for Space Physics Applications</Name>
        <ShortDescription>Real-time physically accurate simulations are difficult to create because of limited computational power available on a CPU. General purpose computing on the graphics processing unit (GPU) can provide a significant increase in performance. We are able to investigate the flow characteristics of a cloud of charged particles, which is one of the first steps in our goal of generating a real-time Coronal Mass Ejection (CME) simulator. Preliminary results show a sustained 60 Hz visual simulation with approximately four million particles and a non-visual simulation of 16 million particles at 30 Hz. The simulator provides a novel way to investigate a CME in real-time, and it has the potential to predict when a particular CME is geoeffective, i.e. an event that could damage electrical infrastructure such as satellites, space stations, power grids, etc.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/887_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/887_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Alberta Physics</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>12/03/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">S. Eliuk</Author>
           <Author email="">P. Boulanger</Author>
           <Author email="">K. Kabin</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/c4h4t07117725p2g/?p=8f129f4acef046bb95062c1a4c64ba22">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>S. Eliuk,P. Boulanger,K. Kabin</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>af4a522e-2156-4c31-99f2-519daaa3e24d</GUID>
        <Name>Graphic processing unit-accelerated mutual information-based 3D image rigid registration</Name>
        <ShortDescription>Mutual information (MI)-based image registration is effective in registering medical images, but it is computationally expensive. This paper accelerates MI-based image registration by dividing computation of mutual information into spatial transformation and histogram-based calculation, and performing 3D spatial transformation and trilinear interpolation on the graphics processing unit (GPU). The 3D floating image is downloaded to the GPU as a flat 3D texture, and then fetched and interpolated for each new voxel location in a fragment shader. The transformed results are rendered to textures using the frame buffer object (FBO) extension, and then read back to main memory for the remaining computation on the CPU. Experimental results show that the GPU-accelerated method can achieve a speedup of about an order of magnitude with better registration results compared with the software implementation on a single-core CPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/886_transactions_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/886_transactions_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Dalian University of Technology, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>10/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ouzyg@dlut.edu.cn">Zongying Ou</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/f013r769058g4814/?p=8f129f4acef046bb95062c1a4c64ba22">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zongying Ou,ouzyg@dlut.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>684bdf6c-6ae7-4c97-a001-9dcc5b568603</GUID>
        <Name>Dual-RBF based surface reconstruction</Name>
        <ShortDescription>Surface reconstruction (Bloomenthal and Wyvill, Introduction to Implicit Surfaces, 1997) is a fundamental task in Computer Aided Design (CAD) and Computer Graphics (CG). In this paper, motivated by the physical polar field model (Yuxu Lin, Chun Chen in Proceedings of the 3rd Pacific-Rim Symposium on Image and Video Technology, 1997), we propose a novel implicit surface reconstruction approach, named Dual-RBF. First, by simulating the physical polar field model, Dual-RBF provides a good initial reconstruction state. Then, two simple nonlinear methods are introduced to adjust the configurations of the Dual-RBF model, so that a more accurate reconstruction is reached. Third, by adopting a multi-level strategy, Dual-RBF becomes more robust at filling holes in flawed input point clouds. Finally, the visualization of the surface reconstruction is sped up with the GPU. Experimental results show that the proposed approach is faster and more robust than previous implicit surface reconstruction techniques.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/885_visualcomputer_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/885_visualcomputer_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Zhejiang University, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>03/03/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="linyuxu@zju.edu.cn">Yuxu Lin</Author>
           <Author email="chenc@cs.zju.edu.cn">Chun Chen</Author>
           <Author email="brooksong@ieee.org">Mingli Song</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/n593wm5625266mh5/?p=8f129f4acef046bb95062c1a4c64ba22">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yuxu Lin,Chun Chen,Mingli Song,linyuxu@zju.edu.cn,chenc@cs.zju.edu.cn,brooksong@ieee.org</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>67822eb4-51de-4035-b30f-046c25f50c9d</GUID>
        <Name>A Color Management Process for Real Time Color Reconstruction of Multispectral Images</Name>
        <ShortDescription>We introduce a new, accurate, and technology-independent display color characterization model for color rendering of multispectral images. Establishing the model is automatic and takes no longer than a coffee break, which keeps it efficient in practical situations. This model is a part of the color management workflow of the new tools designed at the C2RMF for multispectral image analysis of paintings acquired with the material developed during the CRISATEL European project. The analysis is based on color reconstruction with virtual illuminants and uses a GPU (graphics processing unit) based processing model in order to interact in real time with virtual lighting.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/884_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/884_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universite Jean Monnet / France</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>07/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Philippe Colantoni</Author>
           <Author email="">Jean-Baptiste Thomas</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/205333534n861673/?p=0dd80c5c9b564b009c9e0e9c88044df6">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Philippe Colantoni,Jean-Baptiste Thomas</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5da0d849-dfa9-4a99-b8d9-1ed49ac75197</GUID>
        <Name>Regular Expression Matching on Graphics Hardware for Intrusion Detection</Name>
        <ShortDescription>The expressive power of regular expressions has been often exploited in network intrusion detection systems, virus scanners, and spam filtering applications. However, the flexible pattern matching functionality of regular expressions in these systems comes with significant overheads in terms of both memory and CPU cycles, since every byte of the inspected input needs to be processed and compared against a large set of regular expressions. </ShortDescription>
        <URL>http://springerlink.com/content/b3m7662014272t8m/?p=0dd80c5c9b564b009c9e0e9c88044df6</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/883_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/883_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Foundation for Research and Technology, Hellas</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>09/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>48</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="gvasil@ics.forth.gr">Giorgos Vasiliadis</Author>
           <Author email="mikepo@ics.forth.gr">Michalis Polychronakis</Author>
           <Author email="antonat@ics.forth.gr">Spiros Antonatos</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/b3m7662014272t8m/?p=0dd80c5c9b564b009c9e0e9c88044df6">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Giorgos Vasiliadis,Michalis Polychronakis,Spiros Antonatos,gvasil@ics.forth.gr,mikepo@ics.forth.gr,antonat@ics.forth.gr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0d6ee814-5cf0-4da6-988b-a9e7159f4f0a</GUID>
        <Name>Haptic guided 3-D deformable image registration</Name>
        <ShortDescription>Purpose: We present a system that supports deformable image registration guided by a haptic device. Methods: The haptic device is tied to a block matching method where a set of uniformly distributed control points determines the block positions. Each control point constitutes a particle in a mass-spring grid which limits the space of allowed movements to elastic movements. Control points are manipulated by the haptic device, and the negative gradient of the similarity metric over the corresponding block is rendered as a force on the haptic device, guiding the user to a minimum of the optimization landscape. Fast update of forces was achieved by exploiting the GPU for computations of the similarity metric and for interpolation of the deformation field.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/882_cover-medium5_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/882_cover-medium5_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Oslo, Norway / Rikshospitalet University Hospital, Norway</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>02/24/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="pettri@ifi.uio.no">Petter Risholm</Author>
           <Author email="">Eigil Samset</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/401k4px2627gnr38/?p=0dd80c5c9b564b009c9e0e9c88044df6">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Petter Risholm,Eigil Samset,pettri@ifi.uio.no</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0f7ccffd-5dcf-44a4-9971-18d0a82c6dd3</GUID>
        <Name>Radar Signal Processing with Graphics Processors (GPUs)</Name>
        <ShortDescription>The investigation is conducted by comparing a GPU (GTX260) against a modern desktop CPU for several HPEC (High Performance Embedded Computing) and other radar signal processing algorithms, 12 in total. Several other aspects are also investigated, such as programming environment and efficiency, future GPU architectures, and applicability in radar systems. Our CUDA GPU implementations perform substantially better than the CPU and associated CPU code for all but one of the 12 algorithms tested, sometimes by a factor of 100 or more. The OpenCL implementations also perform substantially better than the CPU. The substantial performance achieved when using CUDA for almost all benchmarks can be attributed both to the high theoretical performance of the GPU and to the inherent data-parallelism, and hence GPU-suitability, of almost all of the investigated algorithms. Programming CUDA is reasonably straightforward, largely due to the mature development environment and abundance of documentation and white papers. OpenCL is a lot more tedious to program. Furthermore, the coming CUDA GPU architecture called Fermi is expected to further increase performance and programmability. When considering system integration of GPU architectures into harsh radar application environments, one should be aware of potential heat and also possible obsolescence issues.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/881_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/881_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>HPC</OrganizationName>
        <OrganizationURL>http://www.hpcsweden.se</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>02/08/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>140</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ian.wainwright@gmail.com">Ian Wainwright</Author>
           <Author email="jimmy.pettersson@hpcsweden.se">Jimmy Pettersson</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.hpcsweden.se/solutions">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ian Wainwright,Jimmy Pettersson,jimmy.pettersson@hpcsweden.se,ian.wainwright@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4ff91bfb-b496-46cb-9e1c-3572031aff73</GUID>
        <Name>Exploiting the Power of GPUs for Asymmetric Cryptography</Name>
        <ShortDescription>Modern Graphics Processing Units (GPUs) have reached a level of performance and gate count that exceeds conventional Central Processing Units (CPUs) by far. Besides a CPU, many modern computer systems include such a powerful GPU, which runs idle most of the time and might be used as a cheap and instantly available co-processor for general purpose applications.</ShortDescription>
        <URL>http://springerlink.com/content/d1rt1r0326500541/?p=0dd80c5c9b564b009c9e0e9c88044df6</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/880_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/880_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Ruhr University Bochum, Germany</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>08/06/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="szerwinski@crypto.rub.de">Robert Szerwinski</Author>
           <Author email="gueneysu@crypto.rub.de">Tim Guneysu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/d1rt1r0326500541/?p=0dd80c5c9b564b009c9e0e9c88044df6">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Robert Szerwinski,Tim Guneysu,szerwinski@crypto.rub.de,gueneysu@crypto.rub.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8749f743-3122-4ade-9664-c7c12c9cba95</GUID>
        <Name>Programmable and Scalable Architecture for Graphics Processing Units</Name>
        <ShortDescription>Graphics processing is an application area with a high level of parallelism at the data level and at the task level. Therefore, graphics processing units (GPUs) are often implemented as multiprocessing systems with high performance floating point processing and application specific hardware stages for maximizing the graphics throughput. 
In this paper we evaluate the suitability of Transport Triggered Architectures (TTA) as a basis for implementing GPUs. TTA improves scalability over the traditional VLIW-style architectures, making it interesting for computationally intensive applications. We show that TTA provides high floating point processing performance while allowing more programming freedom than vector processors. 
Finally, one of the main features of the presented TTA-based GPU design is its fully programmable architecture, making it a suitable target for general purpose computing on GPU APIs, which have become popular in recent years.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/879_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/879_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universidad Rey Juan Carlos, Spain</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="carlos.delalama@urjc.es">Carlos S. de La Lama</Author>
           <Author email="pekka.jaaskelainen@tut.fi">Pekka Jaaskelainen</Author>
           <Author email="jarmo.takala@tut.fi">Jarmo Takala</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/e167v646360633q6/?p=0dd80c5c9b564b009c9e0e9c88044df6">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Carlos S. de La Lama,Pekka Jaaskelainen,Jarmo Takala,carlos.delalama@urjc.es,pekka.jaaskelainen@tut.fi,jarmo.takala@tut.fi</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>81a5b615-fc67-4667-ab39-ad855603e008</GUID>
        <Name>Breath-Hold Target Localization with Simultaneous Kilovoltage/Megavoltage Cone-Beam CT and Fast Reconstruction</Name>
        <ShortDescription>Hypofractionated high dose radiotherapy of small lung tumors is very effective and was based on stereotaxy until now. It has recently become possible to achieve a high patient positioning precision based on on-line imaging with cone-beam CT (CBCT) and breath-hold techniques. The CBCT acquisition time of roughly 60 seconds, however, is too long for one breath-hold, resulting in image degradation by respiratory motion artifacts. By using megavoltage (MV) and kilovoltage (kV) photon sources (mounted perpendicularly on the Linac gantry) for volume reconstruction, we could reduce the acquisition time to 15 seconds.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/878_prediction_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/878_prediction_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>World Congress on Medical Physics and Biomedical Engineering, Germany</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>01/04/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">M. Blessing</Author>
           <Author email="">D. Stsepankou</Author>
           <Author email="">H. Wertz</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/u564253u25144616/?p=0dd80c5c9b564b009c9e0e9c88044df6">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>M. Blessing,D. Stsepankou,H. Wertz</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b0ea1f26-c2cf-436b-830b-d3cbb7ecb7bf</GUID>
        <Name>Implementation of Fine-Grained Algorithms on Graphical Processing Unit</Name>
        <ShortDescription>In this paper we solve the problem of mapping fine-grained algorithms to a graphics processing unit (GPU). Synchronous, asynchronous, block-synchronous and probabilistic cellular automata and an explicit scheme for PDEs are used as examples. Different implementation variants and their performance are presented.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/877_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/877_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>ICMMG SB RAS, Novosibirsk, Russia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>09/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kalgin@ssd.sscc.ru">Konstantin Kalgin</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/2871j8784t103723/?p=f6088520c34847d9879e56fe25ff7204">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Konstantin Kalgin,kalgin@ssd.sscc.ru</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ac91c067-09bc-44a8-877b-254db2f289b0</GUID>
        <Name>StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures</Name>
        <ShortDescription>In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g. Cell/BE SPUs) or data-parallel accelerators (e.g. GPGPUs).
Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been devoted to efficiently offloading parts of the computations. However, designing an execution model that unifies all computing units and their associated embedded memory remains a major challenge.
We have thus designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to give numerical kernel designers a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and to easily develop and tune powerful scheduling algorithms on the other.
We have developed several strategies that can be selected seamlessly at run time, and we have demonstrated their efficiency by analyzing the impact of those scheduling policies on several classical linear algebra algorithms that take advantage of multiple cores and GPUs at the same time. In addition to substantial improvements in execution times, we obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/876_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/876_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Bordeaux</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>08/22/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Cedric Augonnet</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/h013578235633mw3/?p=f6088520c34847d9879e56fe25ff7204">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cedric Augonnet</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>71356391-beda-4ab3-82f5-a0b60765af0f</GUID>
        <Name>Seismic Wave Field Modeling with Graphics Processing Units</Name>
        <ShortDescription>GPGPU (general-purpose computing on graphics processing units) is a very effective and inexpensive way of dealing with time-consuming computations. In some cases even a low-end GPU can be dozens of times faster than a modern CPU. Utilization of GPGPU technology can make a typical desktop computer powerful enough to perform the necessary computations quickly and inexpensively. Seismic wave field modeling is one problem of this kind. Sometimes one modeled common shot-point gather or one wave field snapshot can reveal the nature of an analyzed wave phenomenon. On the other hand, this kind of modeling is often part of complex and extremely time-consuming methods with almost unlimited needs for computational resources. This is always a problem for academic centers, especially now that the times of generous support from oil and gas companies have ended.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/875_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/875_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>AGH University of Science and Technology, Poland</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>05/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="tdanek@agh.edu.pl">Tomasz Danek</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v11t304605670k04/?p=f6088520c34847d9879e56fe25ff7204">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Tomasz Danek,tdanek@agh.edu.pl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9c1ce1c7-2162-49d4-93ca-a21ba5687e48</GUID>
        <Name>Active Structured Learning for High-Speed Object Detection</Name>
        <ShortDescription>High-speed, smooth and accurate visual tracking of objects in arbitrary, unstructured environments is essential for robotics and human motion analysis. However, building a system that can adapt to arbitrary objects and a wide range of lighting conditions is a challenging problem, especially if hard real-time constraints apply, as in robotics scenarios. In this work, we introduce a method for learning a discriminative object tracking system based on the recent structured regression framework for object localization. Using a kernel function that allows fast evaluation on the GPU, the resulting system can process video streams at speeds of 100 frames per second or more.
Consecutive frames in high-speed video sequences are typically very redundant, and for training an object detection system it is sufficient to have training labels for only a subset of all images. We propose an active learning method that selects training examples in a data-driven way, thereby minimizing the required number of training labels. Experiments on realistic data show that active learning is superior to previously used methods for dataset subsampling for this task.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/874_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/874_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Max Planck Institute for Biological Cybernetics, Tubingen, Germany</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>09/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ChristophH.Lampert@tuebingen.mpg.de">Christoph H. Lampert</Author>
           <Author email="Jan.Peters@tuebingen.mpg.de">Jan Peters</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/fm571t54h7l75231/?p=f6088520c34847d9879e56fe25ff7204">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Christoph H. Lampert,Jan Peters,ChristophH.Lampert@tuebingen.mpg.de,Jan.Peters@tuebingen.mpg.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7afe1a21-0c1b-40fd-9f4b-60e731b26240</GUID>
        <Name>Attaining High Performance in General-Purpose Computations on Current Graphics Processors</Name>
        <ShortDescription>The increase in performance of the last generations of graphics processors (GPUs) has made this class of hardware a coprocessing platform of remarkable success in certain types of operations. In this paper we evaluate the performance of linear algebra and image processing routines, both on classical and unified GPU architectures and traditional processors (CPUs). From this study, we gain insights on the properties that make an algorithm likely to deliver high performance on a GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/873_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/873_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universidad Jaume I, Spain</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>12/06/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="figual@icc.uji.es">Francisco D. Igual</Author>
           <Author email="mayo@icc.uji.es">Rafael Mayo</Author>
           <Author email="quintana@icc.uji.es">Enrique S. Quintana-Orti</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/41686m1275376644/?p=f6088520c34847d9879e56fe25ff7204">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Francisco D. Igual,Rafael Mayo,Enrique S. Quintana-Orti,figual@icc.uji.es,mayo@icc.uji.es,quintana@icc.uji.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6fe3ccf0-abfd-4113-bcf7-49d91f20f318</GUID>
        <Name>Efficient Multiplication of Polynomials on Graphics Hardware</Name>
        <ShortDescription>We present an algorithm to multiply univariate polynomials with integer coefficients efficiently using the Number Theoretic Transform (NTT) on Graphics Processing Units (GPUs). The same approach can be used to multiply large integers encoded as polynomials. Our algorithm exploits the fused multiply-add capabilities of the graphics hardware. NTT multiplications are executed in parallel for a set of distinct primes, followed by reconstruction using the Chinese Remainder Theorem (CRT) on the GPU. Our benchmarking experiments show NTT multiplication performance of up to 77 GMul/s. We compared our approach with CPU-based implementations of polynomial and large integer multiplication provided by the NTL and GMP libraries.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/872_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/872_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Saarbrucken</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="asm@mpi-inf.mpg.de">Pavel Emeliyanenko</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/j68860730n881582/?p=f6088520c34847d9879e56fe25ff7204">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Pavel Emeliyanenko,asm@mpi-inf.mpg.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5f4ef7a1-9d5f-4ef3-a1ea-c2be5ed4d1e8</GUID>
        <Name>Parallel LDPC Decoding on GPUs Using a Stream-Based Computing Approach</Name>
        <ShortDescription>Low-Density Parity-Check (LDPC) codes are powerful error correcting codes adopted by recent communication standards. LDPC decoders are based on belief propagation algorithms, which make use of a Tanner graph and very intensive message-passing computation, and usually require hardware-based dedicated solutions. With the exponential increase of the computational power of commodity graphics processing units (GPUs), new opportunities have arisen to develop general purpose processing on GPUs. This paper proposes the use of GPUs for implementing flexible and programmable LDPC decoders. A new stream-based approach is proposed, based on compact data structures to represent the Tanner graph. It is shown that such a challenging application for stream-based computing, because of irregular memory access patterns, memory bandwidth and recursive flow control constraints, can be efficiently implemented on GPUs. The proposal was experimentally evaluated by programming LDPC decoders on GPUs using the Caravela platform, a generic interface tool for managing the kernels' execution regardless of the GPU manufacturer and operating system. Moreover, to relatively assess the obtained results, we have also implemented LDPC decoders on general purpose processors with Streaming Single Instruction Multiple Data (SIMD) Extensions. Experimental results show that the solution proposed here efficiently decodes several codewords simultaneously, reducing the processing time by one order of magnitude.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/871_cover-medium4_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/871_cover-medium4_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universidade de Coimbra</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>09/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="gff@co.it.pt">Gabriel Falcao</Author>
           <Author email="yama@inesc-id.pt">Shinichi Yamagiwa</Author>
           <Author email="vitor@co.it.pt">Vitor Silva</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/j6838p2588754610/?p=3fe5eb5f25ba46d49a7421bc09992dac">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Gabriel Falcao,Shinichi Yamagiwa,Vitor Silva,gff@co.it.pt,yama@inesc-id.pt,vitor@co.it.pt</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>394dbac7-73bd-4c9e-86da-1dd81c35ad28</GUID>
        <Name>Retargeting PLAPACK to Clusters with Hardware Accelerators</Name>
        <ShortDescription>Hardware accelerators are becoming a highly appealing approach to boost the raw performance as well as the price-performance and power-performance ratios of current clusters. In this paper we present a strategy to retarget PLAPACK, a library initially designed for clusters of nodes equipped with general-purpose processors and a single address space per node, to clusters equipped with graphics processors (GPUs). In our approach data are kept in the device memory and only retrieved to main memory when they have to be communicated to a different node. Here we benefit from the object-based orientation of PLAPACK, which allows all communication between host and device to be embedded within a pair of routines, providing a clean abstraction that enables an efficient and direct port of all the contents of the library. Our experiments on a cluster consisting of 16 nodes with two NVIDIA Quadro FX5800 GPUs each show the performance of our approach.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/870_FLAMEbanner_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/870_FLAMEbanner_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University Jaume I / Texas University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>02/11/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="figual@icc.uji.es">Fogue</Author>
           <Author email="">Igual</Author>
           <Author email="">Quintana-Orti</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cs.utexas.edu/users/flame/pubs/FLAWN42.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Fogue,Igual,Quintana-Orti,figual@icc.uji.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d4831d82-4e49-46b5-9751-c1e58a61d67a</GUID>
        <Name>Neural Network Training with Extended Kalman Filter Using Graphics Processing Unit</Name>
        <ShortDescription>The graphics processing unit has evolved through the years into a powerful resource for general-purpose computing. In this article we present an implementation of the extended Kalman filter for training recurrent neural networks, in which the most computationally intensive tasks are performed on the GPU. This approach achieves a significant speedup of the training process for larger networks.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/869_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/869_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Slovak University of Technology in Bratislava</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>08/29/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="trebaticky@fiit.stuba.sk">Peter Trebaticky</Author>
           <Author email="pospichal@fiit.stuba.sk">Jiri Pospichal</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/h13l4200749426p5/?p=3fe5eb5f25ba46d49a7421bc09992dac">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Peter Trebaticky,Jiri Pospichal,trebaticky@fiit.stuba.sk,pospichal@fiit.stuba.sk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c8daa779-2c65-4b59-a45a-a3648753fb56</GUID>
        <Name>Fast collision detection using the A-buffer</Name>
        <ShortDescription>This paper presents a novel and fast image-space collision detection algorithm with the A-buffer, where the GPU computes the potentially colliding sets (PCSs), and the CPU performs the standard triangle intersection test. When the bounding boxes of two objects intersect, the intersection is passed to the GPU. The object surfaces in the intersection are rendered into the A-buffer. Rendering into the A-buffer is up to eight times faster than the ordinary approaches. Then, PCSs are computed by comparing the depth values of each texel of the A-buffer. A PCS consists of only two triangles. The PCSs are read back to the CPU, and the CPU computes the intersection points between the triangles. The proposed algorithm runs extremely fast, does not require any preprocessing, can handle dynamic objects including deformable and fracturing models, and can compute self-collisions. The versatility and performance gain of the proposed algorithm prove its usefulness in real-time applications such as 3D games.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/868_visualcomputer_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/868_visualcomputer_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Korea University, Seoul, Korea</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>05/17/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="jhan@korea.ac.kr">Hanyoung Jang</Author>
           <Author email="">JungHyun Han</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/9667531541773560/?p=3fe5eb5f25ba46d49a7421bc09992dac">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Hanyoung Jang,JungHyun Han,jhan@korea.ac.kr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3752cd56-fe2a-4457-a8e1-ea665d83102d</GUID>
        <Name>Engineering of Computer Vision Algorithms Using Evolutionary Algorithms</Name>
        <ShortDescription>Computer vision algorithms are currently developed by looking up the available operators from the literature and then arranging those operators such that the desired task is performed. This is often a tedious process which also involves testing the algorithm with different lighting conditions or at different sites. We have developed a system for the automatic generation of computer vision algorithms at interactive frame rates using GPU-accelerated image processing. The user simply tells the system which object should be detected in an image sequence. Simulated evolution, in particular Genetic Programming, is used to automatically generate and test alternative computer vision algorithms. Only the best algorithms survive and eventually provide a solution to the user's image processing task.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/867_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/867_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Eberhard Karls Universitat Tubingen</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>09/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="marc.ebner@wsii.uni-tuebingen.de">Marc Ebner</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/8557r69773776652/?p=3fe5eb5f25ba46d49a7421bc09992dac">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Marc Ebner,marc.ebner@wsii.uni-tuebingen.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e2835997-b2cc-4236-a3c2-83316b6befcb</GUID>
        <Name>Solving Dense Linear Systems on Graphics Processors</Name>
        <ShortDescription>We present several algorithms to compute the solution of a linear system of equations on a GPU, as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. We also show how iterative refinement with mixed-precision can be used to regain full accuracy in the solution of linear systems. Experimental results on a G80 using CUBLAS 1.0, the implementation of BLAS for NVIDIA GPUs with unified architecture, illustrate the performance of the different algorithms and techniques proposed.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/866_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/866_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universidad Jaume I</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="barrachi@icc.uji.es">Sergio Barrachina</Author>
           <Author email="castillo@icc.uji.es">Maribel Castillo</Author>
           <Author email="figual@icc.uji.es">Francisco D. Igual</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/53t12620727x6512/?p=3fe5eb5f25ba46d49a7421bc09992dac">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sergio Barrachina,Maribel Castillo,Francisco D. Igual,barrachi@icc.uji.es,castillo@icc.uji.es,figual@icc.uji.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ee93a2d7-a172-4d28-88cc-ea3581de0988</GUID>
        <Name>Visual simulation of thermal fluid dynamics in a pressurized water reactor</Name>
        <ShortDescription>We present a simulation and visualization system for a critical application: analysis of the thermal fluid dynamics inside a pressurized water reactor of a nuclear power plant when cold water is injected into the reactor vessel. We employ a hybrid thermal lattice Boltzmann method (HTLBM), which has the advantages of ease of parallelization and ease of handling complex simulation boundaries. For efficient computation and storage of the irregular-shaped simulation domain, we classify the domain into nonempty and empty cells and apply a novel packing technique to organize the nonempty cells. This method is implemented on a GPU cluster for acceleration. We demonstrate the formation of cold-water plumes in the reactor vessel. A set of interactive visualization tools, such as side-view slices, 3D volume rendering, thermal layers rendering, and panorama rendering, are provided to collectively visualize the structure and dynamics of the temperature field in the vessel. To the best of our knowledge, this is the first system that combines 3D simulation and visualization for analyzing thermal shock risk in a pressurized water reactor.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/865_visualcomputer_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/865_visualcomputer_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Stony Brook University, NY</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>23</ReleaseDay>
        <ReleaseDateDisplay>01/23/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="fzhe@cs.sunysb.edu">Zhe Fan</Author>
           <Author email="yukuo@cs.sunysb.edu">Yu-Chuan Kuo</Author>
           <Author email="zhao@cs.kent.edu">Ye Zhao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/n34k26h774q4w7j7/?p=3fe5eb5f25ba46d49a7421bc09992dac">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zhe Fan,Yu-Chuan Kuo,Ye Zhao,fzhe@cs.sunysb.edu,yukuo@cs.sunysb.edu,zhao@cs.kent.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bf51755a-de3f-4ba1-b775-ba5134f861e9</GUID>
        <Name>A novel multiple-walk parallel algorithm for the Barnes-Hut treecode on GPUs: towards cost effective, high performance N-body simulation</Name>
        <ShortDescription>Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly popular field of study as graphics processing units (GPUs) continue to be proposed as high performance and relatively low cost implementation platforms for scientific computing applications. Among these applications are astrophysical N-body simulations, which form one of the most challenging problems in computational science. However, in most reported studies, a simple O(N^2) algorithm was used for GPGPUs, and the resulting performance was not observed to be better than that of conventional CPUs running more optimized O(N log N) algorithms such as the tree algorithm or the particle-particle particle-mesh algorithm. Because of the difficulty in getting efficient implementations of such algorithms on GPUs, a GPU cluster had no practical advantage over general-purpose PC clusters for N-body simulations. In this paper, we report a new method for efficient parallel implementation of the tree algorithm on GPUs. Our novel tree code allows the realization of an N-body simulation on a GPU cluster at a much higher performance than that on general PC clusters. We performed a cosmological simulation with 562 million particles on a GPU cluster using 128 NVIDIA GeForce 8800GTS GPUs at an overall cost of $168,172. We obtained a sustained performance of 20.1 Tflops, which when normalized against a general-purpose CPU implementation leads to a performance of 8.50 Tflops. The achieved cost/performance was hence a mere $19.8/Gflops, which shows the high competitiveness of GPGPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/864_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/864_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nagasaki University, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="hamada@cis.nagasaki-u.ac.jp">Tsuyoshi Hamada</Author>
           <Author email="nitadori@cfca.jp">Keigo Nitadori</Author>
           <Author email="k.benkdird@ed.ac.uk">Khaled Benkrid</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/j288l042547v4403/?p=3fe5eb5f25ba46d49a7421bc09992dac">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Tsuyoshi Hamada,Keigo Nitadori,Khaled Benkrid,hamada@cis.nagasaki-u.ac.jp,nitadori@cfca.jp,k.benkdird@ed.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4977b07b-89ac-439e-abb4-8879e099c3c4</GUID>
        <Name>Efficient Acceleration of Asymmetric Cryptography on Graphics Hardware</Name>
        <ShortDescription>Graphics processing units (GPUs) are increasingly being used for general purpose computing. We present implementations of large integer modular exponentiation, the core of public-key cryptosystems such as RSA, on a DirectX 10 compliant GPU. DirectX 10 compliant graphics processors are the latest generation of GPU architecture, which provide increased programming flexibility and support for integer operations. We present high performance modular exponentiation implementations based on integers represented in both standard radix form and residue number system form. We show how a GPU implementation of a 1024-bit RSA decrypt primitive can outperform a comparable CPU implementation by up to 4 times and also improve the performance of previous GPU implementations by decreasing latency by up to 7 times and doubling throughput. We present how an adaptive approach to modular exponentiation involving implementations based on both a radix and a residue number system gives the best all-around performance on the GPU both in terms of latency and throughput. We also highlight the usage criteria necessary to allow the GPU to reach peak performance on public key cryptographic operations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/863_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/863_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Trinity College Dublin</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>06/19/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="harrisoo@cs.tcd.ie">Owen Harrison</Author>
           <Author email="john.waldron@cs.tcd.ie">John Waldron</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v83j50l12p7446v2/?p=97a9e2cb818a4dca8ae79455197c0bb7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Owen Harrison,John Waldron,harrisoo@cs.tcd.ie,john.waldron@cs.tcd.ie</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ed7c674a-46f4-4536-84a9-24e1489c692e</GUID>
        <Name>Realistic real-time sound re-synthesis and processing for interactive virtual worlds</Name>
        <ShortDescription>We present new GPU-based techniques for implementing linear digital filters for real-time audio processing. Our solution for recursive filters is the first presented in the literature. We demonstrate the relevance of these algorithms to computer graphics by synthesizing realistic sounds of colliding objects made of different materials, such as glass, plastic, and wood, in real time. The synthesized sounds can be parameterized by the object materials, velocities, and collision angles. Despite its flexibility, our approach uses very little memory, since it essentially requires a set of coefficients representing the impulse response of each material sound. Such features make our approach an attractive alternative to traditional CPU-based techniques that use playback of pre-recorded sounds. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/862_visualcomputer_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/862_visualcomputer_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Instituto de Informatica</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>03/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ftrebien@inf.ufrgs.br">Fernando Trebien</Author>
           <Author email="oliveira@inf.ufrgs.br">Manuel M. Oliveira</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/u782025745220128/?p=97a9e2cb818a4dca8ae79455197c0bb7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Fernando Trebien,Manuel M. Oliveira,ftrebien@inf.ufrgs.br,oliveira@inf.ufrgs.br</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b3df15ed-da9f-40e2-9e52-827b4ffa8012</GUID>
        <Name>Solid Mesh Registration for Radiotherapy Treatment Planning</Name>
        <ShortDescription>We present an algorithm for solid organ registration of pre-segmented data represented as tetrahedral meshes. Registration of the organ surface is driven by force terms based on a distance field representation of the source and reference shapes. Registration of internal morphology is achieved using a non-linear elastic finite element model. A key feature of the method is that the user does not need to specify boundary conditions (surface point correspondences) prior to the finite element analysis. Instead the boundary matches are found as an integrated part of the analysis. The method is evaluated on phantom data and prostate data obtained in vivo based on fiducial marker accuracy and inverse consistency of transformations. The parallel nature of the method allows an efficient implementation on a GPU and as a result the method is very fast. All validation registrations take less than 30 seconds to complete. The proposed method has many potential uses in image guided radiotherapy (IGRT) which relies on registration to account for organ deformation between treatment sessions.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/861_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/861_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Aarhus University, Denmark</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>01/21/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="noe@cs.au.dk">Karsten Ostergaard Noe</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/q266118020561t3r/?p=97a9e2cb818a4dca8ae79455197c0bb7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Karsten Ostergaard Noe,noe@cs.au.dk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6a0fe015-97bb-4d85-b408-308d30e105d5</GUID>
        <Name>Large Scale Bioinformatics Data Mining with Parallel Genetic Programming on Graphics Processing Units</Name>
        <ShortDescription>A suitable single instruction multiple data GP interpreter can achieve high (Giga GPop/second) performance on a SIMD GPU graphics card by simultaneously running multiple diverse members of the genetic programming population. SPMD dataflow parallelisation is achieved because the single interpreter treats the different GP programs as data. On a single 128 node parallel nVidia GeForce 8800 GTX GPU, the interpreter can outrun a compiled approach, where data parallelisation comes only by running a single program at a time across multiple inputs.
The RapidMind GPGPU Linux C++ system has been demonstrated by predicting ten year+ outcome of breast cancer from a dataset containing a million inputs. NCBI GEO GSE3494 contains hundreds of Affymetrix HG-U133A and HG-U133B GeneChip biopsies. Multiple GP runs each with a population of five million programs winnow useful variables from the chaff at more than 500 million GPops per second. Sources available via FTP. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/860_iss_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/860_iss_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>King's College, London</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>01/06/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">William B. Langdon</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/8k366573425847n1/?p=97a9e2cb818a4dca8ae79455197c0bb7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>William B. Langdon</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8a53a58c-cb1e-4854-b81a-88ec94b5490d</GUID>
        <Name>Hierarchical Markov Random Fields Applied to Model Soft Tissue Deformations on Graphics Hardware</Name>
        <ShortDescription>Many methodologies dealing with prediction or simulation of soft tissue deformations on medical image data require preprocessing of the data in order to produce a different shape representation that complies with standard methodologies, such as mass-spring networks or finite element methods (FEM). On the other hand, methodologies working directly on the image space normally do not take into account the mechanical behavior of tissues and tend to lack the physics foundations driving soft tissue deformations. This chapter presents a method to simulate soft tissue deformations based on coupled concepts from image analysis and mechanics theory. The proposed methodology is based on a robust stochastic approach that takes into account material properties retrieved directly from the image, concepts from continuum mechanics and FEM. The optimization framework is solved within a hierarchical Markov random field (HMRF) which is implemented on the graphics processor unit (GPU).</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/859_cover-medium3_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/859_cover-medium3_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Bern, Switzerland</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>11/24/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="christof.seiler@artorg.unibe.ch">Christof Seiler</Author>
           <Author email="christof.seiler@artorg.unibe.ch">Philippe Buchler</Author>
           <Author email="christof.seiler@artorg.unibe.ch">Lutz-Peter Nolte</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/p30h5w66415p4303/?p=97a9e2cb818a4dca8ae79455197c0bb7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Christof Seiler,Philippe Buchler,Lutz-Peter Nolte,christof.seiler@artorg.unibe.ch,christof.seiler@artorg.unibe.ch,christof.seiler@artorg.unibe.ch</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8fea4552-0ab0-4c43-a0e4-f72c417d9e06</GUID>
        <Name>Efficient K-Means Clustering Using Accelerated Graphics Processors</Name>
        <ShortDescription>We exploit the parallel architecture of the Graphics Processing Unit (GPU) used in desktops to efficiently implement the traditional K-means algorithm. Our approach in clustering avoids the need for data and cluster information transfer between the GPU and CPU in between the iterations. In this paper we present the novelties in our approach and techniques employed to represent data, compute distances, centroids and identify the cluster elements using the GPU. We measure performance using the metric: computational time per iteration. Our implementation of K-means clustering on an Nvidia 5900 graphics processor is 4 to 12 times faster than the CPU and 7 to 22 times faster on the Nvidia 8500 graphics processor for various data sizes. We also achieved 12 to 64 times speed gains on the 5900 and 20 to 140 times speed gains on the 8500 graphics processor in computational time per iteration for evaluations with various cluster sizes.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/855_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/855_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nanyang Technological University, Singapore</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>08/30/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sall0001@ntu.edu.sg">S. A. Arul Shalom</Author>
           <Author email="asmdash@ntu.edu.sg">Manoranjan Dash</Author>
           <Author email="h0630082@nus.edu.sg">Minh Tue</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/t264x8286727410r/?p=97a9e2cb818a4dca8ae79455197c0bb7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>S. A. Arul Shalom,Manoranjan Dash,Minh Tue,sall0001@ntu.edu.sg,asmdash@ntu.edu.sg,h0630082@nus.edu.sg</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>07072ad0-b7c1-4eb6-9826-2c1cc0ae740f</GUID>
        <Name>Systematic Parallelization of Medical Image Reconstruction for Graphics Hardware</Name>
        <ShortDescription>Modern Graphics Processing Units (GPUs) consist of several SIMD-processors and thus provide a high degree of parallelism at low cost. We introduce a new approach to systematically develop parallel image reconstruction algorithms for GPUs from their parallel equivalents for distributed-memory machines. We use High-Level Petri Nets (HLPN) to intuitively describe the parallel implementations for distributed-memory machines. By denoting the functions of the HLPN with memory requirements and information about data distribution, we are able to identify parallel functions that can be implemented efficiently on the GPU. For an important iterative medical image reconstruction algorithm, the list-mode OSEM algorithm, we demonstrate the limitations of its distributed-memory implementation and show how our HLPN-based approach leads to a fast implementation on GPUs, reusable across different medical imaging devices.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/854_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/854_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Munster, Germany</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="schellmann@uni-muenster.de">Maraike Schellmann</Author>
           <Author email="voerding@uni-muenster.de">Jurgen Vording</Author>
           <Author email="gorlatch@uni-muenster.de">Sergei Gorlatch</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/m5831806472l5123/?p=97a9e2cb818a4dca8ae79455197c0bb7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Maraike Schellmann,Jurgen Vording,Sergei Gorlatch,schellmann@uni-muenster.de,voerding@uni-muenster.de,gorlatch@uni-muenster.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7794a349-0166-46c9-8c6d-32da8b4febda</GUID>
        <Name>Real-Time Autostereoscopic Visualization of Registration-Generated 4D MR Image of Beating Heart</Name>
        <ShortDescription>This paper presents a real-time autostereoscopic visualization system using the principle of Integral Videography (IV). We developed MIP and composite volume ray casting methods for IV volume rendering, and implemented the algorithms on the GPU to achieve real-time rendering. The system was used to visualize a 4D MR image that was generated from registration of a 3D MR image and a 4D ultrasound image. The registration scheme consists of inter-modality rigid registration between the 3D MR image and a 3D ultrasound image, and intra-modality non-rigid registration between 3D ultrasound images. The registration processes were also implemented on the GPU. Evaluation of processing speed showed that GPU processing was 48x, 13x, and 21x faster than CPU processing for IV volume rendering, rigid registration, and non-rigid registration, respectively. We also enabled real-time user interactivity for the IV visualization system. In the future, we plan to use this system to develop an intra-operative surgery navigation system for intra-cardiac surgery on a beating heart.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/853_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/853_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The University of Tokyo, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>07/15/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="nicholas@atre.t.u-tokyo.ac.jp">Nicholas Herlambang</Author>
           <Author email="liao@atre.t.u-tokyo.ac.jp">Hongen Liao</Author>
           <Author email="masa@i.u-tokyo.ac.jp">Ken Masamune</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/370716jm24355u32/?p=97a9e2cb818a4dca8ae79455197c0bb7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Nicholas Herlambang,Hongen Liao,Ken Masamune,nicholas@atre.t.u-tokyo.ac.jp,liao@atre.t.u-tokyo.ac.jp,masa@i.u-tokyo.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>498009d7-ccca-476d-881a-4a392b52b7ba</GUID>
        <Name>Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs</Name>
        <ShortDescription>Tracking systems are important in computer vision, with applications in surveillance, human-computer interaction, etc. Consumer graphics processing units (GPUs) have experienced an extraordinary evolution in both computing performance and programmability, leading to greater use of the GPU for non-rendering applications. In this work we propose a real-time object tracking algorithm, based on the hybridization of particle filtering (PF) and a multi-scale local search (MSLS) algorithm, presented for both CPU and GPU architectures. The developed system provides successful results in precise tracking of single and multiple targets in monocular video, operating in real-time at 70 frames per second for 640 x 480 video resolutions on the GPU, up to 1,100% faster than the CPU version of the algorithm.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/852_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/852_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universidad Rey Juan Carlos, Spain</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>05/08/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="raul.cabido@urjc.es">Raul Cabido</Author>
           <Author email="antonio.sanz@urjc.es">Antonio S. Montemayor</Author>
           <Author email="juanjose.pantrigo@urjc.es">Juan Jose Pantrigo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/63h4u82343287743/?p=97a9e2cb818a4dca8ae79455197c0bb7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Raul Cabido,Antonio S. Montemayor,Juan Jose Pantrigo,raul.cabido@urjc.es,antonio.sanz@urjc.es,juanjose.pantrigo@urjc.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>066c8093-375d-46dc-a170-4955e4c07315</GUID>
        <Name>Deforming a High-Resolution Mesh in Real-Time by Mapping onto a Low-Resolution Physical Model</Name>
        <ShortDescription>For interactive surgical simulation the physical model of the soft tissue needs to be solved in real-time. This limits the attainable model density to well below the desired mesh density for visual realism. Previous work avoids this problem by using a high-resolution visual mesh mapped onto a low-resolution physical model. We apply the same approach and present a computationally cheap implementation of a known algorithm to avoid texture artefacts caused by the mapping. We also introduce a spline-based algorithm to prevent groups of high-resolution vertices, mapped to the same low-resolution triangle, from exhibiting movements in which the underlying low-resolution structure can be recognised. The resulting mapping algorithm is very efficient, mapping 54,000 vertices in 8.5 ms on the CPU and in 0.88 ms on the GPU. Consequently, the density of the high-resolution visual mesh is limited only by the detail of the CT data from which the mesh was generated.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/850_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/850_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>The Australian e-Health Research Centre</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>07/07/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Hans de Visser</Author>
           <Author email="">Olivier Comas</Author>
           <Author email="">David Conlan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/q740537h2368g436/?p=edbe44e5c45942969e69b6ab158a86b2">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Hans de Visser,Olivier Comas,David Conlan</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3423ec81-fdfb-4e8c-89af-2dce4ce05a4a</GUID>
        <Name>ECM on Graphics Cards</Name>
        <ShortDescription>This paper reports record-setting performance for the elliptic-curve method of integer factorization: for example, 926.11 curves/second for ECM stage 1 with B1&#8201;=&#8201;8192 for 280-bit integers on a single PC. The state-of-the-art GMP-ECM software handles 124.71 curves/second for ECM stage 1 with B1&#8201;=&#8201;8192 for 280-bit integers using all four cores of a 2.4 GHz Core 2 Quad Q6600.
The extra speed takes advantage of extra hardware, specifically two NVIDIA GTX 295 graphics cards, using a new ECM implementation introduced in this paper. Our implementation uses Edwards curves, relies on new parallel addition formulas, and is carefully tuned for the highly parallel GPU architecture. On a single GTX 295 the implementation performs 41.88 million modular multiplications per second for a general 280-bit modulus. GMP-ECM, using all four cores of a Q6600, performs 13.03 million modular multiplications per second. 
This paper also reports speeds on other graphics processors: for example, 2414 280-bit elliptic-curve scalar multiplications per second on an older NVIDIA 8800 GTS (G80), again for a general 280-bit modulus. For comparison, the CHES 2008 paper "Exploiting the Power of GPUs for Asymmetric Cryptography" reported 1412 elliptic-curve scalar multiplications per second on the same graphics processor despite having fewer bits in the scalar (224 instead of 280), fewer bits in the modulus (224 instead of 280), and a special modulus (2^224&#8201;&#8722;&#8201;2^96&#8201;+&#8201;1).
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/849_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/849_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois at Chicago</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>04/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="djb@cr.yp.to">Daniel J. Bernstein</Author>
           <Author email="trchen1033@crypto.tw">Tien-Ren Chen</Author>
           <Author email="doug@crypto.tw">Chen-Mou Cheng</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/5554012086278702/?p=edbe44e5c45942969e69b6ab158a86b2">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Daniel J. Bernstein,Tien-Ren Chen,Chen-Mou Cheng,djb@cr.yp.to,trchen1033@crypto.tw,doug@crypto.tw</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9dd9b7a3-2ab7-4a26-ad08-1f2554a989fe</GUID>
        <Name>A Practical Approach of Curved Ray Prestack Kirchhoff Time Migration on GPGPU</Name>
        <ShortDescription>We introduced four prototypes of General Purpose GPU solutions using Compute Unified Device Architecture (CUDA) on NVidia GeForce 8800GT and Tesla C870 for a practical Curved Ray Prestack Kirchhoff Time Migration program, which is one of the most widely adopted imaging methods in the seismic data processing industry. We presented how to re-design and re-implement the original CPU code into efficient GPU code step by step. We demonstrated optimization methods, such as how to reduce the overhead of memory transfers over the PCI-E bus, how to significantly increase the number of kernel threads on GPU cores, how to buffer the inputs and outputs of CUDA kernel modules, and how to utilize the memory streams to overlap GPU kernel execution time, etc., to improve the runtime performance on GPUs. We analyzed the floating point errors between CPUs and GPUs. We presented the images generated by CPU and GPU programs for the same real-world seismic data inputs. Our final approach of Prototype-IV on NVidia GeForce 8800GT is 16.3 times faster than its CPU version on Intel's P4 3.0G.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/848_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/848_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Beihang University, Beijing</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="xhshi@buaa.edu.cn">Xiaohua Shi</Author>
           <Author email="whlichuang@126.com">Chuang Li</Author>
           <Author email="xu.wang@sei.buaa.edu.cn">Xu Wang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/ttjj3726610113j6/?p=edbe44e5c45942969e69b6ab158a86b2">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Xiaohua Shi,Chuang Li,Xu Wang,xhshi@buaa.edu.cn,whlichuang@126.com,xu.wang@sei.buaa.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6a921e34-8d47-4a42-a892-8398ec64468f</GUID>
        <Name>A Practical Quicksort Algorithm for Graphics Processors</Name>
        <ShortDescription>In this paper we present GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multi-core graphics processors. Quicksort has previously been considered an inefficient sorting solution for graphics processors, but we show that GPU-Quicksort often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/847_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/847_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Chalmers University of Technology, Sweden</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>09/20/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="cederman@chalmers.se">Daniel Cederman</Author>
           <Author email="tsigas@chalmers.se">Philippas Tsigas</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/f71007213lx75gh0/?p=edbe44e5c45942969e69b6ab158a86b2">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Daniel Cederman,Philippas Tsigas,cederman@chalmers.se,tsigas@chalmers.se</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>77df8f3f-2d01-428d-b8b3-70fc6d308873</GUID>
        <Name>Hardware-Accelerated Particle-Based Volume Rendering for Multiple Irregular Volumes </Name>
        <ShortDescription>In this paper, we propose a performance improvement of particle-based volume rendering (PBVR) by using a current, programmable GPU architecture. PBVR allows rendering without visibility sorting by representing a given volume dataset as a set of opaque and emissive particles. In our new GPU acceleration of PBVR, we provide a switchable rendering pipeline that is compatible with both regular and irregular grid volumes. Particle generation is improved by using a cell-by-cell approach for processing large volume datasets. We also reduce the memory cost required for storing all sub-pixel values by proposing a pixel-superimposing technique targeting a large sub-pixel level. Our work demonstrates a full-detail rendering rate of 5 to 11 fps for overlapped or separated multi-irregular volumes with a mega-scale number of volume cells on an NVIDIA GeForce 8800GTX.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/846_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/846_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Center for the Promotion of Excellence in Higher Education, Kyoto University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>12/03/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Naohisa Sakamoto</Author>
           <Author email="">Ding Zhongming</Author>
           <Author email="">Takuma Kawamura</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/42x0g167u276r07l/?p=edbe44e5c45942969e69b6ab158a86b2">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Naohisa Sakamoto,Ding Zhongming,Takuma Kawamura</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8ad6f9a1-f957-4d15-88f2-662798637c89</GUID>
        <Name>Gramm-software package for molecular dynamics on graphical processing units </Name>
        <ShortDescription>This work describes the software package and algorithms for molecular dynamics using NVIDIA GPU G80, G84, and G92. All potentials needed for MM2 and AMBER force fields are implemented and the combination of different potentials is allowed. The performance comparison of different MD algorithms on GPU and CPU is presented. All software is available from www.gpamm.mntech.ru.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/845_cover-medium2_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/845_cover-medium2_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Russian Academy of Sciences</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>01/21/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">D. S. Tarasov</Author>
           <Author email="">E. D. Izotova</Author>
           <Author email="">D. A. Alisheva</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/98624210w828r30g/?p=edbe44e5c45942969e69b6ab158a86b2">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>D. S. Tarasov,E. D. Izotova,D. A. Alisheva</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>acb94169-1150-437c-9ebb-6c183be2b38f</GUID>
        <Name>Compiler support for general-purpose computation on GPUs</Name>
        <ShortDescription>In recent years, the GPU (graphics processing unit) has evolved into an extremely powerful and flexible processor, and it now represents an attractive platform for general-purpose computation. Moreover, changes to the design and programmability of GPUs provide the opportunity to perform general-purpose computation on a GPU (GPGPU). Even though many programming languages, software tools, and libraries have been proposed to facilitate GPGPU programming, the unusual and specific programming model of the GPU remains a significant barrier to writing GPGPU programs. In this paper, we introduce a novel compiler-based approach for GPGPU programming. Compiler directives are used to label code fragments that are to be executed on the GPU. Our GPGPU compiler, Guru, converts the labeled code fragments into ISO-compliant C code that contains appropriate OpenGL and Cg APIs. A native C compiler can then be used to compile it into executable code for the GPU. Our compiler is implemented on top of the Open64 compiler infrastructure. Preliminary experimental results from selected benchmarks show that our compiler produces significant performance improvements for programs that exhibit a high degree of data parallelism.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/844_neville_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/844_neville_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National Chung Cheng University, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>11/19/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="lyt94@cs.ccu.edu.tw">Yu-Te Lin</Author>
           <Author email="pschen@cs.ccu.edu.tw">Peng-Sheng Chen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/mu72t2381w660525/?p=60db4e60c7714e7087d810ae6a83dbf5">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yu-Te Lin,Peng-Sheng Chen,lyt94@cs.ccu.edu.tw,pschen@cs.ccu.edu.tw</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a0c19949-34b1-4227-a049-a70feb8ad4e9</GUID>
        <Name>A Gradient Descent Approximation for Graph Cuts </Name>
        <ShortDescription>Graph cuts have become very popular in many areas of computer vision including segmentation, energy minimization, and 3D reconstruction. Their ability to find optimal results efficiently and their ease of use are some of the reasons for this popularity. However, there are a few issues with graph cuts, such as the inherently sequential nature of popular algorithms and the memory bloat in large-scale problems. In this paper, we introduce a novel method for approximating the graph cut optimization by posing the problem as a gradient descent formulation. The advantages of our method are the ability to work efficiently on large problems and the possibility of convenient implementation on parallel architectures such as inexpensive Graphics Processing Units (GPUs). We have implemented the proposed method on the NVIDIA 8800GTS GPU. Classical segmentation experiments on static images and video data showed the effectiveness of our method.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/843_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/843_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Gebze Institute of Technology, Gebze</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>09/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="yildiz@bilmuh.gyte.edu.tr">Alparslan Yildiz</Author>
           <Author email="akgul@bilmuh.gyte.edu.tr">Yusuf Sinan Akgul</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/38142r4688473177/?p=60db4e60c7714e7087d810ae6a83dbf5">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alparslan Yildiz,Yusuf Sinan Akgul,yildiz@bilmuh.gyte.edu.tr,akgul@bilmuh.gyte.edu.tr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>83479fc3-a87e-49c9-b505-d08ae7a1747f</GUID>
        <Name>Data Mining Using Graphics Processing Units </Name>
        <ShortDescription>During the last few years, Graphics Processing Units (GPUs) have evolved from simple devices for display signal preparation into powerful coprocessors that not only support typical computer graphics tasks such as rendering of 3D scenarios but can also be used for general numeric and symbolic computation tasks such as simulation and optimization. As a major advantage, GPUs provide extremely high parallelism (with several hundred simple programmable processors) combined with high memory bandwidth at low cost. In this paper, we propose several algorithms for computationally expensive data mining tasks like similarity search and clustering which are designed for the highly parallel environment of a GPU. We define a multidimensional index structure which is particularly suited to support similarity queries under the restricted programming model of a GPU, and define a similarity join method. Moreover, we define highly parallel algorithms for density-based and partitioning clustering. In an extensive experimental evaluation, we demonstrate the superiority of our algorithms running on the GPU over their conventional counterparts on the CPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/842_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/842_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Munich, Germany</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>08/24/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="boehm@dbs.ifi.lmu.de">Christian Bohm</Author>
           <Author email="noll@dbs.ifi.lmu.de">Robert Noll</Author>
           <Author email="plant@lrz.tum.de">Claudia Plant</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/w0x8hr6831470642/?p=60db4e60c7714e7087d810ae6a83dbf5">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Christian Bohm,Robert Noll,Claudia Plant,boehm@dbs.ifi.lmu.de,noll@dbs.ifi.lmu.de,plant@lrz.tum.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>23658fd2-eb70-4bb7-bc9a-5efb3e91b16e</GUID>
        <Name>GPU RayTracing Pipeline</Name>
        <ShortDescription>We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose a standard ray tracing algorithm into several data-parallel stages that are mapped efficiently to the massively parallel architecture of modern GPUs. These stages include: ray sorting into coherent packets, creation of frustums for packets, breadth-first frustum traversal through a bounding volume hierarchy (BVH) for the scene, and localized ray-primitive intersections. We utilize the well-known parallel primitives scan and segmented scan in order to process irregular data structures, to remove the need for a stack, and to minimize branch divergence in all stages. Our ray sorting stage is based on applying hash values to individual rays, ray stream compression, sorting and decompression. Our breadth-first BVH traversal is based on parallel frustum-bounding box intersection tests and a parallel scan per BVH level. We demonstrate our algorithm with area light sources to get a soft shadow effect and show that our concept is reasonable for GPU implementation. For the same data sets and ray-primitive intersection routines our pipeline is ~3x faster than an optimized standard depth-first ray tracer implemented in one kernel.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/841_paper4_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/841_paper4_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Keldysh Institute of Applied Mathematics / Microsoft Research</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>02/10/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>3</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kirill@garanzha.com">K.Garanzha</Author>
           <Author email="">C.Loop</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://garanzha.com">Paper</ContentType>
           <ContentType url="http://garanzha.com">Presentation</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Ray Tracing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>GPU, ray tracing, custom pipeline,K.Garanzha,C.Loop,kirill@garanzha.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>89d31666-0540-4526-800d-124ea52364d8</GUID>
        <Name>Maaap Reduce </Name>
        <ShortDescription>In order to verify the feasibility of using the GPU for a fairly substantial and rapidly changing dataset, a simple set of benchmark functions was created for three main programming language families. Each test evaluated every element in the MNIST dataset with the sigmoid function.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/840_defaultlogo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/840_defaultlogo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>04/07/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Paul Reimer</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/maaap-reduce/wiki/Benchmarks">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Paul Reimer</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5f27c6d9-6984-495a-bf57-bca8dd6ea108</GUID>
        <Name>Rocks CUDA</Name>
        <ShortDescription>Rocks Cluster Distribution is a Linux distribution for HPC clusters. It was started by the National Partnership for Advanced Computational Infrastructure and the SDSC in 2000. Rocks includes many tools that make a group of computers into a cluster.
Installations can be customized with additional software packages at install-time by using special user-supplied CDs (called "Roll CDs"). The "Rolls" extend the system by integrating seamlessly and automatically into the management and packaging mechanisms used by base software, greatly simplifying installation and configuration of large numbers of computers. This project will contain the source code and images for an NPACI ROCKS 5.0 Roll for NVIDIA CUDA libraries and drivers.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/839_defaultlogo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/839_defaultlogo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>08/14/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">3kforme</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/rockscuda/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>3kforme</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>746a6455-e475-40f9-b881-6f85a8ec0e76</GUID>
        <Name>GPU based Sparse Grid Technique for Solving Multidimensional Options Pricing PDEs</Name>
        <ShortDescription>It has been shown that the sparse grid combination technique can be a practical tool to solve high-dimensional PDEs arising in multidimensional option pricing problems in finance. Hierarchical approximation of these problems leads to linear systems that are smaller in size compared to those arising from standard finite element or finite difference discretizations. However, these systems are still excessively demanding in terms of memory for direct methods and challenging to solve by iterative methods. In this paper we address iterative solutions via preconditioned Krylov subspace based methods, such as Stabilized BiConjugate Gradient (BiCGStab) and CG Squared (CGS), with the main focus on the design of such iterative solvers to harness the massive parallelism of general-purpose Graphics Processing Units (GPGPUs). We discuss data structures and efficient implementation of iterative solvers. We also present a number of performance results to demonstrate the scalability of these solvers on NVIDIA's CUDA platform.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/838_graph_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/838_graph_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Chatenay-Malabry, France</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>1000</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="abhijeet.gaikwad@ecp.fr">Abhijeet Gaikwad</Author>
           <Author email="ioane.muni-toke@ecp.fr">Ioane Muni Toke</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.nvidia.com/content/cudazone/CUDABrowser/downloads/papers/sc2009_preprint.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Finance</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>NVIDIA CUDA, Iterative solvers, multidimensional option,Abhijeet Gaikwad,Ioane Muni Toke,abhijeet.gaikwad@ecp.fr,ioane.muni-toke@ecp.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a65412a3-1d34-490f-a209-9f8d486c7b55</GUID>
        <Name>Micromanager London Kings </Name>
        <ShortDescription>CUDA makes the processing power of NVIDIA graphics cards available for normal computation. Here are some add-ons for uManager (http://www.micro-manager.org/), a free cross-platform software package to control microscopes and do image acquisition.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/837_defaultlogo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/837_defaultlogo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>04/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Martin Kielhorn</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/micromanager-london-kings/wiki/Cuda_1">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Martin Kielhorn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c0250d55-553e-4fa6-aa6a-e91916638b97</GUID>
        <Name>CBCL Model CUDA</Name>
        <ShortDescription>CUDA version of the HMAX model</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/836_defaultlogo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/836_defaultlogo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>01/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Sharat.Chikkerur</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/cbcl-model-cuda/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sharat.Chikkerur</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9afbcfda-88d9-44a4-9cf3-8c2e3c2ec1d9</GUID>
        <Name>Real-time virtual environment signal extraction and denoising using programmable graphics hardware </Name>
        <ShortDescription>The sense of being within a three-dimensional (3D) space and interacting with virtual 3D objects in a computer-generated virtual environment (VE) often requires essential image, vision and sensor signal processing techniques such as differentiating and denoising. This paper describes novel implementations of Gaussian filtering for characteristic signal extraction and wavelet-based image denoising algorithms that run on the graphics processing unit (GPU). While significant acceleration over standard CPU implementations is obtained by exploiting the data parallelism provided by modern programmable graphics hardware, the CPU can be freed up to run other computations, such as artificial intelligence (AI) and physics, more efficiently. The proposed GPU-based Gaussian filtering can extract surface information from a real object and provide its material features for rendering and illumination. The wavelet-based signal denoising for large digital images realized in this project provided better realism for VE visualization without sacrificing the real-time and interactive performance of an application.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/835_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/835_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Huddersfield, Queensgate, Huddersfield</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>10/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="y.su@hud.ac.uk">Yang Su</Author>
           <Author email="">Zhi-Jie Xu</Author>
           <Author email="">Xiang-Qian Jiang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/w40113672g700213/?p=60db4e60c7714e7087d810ae6a83dbf5">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yang Su,Zhi-Jie Xu,Xiang-Qian Jiang,y.su@hud.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>578067f7-ac4b-47f2-950b-1f9ed61408e5</GUID>
        <Name>Extracting Curve Skeletons from Gray Value Images for Virtual Endoscopy</Name>
        <ShortDescription>The extraction of curve skeletons from tubular networks is a necessary prerequisite for virtual endoscopy applications. We present an approach for curve skeleton extraction directly from gray value images that supersedes the need to deal with segmentations and skeletonizations. The approach uses properties of the Gradient Vector Flow to derive a tube-likeliness measure and a medialness measure. Their combination allows the detection of tubular structures and an extraction of their medial curves that stays centered also in cases where the structures are not tubular such as junctions or severe stenoses. We present results on clinical datasets and compare them to curve skeletons derived with different skeletonization approaches from high quality segmentations. Our approach achieves a high centerline accuracy and is computationally efficient by making use of a GPU based implementation of the Gradient Vector Flow.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/834_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/834_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Graz University of Technology, Austria</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>07/15/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="cbauer@icg.tu-graz.ac.at">Christian Bauer</Author>
           <Author email="bischof@icg.tu-graz.ac.at">Horst Bischof</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/69323487n4u31002/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Christian Bauer,Horst Bischof,cbauer@icg.tu-graz.ac.at,bischof@icg.tu-graz.ac.at</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bb093c2d-d78d-4eb7-81b1-af6e59587e17</GUID>
        <Name>Evaluating the Jaccard-Tanimoto Index on Multi-core Architectures</Name>
        <ShortDescription>The Jaccard/Tanimoto coefficient is an important workload, used in a large variety of problems including drug design fingerprinting, clustering analysis, similarity web searching and image segmentation. This paper evaluates the Jaccard coefficient on three platforms: the Cell Broadband Engine processor, an Intel Xeon dual-core platform, and an NVIDIA 8800 GTX GPU. In our work, we have developed a novel parallel algorithm specially suited to the Cell/B.E. architecture for all-to-all Jaccard comparisons that minimizes DMA transfers and reuses data in the local store. We show that our implementation on Cell/B.E. outperforms the implementations on comparable Intel platforms by 6-20X with full accuracy and by 10-50X in reduced accuracy mode, depending on the size of the data, and by more than 60X compared to the NVIDIA 8800 GTX. In addition to performance, we discuss in detail our efforts to optimize our workload on these architectures and explain how the avenues for optimization differ from one architecture to another. Our work shows that the algorithms or kernels employed for the Jaccard coefficient calculation are heavily dependent on the traits of the target hardware.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/833_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/833_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Technologies Design Center, Indianapolis</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="vsachde@us.ibm.com">Vipin Sachdeva</Author>
           <Author email="dmfreim@us.ibm.com">Douglas M. Freimuth</Author>
           <Author email="chemuell@cs.indiana.edu">Chris Mueller</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/542w421534284x43/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vipin Sachdeva,Douglas M. Freimuth,Chris Mueller,vsachde@us.ibm.com,dmfreim@us.ibm.com,chemuell@cs.indiana.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7a0a2a4f-3fa3-4ed0-8e1c-f3dd9f2835e7</GUID>
        <Name>Focused Volumetric Visual Hull with Color Extraction</Name>
        <ShortDescription>This paper introduces a new approach for volumetric visual hull reconstruction, using a voxel grid that focuses on the moving target object. This grid is continuously updated as a function of object location, orientation, and size. The benefit is a reduced number of voxels that have to be evaluated or allocated towards capturing the target at higher resolution. This technique particularly improves reconstructions where the total reconstruction space is larger than the moving reconstruction target. The higher resolution of the voxel grid also reduces the computational cost per voxel reprojection since a one voxel to one input pixel reprojection ratio is approximated. In addition, the appropriate view-independent color of the surface voxels is computed, allowing for realistic visual hull texturing. All color calculations are performed locally, based on approximated surface voxel normals and the input images. A color outlier detection approach is introduced, which reduces the influence of occlusions in the color evaluation. The parallel nature of the presented focused visual hull reconstruction technique lends itself to hardware acceleration, allowing interactive rates to be achieved by performing most computations on the GPU. A set of case studies is provided for well-defined static and dynamic data sets.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/832_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/832_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California, San Diego</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>11/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Daniel Knoblauch</Author>
           <Author email="">Falko Kuester</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/3652g2h32150v271/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Daniel Knoblauch,Falko Kuester</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>89f67b35-e35e-4e8c-8b22-14c848a66f32</GUID>
        <Name>Fourier Volume Rendering on GPGPU</Name>
        <ShortDescription>Fourier Volume Rendering (FVR) is a volume rendering technique with a lower computational complexity of O(N^2 log N) for an N^3 data array. A new FVR algorithm is proposed by extending the Fourier Projection-Slice Theorem to higher dimensions and mapping the pipeline entirely onto the GPU. A windowed-sinc function is used as the reconstruction filter to implement higher-order interpolation, and sample reduction is executed on the GPU in parallel, which suits heterogeneous multi-core architectures. Rendering is accelerated by a factor of 7 when the rendered image resolution is larger than 512x512.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/831_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/831_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Hunan University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>05/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Degui Xiao</Author>
           <Author email="">Yi Liu</Author>
           <Author email="">Lei Yang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/5548u3274r1517u7/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Degui Xiao,Yi Liu,Lei Yang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e7dc92ba-1736-4c6f-92ea-6ac559d565f7</GUID>
        <Name>Practical Random Linear Network Coding on GPUs</Name>
        <ShortDescription>Recently, random linear network coding has been widely applied in peer-to-peer network applications. Instead of sharing the raw data with each other, peers in the network produce and send encoded data to each other. As a result, the communication protocols have been greatly simplified, and the applications experience higher end-to-end throughput and better robustness to network churn. Since it is difficult to verify the integrity of the encoded data, such systems can suffer from the well-known pollution attack, in which a malicious node sends bad encoded blocks that consist of bogus data. Consequently, the bogus data propagates through the whole network at an exponential rate. Homomorphic hash functions (HHFs) have been designed to defend systems from such pollution attacks, but they bring a new challenge: HHFs require that network coding be performed in GF(q), where q is a very large prime number. This greatly increases the computational cost of network coding, in addition to the already computationally expensive HHFs. This paper exploits the computing power of Graphics Processing Units (GPUs) to reduce the computational cost of network coding and homomorphic hashing. With our network coding and HHF implementation on the GPU, we observed a significant computational speedup in comparison with the best CPU implementation. This implementation can lead to a practical solution for defending against pollution attacks in distributed systems.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/830_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/830_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Hong Kong Baptist University / University of Calgary, Alberta, Canada</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>05/07/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="chxw@comp.hkbu.edu.hk">Xiaowen Chu</Author>
           <Author email="kyzhao@comp.hkbu.edu.hk">Kaiyong Zhao</Author>
           <Author email="meawang@ucalgary.ca">Mea Wang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/12r8mj83g5655542/?p=fb5eb2594736451689b09cf51d6886b">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Xiaowen Chu,Kaiyong Zhao,Mea Wang,chxw@comp.hkbu.edu.hk,kyzhao@comp.hkbu.edu.hk,meawang@ucalgary.ca</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>207cd764-e884-47aa-b0c3-b5505bedfbe4</GUID>
        <Name>Fast Conjugate Gradients with Multiple GPUs</Name>
        <ShortDescription>The limiting factor for the efficiency of sparse linear solvers is memory bandwidth. In this work, we describe a fast Conjugate Gradient solver for unstructured problems, which runs on multiple GPUs installed on a single mainboard. The solver achieves double precision accuracy with single precision GPUs, using a mixed precision iterative refinement algorithm. To achieve high computation speed, we propose a fast sparse matrix-vector multiplication algorithm, which is the core operation of iterative solvers. The proposed multiplication algorithm efficiently utilizes GPU resources via caching, coalesced memory accesses and load balancing between running threads. Experiments on a wide range of matrices show that our matrix-vector multiplication algorithm achieves up to 11.6 Gflops on a single GeForce 8800 GTS card and that the CG implementation achieves up to 24.6 Gflops with four GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/829_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/829_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Tokyo Institute of Technology / National Institute of Informatics, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ali@matsulab.is.titech.ac.jp">Ali Cevahir</Author>
           <Author email="nukada@matsulab.is.titech.ac.jp">Akira Nukada</Author>
           <Author email="matsu@is.titech.ac.jp">Satoshi Matsuoka</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/9m742203qp7802m7/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ali Cevahir,Akira Nukada,Satoshi Matsuoka,ali@matsulab.is.titech.ac.jp,nukada@matsulab.is.titech.ac.jp,matsu@is.titech.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>307d80ab-1016-4ba9-9fda-be6f1e85a18f</GUID>
        <Name>Applying the Stream-Based Computing Model to Design Hardware Accelerators: A Case Study</Name>
        <ShortDescription>To facilitate the design of hardware accelerators, this paper proposes adopting the stream-based computing model and using Graphics Processing Units (GPUs) as prototyping platforms. This model exposes the maximum data parallelism available in the applications and decouples computation from memory accesses. The design and implementation procedures, including the programming of GPUs, are illustrated with the widely used MrBayes bioinformatics application. Experimental results show that a straightforward mapping of the stream-based program for the GPU into hardware structures leads to improvements in performance, scalability and cost. Moreover, it is shown that a set of simple optimization techniques can be applied to reduce the cost and the power consumption of hardware solutions.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/828_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/828_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Rua Alves Redol</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="fcpp@inesc-id.pt">Frederico Pratas</Author>
           <Author email="las@inesc-id.pt">Leonel Sousa</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/6720653366867q70/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Frederico Pratas,Leonel Sousa,fcpp@inesc-id.pt,las@inesc-id.pt</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>24d9dfbe-430a-4065-b835-69d1728e3a2b</GUID>
        <Name>Parallel Calculating of the Goal Function in Metaheuristics Using GPU</Name>
        <ShortDescription>We consider a metaheuristic optimization algorithm which uses a single process (thread) to guide the search through the solution space. The thread iteratively performs two main tasks: goal function evaluation for a single solution or a set of solutions, and management (solution filtering and selection, collection of history, updating). The latter task statistically takes 1-3% of the total iteration time, so we do not attempt to accelerate it. The former task can be accelerated in parallel environments in various ways. We propose a parallel small-grain calculation model that provides a cost-optimal method. We then carry out an experiment using a Graphics Processing Unit (GPU) to confirm our theoretical results.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/827_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/827_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Wrocław University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wojciech.bozejko@pwr.wroc.pl">Wojciech Bozejko</Author>
           <Author email="czeslaw.smutnicki@pwr.wroc.pl">Czesław Smutnicki</Author>
           <Author email="mariusz.uchronski@pwr.wroc.pl">Mariusz Uchronski</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/fp7l0800u7715872/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wojciech Bozejko,Czesław Smutnicki,Mariusz Uchronski,wojciech.bozejko@pwr.wroc.pl,czeslaw.smutnicki@pwr.wroc.pl,mariusz.uchronski@pwr.wroc.pl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>89537d32-f563-4d80-af24-b3b43058d026</GUID>
        <Name>Accelerating astrophysical particle simulations with programmable hardware (FPGA and GPU)</Name>
        <ShortDescription>In a previous paper we have shown that direct gravitational N-body simulations in astrophysics scale very well for moderately parallel supercomputers (on the order of 10-100 nodes). The best balance between computation and communication is reached if the nodes are accelerated by special purpose hardware; in this paper we describe the implementation of particle-based astrophysical simulation codes on new types of accelerator hardware (field programmable gate arrays, FPGA, and graphical processing units, GPU). In addition to direct gravitational N-body simulations we also use the algorithmically similar smoothed particle hydrodynamics (SPH) method as a test application; the algorithms are used for astrophysical problems such as the evolution of galactic nuclei with central black holes and gravitational wave generation, and star formation in galaxies and galactic nuclei. We present the code performance on a single node using different kinds of special hardware (traditional GRAPE, FPGA, and GPU) and some implementation aspects (e.g. accuracy). The results show that GPU hardware for real application codes is as fast as GRAPE, but at an order of magnitude lower price, and that FPGA is useful for acceleration of complex sequences of operations (like SPH). We discuss future prospects and new cluster computers built with new generations of FPGA and GPU cards.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/826_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/826_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Heidelberg</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>05/12/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="spurzem@ari.uni-heidelberg.de">R. Spurzem</Author>
           <Author email="berczik@ari.uni-heidelberg.de">P. Berczik</Author>
           <Author email="guillermo.marcus@ziti.uni-heidelberg.de">G. Marcus</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/ew838w1334511061/?p=933dcabf38454d089551d5a476fca08c">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>R. Spurzem,P. Berczik,G. Marcus,spurzem@ari.uni-heidelberg.de,berczik@ari.uni-heidelberg.de,guillermo.marcus@ziti.uni-heidelberg.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a9e38eba-e87f-426b-916a-5c33b9f69177</GUID>
        <Name>A framework for exploring numerical solutions of advection reaction diffusion equations using a GPU-based approach</Name>
        <ShortDescription>In this paper we describe a general purpose, graphics processing unit (GP-GPU)-based approach for solving partial differential equations (PDEs) within advection reaction diffusion models. The GP-GPU-based approach provides a platform for solving PDEs in parallel and can thus significantly reduce solution times over traditional CPU implementations. This allows for a more efficient exploration of various advection reaction diffusion models, as well as the parameters that govern them. Although the GPU does impose limitations on the size and accuracy of computations, the PDEs describing the advection reaction diffusion models of interest to us fit comfortably within these constraints. Furthermore, GPU technology continues to increase rapidly in speed, memory, and precision, so applying these techniques to larger systems should be possible in the future. We chose to solve the PDEs using two numerical approaches: for the diffusion, a first-order explicit forward Euler solution and a semi-implicit second-order Crank-Nicolson solution; and, for the advection and reaction, a first-order explicit solution. The goal of this work is to provide motivation and guidance to the application scientist interested in exploring the use of the GP-GPU computational framework in the course of their research. In this paper, we present a rigorous comparison of our GPU-based advection reaction diffusion code model with a CPU-based analog, finding that the GPU model outperforms the CPU implementation in one-to-one comparisons.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/825_computedvisualation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/825_computedvisualation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Utah</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>03/04/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="allen@sci.utah.edu">Allen R. Sanderson</Author>
           <Author email="miriah@sci.utah.edu">Miriah D. Meyer</Author>
           <Author email="kirby@sci.utah.edu">Robert M. Kirby</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/t4pj83q74k7h3534/?p=933dcabf38454d089551d5a476fca08c">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Allen R. Sanderson,Miriah D. Meyer,Robert M. Kirby,allen@sci.utah.edu,miriah@sci.utah.edu,kirby@sci.utah.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7d3cc29a-3dac-4791-8478-77dd28708ea8</GUID>
        <Name>Going Forward with GPU Computing</Name>
        <ShortDescription>This article describes why CEA is looking at GPU Computing and how the first experiments are conducted. We describe here a well defined global strategy which relies on training users and taking advantage of Grand Challenges, involving early access users and system administrators. We also describe some preliminary results and raise questions which need to be addressed in the near future.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/824_highperformancecomputing_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/824_highperformancecomputing_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>CEA, DAM</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>10/07/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Guillaume Colin de Verdiere</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/h71v663783rx85g7/?p=933dcabf38454d089551d5a476fca08c">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Guillaume Colin de Verdiere</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>441d4d84-f548-4465-ac76-eef36ff2a059</GUID>
        <Name>Introduction to Mastering Cell BE and GPU Execution Platforms</Name>
        <ShortDescription>Both Cell BE-type and GPU processors have emerged as multi-processor execution platforms that can outperform general purpose multi-core computers in certain application domains. The two architectures are quite different, and by no means interchangeable. GPUs are reminiscent of fine-grained systolic array architectures, while the Cell BE is suited to executing a set of coordinated coarse-grained tasks. By now, enough applications have been mapped onto either of these two processors, mostly by hand, that the pros and cons tables can be filled. The next step is to provide mappings that are based on efficient programming models and methods, in particular methods that minimize communication overheads. The six papers in this special session are attempts to take precisely that route. Three of them take the GPU as the underlying execution platform, the third also taking the Cell BE multicore processor into consideration. The other three papers target the Cell BE processor.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/823_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/823_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Leiden University, the Netherlands</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Ed Deprettere</Author>
           <Author email="">Ana L. Varbanescu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/087047416q2k63k0/?p=617f22391ecf47f89a3da0c82420ae97">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ed Deprettere,Ana L. Varbanescu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8949e7e3-c9b6-487a-894e-75c35f7b8d45</GUID>
        <Name>Development of a GPU-based multithreaded software application to calculate digitally reconstructed radiographs for radiotherapy</Name>
        <ShortDescription>To provide faster calculation of digitally reconstructed radiographs (DRRs) in patient-positioning verification, we developed and evaluated a graphics processing unit (GPU)-based DRR software application and compared it with a central processing unit (CPU)-based application. The evaluation metrics were calculation speed and image quality for various slice thicknesses. The results showed that the GPU-based DRR computation was an average of 50 times faster than the CPU-based methodology, whereas the image quality was very similar. This excellent performance may increase the accuracy of patient positioning and improve the patient treatment throughput time.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/822_radialogics_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/822_radialogics_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>National Institute of Radiological Sciences, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>11/07/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="shinshin@nirs.go.jp">Shinichiro Mori</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v57t4uj446138427/?p=617f22391ecf47f89a3da0c82420ae97">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shinichiro Mori,shinshin@nirs.go.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6884796d-0fa2-4f33-9297-1fde62fcc824</GUID>
        <Name>Lattice Boltzmann based PDE solver on the GPU</Name>
        <ShortDescription>In this paper, we propose a hardware-accelerated PDE (partial differential equation) solver based on the lattice Boltzmann model (LBM). The LBM was originally designed to solve fluid dynamics problems by constructing simplified microscopic kinetic models. As an explicit numerical scheme with only local operations, it has the advantage of being easy to implement and especially suitable for graphics hardware (GPU) acceleration. Beyond the Navier-Stokes equation of fluid mechanics, a typical LBM can be modified to solve the parabolic diffusion equation, which is further used to solve the elliptic Laplace and Poisson equations with a diffusion process. These PDEs are widely used in modeling and manipulating images, surfaces and volumetric data sets. Therefore, the LBM scheme can be used as a GPU-based numerical solver to provide a fast and convenient alternative to traditional implicit iterative solvers. We apply this method to several examples in volume smoothing, surface fairing and image editing, achieving outstanding performance on contemporary graphics hardware. It has great potential to be used as a general GPU computing framework for efficiently solving PDEs in image processing, computer graphics and visualization.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/821_visualcomputer_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/821_visualcomputer_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Kent State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>12/07/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="zhao@cs.kent.edu">Ye Zhao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/l8x284048269263x/?p=617f22391ecf47f89a3da0c82420ae97">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ye Zhao,zhao@cs.kent.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1acdf9de-8761-4f13-9fea-7b8b02b55719</GUID>
        <Name>Real-Time Online Video Object Silhouette Extraction Using Graph Cuts on the GPU</Name>
        <ShortDescription>Being able to find the silhouette of an object is a very important front-end processing step for many high-level computer vision techniques, such as Shape-from-Silhouette 3D reconstruction methods, object shape tracking, and pose estimation. Graph cuts have been proposed as a method for finding very accurate silhouettes which can be used as input to such high-level techniques, but graph cuts are notoriously computationally intensive and slow. Leading CPU implementations can extract a silhouette from a single QVGA image in 100 milliseconds, with performance dramatically decreasing with increased resolution. Recent GPU implementations have been able to achieve performance of 6 milliseconds per image by exploiting the intrinsic properties of the lattice graphs and the hardware model of the GPU. However, these methods are restricted to a subclass of lattice graphs and are not generally applicable. We propose a novel method for graph cuts on the GPU which places no limits on graph configuration and which is able to achieve comparable real-time performance in online video processing scenarios.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/820_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/820_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Keio University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>08/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="zgarrett@hvrl.ics.keio.ac.jp">Zachary A. Garrett</Author>
           <Author email="saito@hvrl.ics.keio.ac.jp">Hideo Saito</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/928267731044g820/?p=617f22391ecf47f89a3da0c82420ae97">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zachary A. Garrett,Hideo Saito,zgarrett@hvrl.ics.keio.ac.jp,saito@hvrl.ics.keio.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1919e879-ecaa-471f-b6cb-93415638c16a</GUID>
        <Name>Seeded ND medical image segmentation by cellular automaton on GPU</Name>
        <ShortDescription>Purpose: We present a GPU-based framework to perform organ segmentation in N-dimensional (ND) medical image datasets by computation of weighted distances using the Ford-Bellman algorithm (FBA). Our GPU implementation of FBA gives an alternative and optimized solution to other graph-based segmentation techniques.</ShortDescription>
        <URL>http://springerlink.com/content/v92w2q820w412jj8/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=63</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/819_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/819_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Notre-Dame Hospital</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>07/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="claude.kauffmann@gmail.com">Claude Kauffmann</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v92w2q820w412jj8/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=63">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Claude Kauffmann,claude.kauffmann@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4bd610d3-92f8-4032-9730-02b0e6091d1f</GUID>
        <Name>On GPU's viability as a middleware accelerator </Name>
        <ShortDescription>Today, Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly parallel devices to support more generic functionality at the operating system or middleware level. This study starts from the hypothesis that generic middleware-level techniques that improve distributed system reliability or performance (such as content addressing, erasure coding, or data similarity detection) can be significantly accelerated using GPU support. We take a first step towards validating this hypothesis and we design StoreGPU, a library that accelerates a number of hashing-based middleware primitives popular in distributed storage system implementations. Our evaluation shows that StoreGPU enables up to twenty-five-fold performance gains on synthetic benchmarks as well as on a high-level application: the online similarity detection between large data files.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/818_scalable_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/818_scalable_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of British Columbia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>01/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="samera@ece.ubc.ca">Samer Al-Kiswany</Author>
           <Author email="abdullah@ece.ubc.ca">Abdullah Gharaibeh</Author>
           <Author email="elizeus@ece.ubc.ca">Elizeu Santos-Neto</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/8260x51q6440v403/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=62">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Samer Al-Kiswany,Abdullah Gharaibeh,Elizeu Santos-Neto,samera@ece.ubc.ca,abdullah@ece.ubc.ca,elizeus@ece.ubc.ca</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>03286d23-be49-45d1-be2b-790c02badee7</GUID>
        <Name>Implementing Decision Trees and Forests on a GPU</Name>
        <ShortDescription>We describe a method for implementing the evaluation and training of decision trees and forests entirely on a GPU, and show how this method can be used in the context of object recognition. Our strategy for evaluation involves mapping the data structure describing a decision forest to a 2D texture array. We navigate through the forest for each point of the input data in parallel using an efficient, non-branching pixel shader. For training, we compute the responses of the training data to a set of candidate features, and scatter the responses into a suitable histogram using a vertex shader. The histograms thus computed can be used in conjunction with a broad range of tree learning algorithms.</ShortDescription>
        <URL>http://springerlink.com/content/y702n504831g232m/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=61</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/817_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/817_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Microsoft Research, Cambridge, UK</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>10/12/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="toby.sharp@microsoft.com">Toby Sharp</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/y702n504831g232m/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=61">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Toby Sharp,toby.sharp@microsoft.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e4fd34a1-868c-482b-9522-41104b157431</GUID>
        <Name>CUDAMat</Name>
        <ShortDescription>CUDAMat provides a CUDA-based matrix class for Python, making it easy to implement algorithms that are easily expressed in terms of dense linear algebra. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/816_google_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/816_google_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Toronto</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>11/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="vmnih@cs.toronto.edu">Volodymyr Mnih</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/cudamat/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Volodymyr Mnih,vmnih@cs.toronto.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0692da9d-1f32-4819-a7e6-278383b1c438</GUID>
        <Name>Parallelization of a Video Segmentation Algorithm on CUDA Enabled Graphics Processing Units</Name>
        <ShortDescription>Nowadays, Graphics Processing Units (GPUs) are emerging as SIMD coprocessors for general-purpose computations, especially after the launch of NVIDIA CUDA. Since then, some libraries have been implemented for matrix computation and image processing. However, in real video applications some stages need irregular data distributions and the parallelism is less inherent. This paper presents the parallelization of a video segmentation application on GPU hardware, which implements an algorithm for detecting abrupt and gradual transitions. A critical part of the algorithm requires highly intensive computation to calculate video frame features. Results on three CUDA-enabled GPUs are encouraging because of the significant speedup achieved. They are also compared with an OpenMP version of the algorithm, running on two platforms with multiple cores.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/815_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/815_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Cordoba, Spain / University of Malaga, Spain</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>08/22/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="el1goluj@uco.es">Juan Gomez-Luna</Author>
           <Author email="gonzalez@ac.uma.es">Jose Maria Gonzalez-Linares</Author>
           <Author email="el1bebej@uco.es">Jose Ignacio Benavides</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/d76622215h42m733/?p=f1707317a6624bd9afd08d7a9739c995&amp;pi=56">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Juan Gomez-Luna,Jose Maria Gonzalez-Linares,Jose Ignacio Benavides,el1goluj@uco.es,gonzalez@ac.uma.es,el1bebej@uco.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ae4da9b0-398e-4b88-ad64-95c879d6e61f</GUID>
        <Name>Fast and automatic object pose estimation for range images on the GPU</Name>
        <ShortDescription>We present a pose estimation method for rigid objects from single range images. Using 3D models of the objects, many pose hypotheses are compared in a data-parallel version of the downhill simplex algorithm with an image-based error function. The pose hypothesis with the lowest error value yields the pose estimation (location and orientation), which is refined using ICP. The algorithm is designed especially for implementation on the GPU. It is completely automatic, fast, robust to occlusion and cluttered scenes, and scales with the number of different object types. We apply the system to bin picking, and evaluate it on cluttered scenes. Comprehensive experiments on challenging synthetic and real-world data demonstrate the effectiveness of our method. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/814_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/814_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Inha University, Korea</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>08/04/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="pik@inha.ac.kr">In Kyu Park</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/q4723815w714n2xr/?p=f1707317a6624bd9afd08d7a9739c995&amp;pi=53">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>In Kyu Park,pik@inha.ac.kr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6a9ef568-5517-4b74-b3d0-0070e8b2ab21</GUID>
        <Name>MinGPU: a minimum GPU library for computer vision</Name>
        <ShortDescription>In the field of computer vision, it is becoming increasingly popular to implement algorithms, in sections or in their entirety, on a graphics processing unit (GPU). This is due to the superior speed GPUs offer compared to CPUs. In this paper, we present a GPU library, MinGPU, which contains all of the necessary functions to convert an existing CPU code to GPU. We have created GPU implementations of several well known computer vision algorithms, including the homography transformation between two 3D views. We provide timing charts and show that our MinGPU implementation of homography transformations performs approximately 600 times faster than its C++ CPU implementation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/813_iss_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/813_iss_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Central Florida</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>05/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="pavelb@cs.ucf.edu">Pavel Babenko</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/1164314511225480/?p=f1707317a6624bd9afd08d7a9739c995&amp;pi=51">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Pavel Babenko,pavelb@cs.ucf.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>38ad061e-364d-40a7-8e42-1233c587d56e</GUID>
        <Name>GPU Accelerated Non-rigid Registration for the Evaluation of Cardiac Function</Name>
        <ShortDescription>We present a method for the fast and efficient tracking of motion in cardiac magnetic resonance (CMR) cines. A GPU-accelerated Levenberg-Marquardt non-linear least squares optimization procedure for finite element non-rigid registration was implemented on an NVIDIA graphics card using the OpenGL environment. Points were tracked from frame to frame using forward and backward incremental registration. The inner (endocardial) and outer (epicardial) borders of the heart were tracked in six short-axis cines with ~25 frames through the cardiac cycle in 36 patients with vascular disease. Contours placed by two independent expert observers using a semi-automatic ventricular analysis program (CIM version 4.6) were used as the gold standard. The method took 0.5 seconds per frame, and the maximum Hausdorff errors were less than 2 mm on average, which was of the same order as the expert inter-observer error. In conclusion, GPU-accelerated Levenberg-Marquardt non-linear optimization enables fast and accurate tracking of cardiac motion in CMR images.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/812_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/812_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Auckland</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>10/30/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="b.li@auckland.ac.nz">Bo Li</Author>
           <Author email="a.young@auckland.ac.nz">Alistair A. Young</Author>
           <Author email="b.cowan@auckland.ac.nz">Brett R. Cowan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/46j8v0r7070470m3/?p=1cbbf7d42868493da8e612a3b97202f9&amp;pi=49">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bo Li,Alistair A. Young,Brett R. Cowan,b.li@auckland.ac.nz,a.young@auckland.ac.nz,b.cowan@auckland.ac.nz</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>22e19a72-ca7f-4e5d-a9d9-fbe3cbb38d5c</GUID>
        <Name>A Hybrid Parallel Signature Matching Model for Network Security Applications Using SIMD GPU</Name>
        <ShortDescription>High-performance signature matching against a large dictionary is of great importance in network security applications. The many-core SIMD GPU is a competitive choice for signature matching. In this paper, a hybrid parallel signature matching model (HPSMM) using SIMD GPU is proposed, which uses pattern set partition and input text partition together. Then the problem of load balancing for multiprocessors in the GPU is discussed carefully, and a balanced pattern set partition method (BPSPM) employed in HPSMM is introduced. Experiments demonstrate that using pattern set partition and input text partition together can help achieve a better performance, and the proposed BPSPM-Length works well in load balancing.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/811_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/811_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National University of Defense Technology, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="chengkun_wu@nudt.edu.cn">Chengkun Wu</Author>
           <Author email="jpyin@nudt.edu.cn">Jianping Yin</Author>
           <Author email="zpcai@nudt.edu.cn">Zhiping Cai</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/k5x363617412j441/?p=1cbbf7d42868493da8e612a3b97202f9&amp;pi=46">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chengkun Wu,Jianping Yin,Zhiping Cai,chengkun_wu@nudt.edu.cn,jpyin@nudt.edu.cn,zpcai@nudt.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>89d9f616-d298-43a4-99e1-3fe1db248cba</GUID>
        <Name>Parallel 3D Image Segmentation of Large Data Sets on a GPU Cluster </Name>
        <ShortDescription>In this paper, we propose an inherently parallel scheme for 3D image segmentation of large volume data on a GPU cluster. This method originates from an extended Lattice Boltzmann Model (LBM), and provides a new numerical solution for solving the level set equation. As a local, explicit and parallel scheme, our method offers several favorable features: (1) Very easy to implement with the core program only requiring a few lines of code; (2) Implicit computation of curvatures; (3) Flexible control of generating smooth segmentation results; (4) Strong amenability to parallel computing, especially on low-cost, powerful graphics hardware (GPU). The parallel computational scheme is well suited for cluster computing, leading to a good solution for segmenting very large data sets.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/810_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/810_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Kent State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>11/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Aaron Hagan</Author>
           <Author email="">Ye Zhao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/dv45r171t1027355/?p=1cbbf7d42868493da8e612a3b97202f9&amp;pi=45">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Aaron Hagan,Ye Zhao</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9e70b216-1271-4886-be56-fe79e2bb7ea9</GUID>
        <Name>Computing the Longest Common Transposition-Invariant Subsequence with GPU</Name>
        <ShortDescription>Finding a longest common transposition-invariant subsequence (LCTS) of two given integer sequences A&#8201;=&#8201;a_1 a_2 ... a_m and B&#8201;=&#8201;b_1 b_2 ... b_n (a generalization of the well-known longest common subsequence problem (LCS)) has arisen in the field of music information retrieval. In the LCTS problem, we look for an LCS for the sequences A&#8201;+&#8201;t&#8201;=&#8201;(a_1&#8201;+&#8201;t)(a_2&#8201;+&#8201;t)...(a_m&#8201;+&#8201;t) and B, where t is any integer. The performance of the top graphics processing units (GPUs) outgrew the performance of the top CPUs a few years ago, and there has been a surge of interest in recent years in using GPUs for general processing. We propose and evaluate a bit-parallel algorithm solving the LCTS problem on a GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/809_Untitledsecuritytechnology_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/809_Untitledsecuritytechnology_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Silesian University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>10/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sebastian.deorowicz@polsl.pl">Sebastian Deorowicz</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/e5084324pj884338/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=39">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sebastian Deorowicz,sebastian.deorowicz@polsl.pl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>58db5b29-d3e0-4e9a-975a-d39dfd48e727</GUID>
        <Name>Real-Time GPU-Based Voxel Carving with Systematic Occlusion Handling</Name>
        <ShortDescription>We present an approach to compute the visual hulls of multiple people in real-time in the presence of occlusions. We prove that the resulting visual hulls are correct and minimal under occlusions. Our proposed algorithm runs completely on the GPU with frame rates of up to 50 fps for multiple people using only one computer equipped with off-the-shelf hardware. We also compare runtimes for different graphics chips and show that our approach scales very well without additional effort. Comparison to other work shows that our algorithm is as fast as state-of-the-art technology. The resulting visual hulls can be the basis for a wide range of algorithms that require a robust voxel representation as input.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/808_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/808_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Fraunhofer IITB Karlsruhe / Universitat Karlsruhe </OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>09/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="alexander.schick@iitb.fraunhofer.de">Alexander Schick</Author>
           <Author email="rainer.stiefelhagen@iitb.fraunhofer.de">Rainer Stiefelhagen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/m2212r130316g534/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=36">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexander Schick,Rainer Stiefelhagen,alexander.schick@iitb.fraunhofer.de,rainer.stiefelhagen@iitb.fraunhofer.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f4504157-17b0-4b17-9476-d48e77994f7f</GUID>
        <Name>Arion Render</Name>
        <ShortDescription>Arion is the hybrid-accelerated and physically-based light simulator developed by RandomControl. It comprises an interactive WYSIWYG editing application and a super-high performance production renderer. Arion uses all the GPUs and all the CPUs in your system simultaneously, not wasting a single flop available. Additionally, Arion can use all the GPUs and all the CPUs in all the other computers in your network, forming a cluster for massive computation. Arion is a grid-computing solution to the problem of light physics simulation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/807_arion_cuda_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/807_arion_cuda_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>RandomControl S.L.U.</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="tech@randomcontrol.com">RandomControl</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.randomcontrol.com/arion">Application</ContentType>
           <ContentType url="http://www.randomcontrol.com/arion">Multimedia</ContentType>
           <ContentType url="http://www.randomcontrol.com/arion">Presentation</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Raytracing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>raytracing rendering physically-based unbiased randomcontrol arion fryrender,RandomControl,tech@randomcontrol.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1a492908-0605-4d9e-af4f-085ff724e6cf</GUID>
        <Name>Asymmetric Distributed Shared Memory</Name>
        <ShortDescription>GMAC is a run-time system that implements an Asymmetric Distributed Shared Memory model. This model eases the task of programming CUDA applications by building a unified global address space including system and GPU memories. Code executed at the CPU can transparently access data hosted by the GPU memory, but code run at the GPU is constrained to access the data hosted by its memory. GMAC removes the need to perform explicit data transfers using cudaMemcpy() calls and handles all data transfers in a transparent and efficient way. Moreover, the unified address space implemented by GMAC allows using CPU pointers in the GPU code.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/806_google_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/806_google_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universitat Politecnica de Catalunya / University of Illinois</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>11/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="igelado@ac.upc.edu">Isaac Gelado</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/adsm/">Application</ContentType>
           <ContentType url="http://code.google.com/p/adsm/">Presentation</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Isaac Gelado,igelado@ac.upc.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ef51a1b4-1fff-412e-a96d-796a24015f38</GUID>
        <Name>Octane Renderer</Name>
        <ShortDescription>Octane Render is a fully GPU-powered, unbiased and physically based rendering application, with a 10-15X speed increase over unbiased CPU-based renderers.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/806_octane_cuda_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/806_octane_cuda_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Refractive Software LTD</OrganizationName>
        <OrganizationURL>http://www.refractivesoftware.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>01/10/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>15</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Refractive Software LTD</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.refractivesoftware.com/purchase.html">Application</ContentType>
           <ContentType url="http://www.refractivesoftware.com/videos.html">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Refractive Software LTD</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>554c3825-b0de-4df9-bd68-f0dba7b2a590</GUID>
        <Name>Textbook: GPU</Name>
        <ShortDescription>Chinese textbook for CUDA programming</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/803_20100202044228595_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/803_20100202044228595_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>www.hpctech.com</OrganizationName>
        <OrganizationURL>http://www.hpctech.com/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>10/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="zhao.kaiyong@gmail.com">Shu Zhang</Author>
           <Author email="">Yanli Chu</Author>
           <Author email="">Kaiyong Zhao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.hpctech.com/announce/?announceid=2">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>HPC information</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shu Zhang,Yanli Chu,Kaiyong Zhao,zhao.kaiyong@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a62c5428-2955-4cf3-9d0d-0078b395153f</GUID>
        <Name>QView</Name>
        <ShortDescription>Multi-math object viewer. Still under development.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/802_qview_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/802_qview_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>digitker - The digital kernel</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>04/30/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dtsonov@digitker.com">Dimitar Tsonov</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://digitker.com/">Paper</ContentType>
           <ContentType url="http://digitker.com/">Presentation</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Finance</ApplicationType>
           <ApplicationType>Game Physics</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>math kernel viewer </ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dimitar Tsonov,dtsonov@digitker.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ed7975e2-60da-449a-8a34-febfbd08eebf</GUID>
        <Name>Textbook: Programming Massively Parallel Processors: A Hands-on Approach</Name>
        <ShortDescription>The first textbook of its kind, Programming Massively Parallel Processors: A Hands-on Approach is authored by Dr. David B. Kirk, NVIDIA Fellow and former chief scientist, and Dr. Wen-mei Hwu, who serves at the University of Illinois at Urbana-Champaign as Chair of Electrical and Computer Engineering in the Coordinated Science Laboratory, co-director of the Universal Parallel Computing Research Center and principal investigator of the CUDA Center of Excellence. The textbook, which is 256 pages, is the first aimed at teaching advanced students and professionals the basic concepts of parallel programming and GPU architectures. Published by Morgan Kaufmann, it explores various techniques for constructing parallel programs and reviews numerous case studies. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/801_Kirk-HR_large_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/801_Kirk-HR_large_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>NVIDIA and UIUC</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>01/28/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dkirk@nvidia.com">Dr. David Kirk</Author>
           <Author email="">Dr. Wen-mei Hwu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.nvidia.com/object/io_1264656303008.html">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Programming textbook</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>CUDA, Parallel Processing, NVIDIA, GPU,Dr. David Kirk,Dr. Wen-mei Hwu,dkirk@nvidia.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c8e8ac46-4a7f-47db-b7d6-b79ae238ba7d</GUID>
        <Name>PARRET: Parallel RestoreTools</Name>
        <ShortDescription>PARRET is a Python package for image deblurring on GPUs. By making use of the parallelism on NVIDIA GPU CUDA architecture, the deblurring time is greatly reduced. Besides image deblurring, PARRET can be used to solve linear equations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/800_demo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/800_demo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Emory University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>02/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>15</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="yfan@emory.edu">Ying Wai (Daniel) Fan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.mathcs.emory.edu/~yfan/PARRET/doc/index.html">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>deblurring, Python, linear systems of equations,Ying Wai (Daniel) Fan,yfan@emory.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3192f565-72ab-4885-9348-2b3afd2511d6</GUID>
        <Name>QUDA : A library for QCD on GPUs</Name>
        <ShortDescription>QUDA is a library for performing calculations in lattice QCD on graphics processing units (GPUs) using NVIDIA's C for CUDA API. The current release includes optimized kernels for applying the Wilson Dirac operator and clover-improved Wilson Dirac operator, kernels for performing various BLAS-like operations, and full inverters built on these kernels. Mixed-precision implementations of both CG and BiCGstab are provided, with support for double, single, and half (16-bit fixed-point) precision.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/799_quda_image_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/799_quda_image_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Boston University and Harvard University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>11/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="mikec@seas.harvard.edu">M. A. Clark</Author>
           <Author email="rbabich@bu.edu">R. Babich</Author>
           <Author email="kbarros@gmail.com">K. Barros</Author>
           <Author email="brower@bu.edu">R. Brower</Author>
           <Author email="rebbi@bu.edu">C. Rebbi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://lattice.bu.edu/quda">Application</ContentType>
           <ContentType url="http://arxiv.org/abs/0911.3191">Paper</ContentType>
           <ContentType url="http://lattice.bu.edu/quda">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>QCD, linear solver, mixed precision,Mike Clark,mikec@seas.harvard.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>14721042-0396-4060-8731-199cc53e5bc2</GUID>
        <Name>SCGPSim: A fast SystemC simulator on GPUs</Name>
        <ShortDescription>The main objective of this paper is to speed up the simulation performance of SystemC designs at the RTL abstraction level by exploiting the high degree of parallelism afforded by today's general purpose graphics processors (GPGPUs). Our approach parallelizes SystemC's discrete-event simulation (DES) on GPGPUs by transforming the model of computation of DES into a model of concurrent threads that synchronize as and when necessary. Our simulation infrastructure is called SCGPSim and it includes a source-to-source (S2S) translator to transform synthesizable SystemC models into parallelly executable programs targeting an NVIDIA GPU. The translator retains the simulation semantics of the original designs by applying semantics preserving transformations. The resulting transformed models mapped onto the massively parallel architecture of GPUs improve simulation efficiency quite substantially. Preliminary experiments with varying-sized examples such as AES, ALU, and FIR have shown simulation speed-ups ranging from 30x to 100x. Considering that our transformations are not yet optimized, we believe that optimizing them will improve the simulation performance even further.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/798_scgp2_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/798_scgp2_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>FERMAT Lab, Virginia Tech, Blacksburg, VA</OrganizationName>
        <OrganizationURL>http://www.fermat.ece.vt.edu/</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>01/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="knmahesh@vt.edu">Mahesh Nanjundappa</Author>
           <Author email="">Hiren D Patel</Author>
           <Author email="">Bijoy A Jose</Author>
           <Author email="">Sandeep K Shukla</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://filebox.vt.edu/users/knmahesh/index_files/mahesh_scgpsim.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Electronic Design Automation</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mahesh Nanjundappa,Hiren D Patel,Bijoy A Jose,knmahesh@vt.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>128f6237-5801-4d4f-b825-fc3a01ba1578</GUID>
        <Name>Myocyte Simulation</Name>
        <ShortDescription>The code performs several time-step simulations of a myocyte (heart muscle cell) in parallel, making it possible to obtain results for different sets of inputs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/797_Myocyte_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/797_Myocyte_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL>http://www.virginia.edu</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>01/31/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="lgs9a@virginia.edu">Lukasz G. Szafaryn</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Myocyte">Application</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Myocyte">Multimedia</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Myocyte">Paper</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Myocyte">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Simulation</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>myocyte, simulation, ode solving, time-step,Lukasz G. Szafaryn,lgs9a@virginia.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ab039cd4-07bd-419e-b6b0-a2e7e7be3fec</GUID>
        <Name>Mutual Information Based Semi-Global Stereo Matching on the GPU </Name>
        <ShortDescription>Real-time stereo matching is necessary for many practical applications, including robotics. There are already many real-time stereo systems, but they typically use local approaches that cause object boundaries to be blurred and small objects to be removed. We have selected the Semi-Global Matching (SGM) method for implementation on graphics hardware, because it can compete with the currently best global stereo methods. At the same time, it is much more efficient than most other methods that produce a similar quality. In contrast to previous work, we have fully implemented SGM including matching with mutual information, which is partly responsible for the high quality of disparity images. Our implementation reaches 4.2 fps on a GeForce 8800 ULTRA with images of 640 x 480 pixels and a 128-pixel disparity range, and 13 fps on images of 320 x 240 pixels and a 64-pixel disparity range.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/796_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/796_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>German Aerospace Center </OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>12/02/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ines.ernst@dlr.de">Ines Ernst</Author>
           <Author email="heiko.hirschmueller@dlr.de">Heiko Hirschmuller</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/m12112614k7834g4/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=35">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ines Ernst,Heiko Hirschmuller,ines.ernst@dlr.de,heiko.hirschmueller@dlr.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4f1c26e4-bd49-4db3-9e21-65632e62b00d</GUID>
        <Name>Experiences with Cell-BE and GPU for Tomography</Name>
        <ShortDescription>Tomography is a powerful technique for three-dimensional imaging, that deals with image reconstruction from a series of projection images, acquired along a range of viewing directions. An important part of any tomograph system is the reconstruction algorithm. Iterative reconstruction algorithms have many advantages over non-iterative methods, yet their running time can be prohibitively long. As these algorithms have high potential for parallelization, multi-core architectures, such as the Cell-BE and GPU, can possibly alleviate this problem. 
In this paper, we describe our experiences in mapping the basic operations of iterative reconstruction algorithms onto these platforms. We argue that for this type of problem, the GPU yields superior performance compared to the Cell-BE. Performance results of our implementation demonstrate a speedup of over 40 for a single GPU, compared to a single-core CPU version. By combining eight GPUs and a quad-core CPU in a single system, similar performance to a large cluster consisting of hundreds of CPU cores has been obtained. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/795_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/795_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Antwerp, Belgium</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>40</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Sander.vanderMaar@ua.ac.be">Sander van der Maar</Author>
           <Author email="Joost.Batenburg@ua.ac.be">Kees Joost Batenburg</Author>
           <Author email="Jan.Sijbers@ua.ac.be">Jan Sijbers</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/9362637125n513j6/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=34">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sander van der Maar,Kees Joost Batenburg,Jan Sijbers,Sander.vanderMaar@ua.ac.be,Joost.Batenburg@ua.ac.be,Jan.Sijbers@ua.ac.be</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9256c867-a33e-4bca-8dd1-f56c21b6047b</GUID>
        <Name>Experiences with Cell-BE and GPU for Tomography</Name>
        <ShortDescription>Tomography is a powerful technique for three-dimensional imaging that deals with image reconstruction from a series of projection images acquired along a range of viewing directions. An important part of any tomography system is the reconstruction algorithm. Iterative reconstruction algorithms have many advantages over non-iterative methods, yet their running time can be prohibitively long. As these algorithms have high potential for parallelization, multi-core architectures, such as the Cell-BE and GPU, can possibly alleviate this problem. 
In this paper, we describe our experiences in mapping the basic operations of iterative reconstruction algorithms onto these platforms. We argue that for this type of problem, the GPU yields superior performance compared to the Cell-BE. Performance results of our implementation demonstrate a speedup of over 40 for a single GPU, compared to a single-core CPU version. By combining eight GPUs and a quad-core CPU in a single system, similar performance to a large cluster consisting of hundreds of CPU cores has been obtained. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/793_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/793_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Antwerp, Belgium</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>40</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Sander.vanderMaar@ua.ac.be">Sander van der Maar</Author>
           <Author email="Joost.Batenburg@ua.ac.be">Kees Joost Batenburg</Author>
           <Author email="Jan.Sijbers@ua.ac.be">Jan Sijbers</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/9362637125n513j6/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=34">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sander van der Maar,Kees Joost Batenburg,Jan Sijbers,Sander.vanderMaar@ua.ac.be,Joost.Batenburg@ua.ac.be,Jan.Sijbers@ua.ac.be</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4ad94310-447d-47c8-bd18-1a36ddda8728</GUID>
        <Name>Multi-walk Parallel Pattern Search Approach on a GPU Computing Platform </Name>
        <ShortDescription>This paper studies the efficiency of using Pattern Search (PS) on bound constrained optimization functions on a Graphics Processing Unit (GPU) computing platform. Pattern Search is a direct search optimization technique that does not require derivative information on non-linear programming problems. Pattern Search is ideally suited to a GPU computing environment due to its low memory requirement and no communication between threads in a multi-walk setting. To adapt to a GPU environment, traditional Pattern Search is modified by terminating based on iterations instead of tolerance. This research designed and implemented a multi-walk Pattern Search algorithm on a GPU computing platform. Computational results are promising with a computing speedup of 100+ compared to a corresponding implementation on a single CPU. 
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/792_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/792_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Lamar University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Weihang.Zhu@lamar.edu">Weihang Zhu</Author>
           <Author email="jcurry@my.lamar.edu">James Curry</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/d655105451757237/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=31">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Weihang Zhu,James Curry,Weihang.Zhu@lamar.edu,jcurry@my.lamar.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1ecce826-a4da-4bd6-932e-11130eeee781</GUID>
        <Name>A GPU-Based Simulation of Tsunami Propagation and Inundation</Name>
        <ShortDescription>Tsunami simulation combines fluid dynamics, numerical computation, and visualization techniques. Nonlinear shallow water equations are often used to model tsunami propagation. With the friction slope added to the conservation of momentum, they can also model tsunami inundation. To solve these equations, we use the second-order finite difference MacCormack method. Because it is a finite difference method, it lends itself to parallelization. We use the parallelism provided by the GPU to speed up the computations. By loading data as textures in GPU memory, the computation steps can be written as shader programs, and the operations are executed by the GPU in parallel. The results show that, with the help of the GPU, the simulation achieves a significant improvement in execution time for each of the computation steps. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/790_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/790_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National United University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>07/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wyliang@ntut.edu.tw">Wen-Yew Liang</Author>
           <Author email="tjhsieh@ntut.edu.tw">Tung-Ju Hsieh</Author>
           <Author email="t6598056@ntut.edu.tw">Muhammad T. Satria</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v5436m2060436718/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=30">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wen-Yew Liang,Tung-Ju Hsieh,Muhammad T. Satria,wyliang@ntut.edu.tw,tjhsieh@ntut.edu.tw,t6598056@ntut.edu.tw</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4aba234f-c87b-477d-84e4-5ccb3a641313</GUID>
        <Name>GPU-Supported Image Compression for Remote Visualization Realization and Benchmarking</Name>
        <ShortDescription>In this paper we introduce a novel GPU-supported JPEG image compression technique with a focus on its application to remote visualization. Fast and high-quality compression techniques are very important for the remote visualization of interactive simulations and virtual reality applications (IS/VR) on hybrid clusters. The main goals of the design and implementation of this compression technique were therefore low compression times and nearly no visible quality loss, while achieving compression rates that allow for 30+ frames per second over 10 MBit/s networks. To analyze the potential of the technique and its further development needs, and to compare it to existing methods, several benchmarks are conducted and described in this paper. Additionally, a quality assessment is performed to allow statements about the achievable quality of the lossy image compression. The results show that using the GPU not only for rendering but also for image compression is a promising approach for interactive remote rendering. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/789_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/789_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Paderborn</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>12/02/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="slietsch@upb.de">Stefan Lietsch</Author>
           <Author email="plensing@upb.de">Paul Hermann Lensing</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v1076365gx57665g/?p=98c05d32660143cdad658184818f83ac&amp;pi=28">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Stefan Lietsch,Paul Hermann Lensing,slietsch@upb.de,plensing@upb.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1968f34b-b4e7-4cfe-949e-957ac0b0a242</GUID>
        <Name>GPU-MEME: Using Graphics Hardware to Accelerate Motif Finding in DNA Sequences </Name>
        <ShortDescription>Discovery of motifs that are repeated in groups of biological sequences is a major task in bioinformatics. Iterative methods such as expectation maximization (EM) are used as a common approach to find such patterns. However, corresponding algorithms are highly compute-intensive due to the small size and degenerate nature of biological motifs. Runtime requirements are likely to become even more severe due to the rapid growth of available gene transcription data. In this paper we present a novel approach to accelerate motif discovery based on commodity graphics hardware (GPUs). To derive an efficient mapping onto this type of architecture, we have formulated the compute-intensive parts of the popular MEME tool as streaming algorithms. Our experimental results show that a single GPU allows speedups of one order of magnitude with respect to the sequential MEME implementation. Furthermore, parallelization on a GPU-cluster even improves the speedup to two orders of magnitude.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/788_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/788_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nanyang Technological University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>10/08/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="cchen@ntu.edu.sg">Chen Chen</Author>
           <Author email="asbschmidt@ntu.edu.sg">Bertil Schmidt</Author>
           <Author email="liuweiguo@ntu.edu.sg">Liu Weiguo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/4122nv8469858582/?p=98c05d32660143cdad658184818f83ac&amp;pi=26">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chen Chen,Bertil Schmidt,Liu Weiguo,cchen@ntu.edu.sg,asbschmidt@ntu.edu.sg,liuweiguo@ntu.edu.sg</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9dd0b45a-39ac-46d4-b174-a1e78ecab2a7</GUID>
        <Name>Performance Optimization Strategies of High Performance Computing on GPU </Name>
        <ShortDescription>GPUs have recently become widely used in scientific computing and engineering applications, owing primarily to the evolution of GPU architecture. First, we analyze some key performance characteristics of GPUs in detail, along with the relationships among GPU architecture, programming model, and memory hierarchy. Second, we present three performance optimization strategies: Prefetching, Streamlizing, and Task Division. Experiments were conducted to characterize the relationships between these factors and efficiency. Finally, we apply our strategies to the HPL benchmark to validate them and achieve a certain speedup.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/787_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/787_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National University of Defense Technology, ChangSha</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="anguo.ma@nudt.edu.cn">Anguo Ma</Author>
           <Author email="jing.cai@nudt.edu.cn">Jing Cai</Author>
           <Author email="y.cheng@nudt.edu.cn">Yu Cheng</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/b8g6p02570377572/?p=98c05d32660143cdad658184818f83ac&amp;pi=25">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Anguo Ma,Jing Cai,Yu Cheng,anguo.ma@nudt.edu.cn,jing.cai@nudt.edu.cn,y.cheng@nudt.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>168e001f-d970-4413-90a0-8d6c90fda259</GUID>
        <Name>Bipartite Graph Matching Computation on GPU</Name>
        <ShortDescription>The Bipartite Graph Matching Problem is a well-studied topic in Graph Theory. Such a matching relates pairs of nodes from two distinct sets by selecting a subset of the graph edges connecting them. No selected edge shares an end point with any other edge in the subset. When the graph has huge sets of nodes and edges, sequential approaches are impractical, especially for applications demanding fast results. In this paper we investigate how to compute such a matching on Graphics Processing Units (GPUs), motivated by their increasing processing power and decreasing cost. We present a new data-parallel approach for computing bipartite graph matching that is efficiently computed on today's graphics hardware, and apply it to solve the correspondence between 3D samples taken over a time interval.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/786_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/786_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Leibniz Universitaet Hannover</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>08/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="crisnv@inf.puc-rio.br">Cristina Nader Vasconcelos</Author>
           <Author email="rosenhahn@tnt.uni-hannover.de">Bodo Rosenhahn</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/m7phr706x6717044/?p=98c05d32660143cdad658184818f83ac&amp;pi=24">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cristina Nader Vasconcelos,Bodo Rosenhahn,crisnv@inf.puc-rio.br,rosenhahn@tnt.uni-hannover.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>124508be-daac-4a5e-8a7d-8bcdae9ea237</GUID>
        <Name>Face Detection Using GPU-Based Convolutional Neural Networks</Name>
        <ShortDescription>In this paper, we consider the problem of face detection under pose variations. Unlike other contributions, this work focuses on an efficient implementation that utilizes the computational power of modern graphics cards. The proposed system consists of a parallelized implementation of convolutional neural networks (CNNs) with a special emphasis on also parallelizing the detection process. Experimental validation in a smart conference room with 4 active ceiling-mounted cameras shows a dramatic speed gain under real-life conditions. 
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/785_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/785_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Dortmund University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>08/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Fabian Nasse</Author>
           <Author email="">Christian Thurau</Author>
           <Author email="">Gernot A. Fink</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/h00np133u6602613/?p=98c05d32660143cdad658184818f83ac&amp;pi=22">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Fabian Nasse,Christian Thurau,Gernot A. Fink</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>cfd6b540-64f5-423f-bc2e-1b7ec1439ba5</GUID>
        <Name>Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures </Name>
        <ShortDescription>Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware with multiple-level parallelism and memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to efficiently parallelize for GPUs due to irregular patterns of memory references. In this paper we present a new storage format for sparse matrices that better employs locality, has low memory footprint and enables automatic specialization for various matrices and future devices via parameter tuning. Experimental evaluation demonstrates significant speedups compared to previously published results.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/784_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/784_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Institute for System Programming of RAS</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>01/21/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="amonakov@ispras.ru">Alexander Monakov</Author>
           <Author email="anton@doc.ic.ac.uk">Anton Lokhmotov</Author>
           <Author email="arut@ispras.ru">Arutyun Avetisyan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/n2442u77n2333217/?p=98c05d32660143cdad658184818f83ac&amp;pi=20">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexander Monakov,Anton Lokhmotov,Arutyun Avetisyan,amonakov@ispras.ru,anton@doc.ic.ac.uk,arut@ispras.ru</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e105e7e5-d0ca-4fe1-b6ce-897fd679d5b4</GUID>
        <Name>Searching High-Dimensional Neighbours: CPU-Based Tailored Data-Structures Versus GPU-Based Brute-Force Method</Name>
        <ShortDescription>Many image processing algorithms rely on the nearest neighbor (NN) or the k nearest neighbor (kNN) search problem. Several methods have been proposed to reduce the computation time, for instance using space partitioning. However, these methods are very slow in high-dimensional space. In this paper, we propose a fast implementation of the brute-force algorithm using GPU (Graphics Processing Unit) programming. We show that our implementation is up to 150 times faster than the classical approaches on synthetic data, and up to 75 times faster on real image processing algorithms (finding similar patches in images and texture synthesis). 
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/783_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/783_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Palaiseau</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>05/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="garciav@lix.polytechnique.fr">Vincent Garcia</Author>
           <Author email="nielsen@lix.polytechnique.fr">Frank Nielsen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/r234309v2280m17g/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=19">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vincent Garcia,Frank Nielsen,garciav@lix.polytechnique.fr,nielsen@lix.polytechnique.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>05b7c411-e33a-4038-b779-b94b67ba0e80</GUID>
        <Name>Belief Propagation Implementation Using CUDA on an NVIDIA GTX 280</Name>
        <ShortDescription>Disparity map generation is a significant component of vision-based driver assistance systems. This paper describes an efficient implementation of a belief propagation algorithm on a graphics card (GPU) using CUDA (Compute Unified Device Architecture) that can be used to speed up stereo image processing by between 30 and 250 times. For evaluation purposes, different kinds of images have been used: reference images from the Middlebury stereo website, and real-world stereo sequences, self-recorded with the research vehicle of the .enpeda.. project at The University of Auckland. This paper provides implementation details, primarily concerned with the inequality constraints, involving the threads and shared memory, required for efficient programming on a GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/780_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/780_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Shandong University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>11/18/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yanyan Xu</Author>
           <Author email="">Hui Chen</Author>
           <Author email="">Reinhard Klette</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/k676421003802h63/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=16">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yanyan Xu,Hui Chen,Reinhard Klette</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1cb185e6-e66e-458f-95d9-0f08f2490b6b</GUID>
        <Name>Lloyd's Algorithm on GPU</Name>
        <ShortDescription>The Centroidal Voronoi Diagram (CVD) is a very versatile structure, well studied in Computational Geometry. It is used as the basis for a number of applications. This paper presents a deterministic algorithm, entirely computed using graphics hardware resources, based on Lloyd's Method for computing CVDs. While the computation of the ordinary Voronoi diagram on GPU is a well-explored topic, its extension to CVDs presents some challenges that the present study intends to overcome. 
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/779_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/779_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Pontificia Universidade Catolica</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>12/02/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="crisnv@inf.puc-rio.br">Cristina N. Vasconcelos</Author>
           <Author email="asla@tecgraf.puc-rio.br">Asla Sa</Author>
           <Author email="pcezar@impa.br">Paulo Cezar Carvalho</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/qv2685448j202g58/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=15">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cristina N. Vasconcelos,Asla Sa,Paulo Cezar Carvalho,crisnv@inf.puc-rio.br,asla@tecgraf.puc-rio.br,pcezar@impa.br</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>86056069-3857-4e63-8c25-55a234a83edd</GUID>
        <Name>GPU-Accelerated Nearest Neighbor Search for 3D Registration</Name>
        <ShortDescription>Nearest Neighbor Search (NNS) is employed by many computer vision algorithms. The computational complexity is large and constitutes a challenge for real-time capability. The basic problem is in rapidly processing a huge amount of data, which is often addressed by means of highly sophisticated search methods and parallelism. We show that NNS based vision algorithms like the Iterative Closest Points algorithm (ICP) can achieve real-time capability while preserving compact size and moderate energy consumption as it is needed in robotics and many other domains. The approach exploits the concept of general purpose computation on graphics processing units (GPGPU) and is compared to parallel processing on CPU. We apply this approach to the 3D scan registration problem, for which a speed-up factor of 88 compared to a sequential CPU implementation is reported. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/778_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/778_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Sankt Augustin</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>10/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dqiu2s@smail.inf.h-brs.de">Deyuan Qiu</Author>
           <Author email="stefan_may@arcor.de">Stefan May</Author>
           <Author email="andreas@nuechti.de">Andreas Nuchter</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/e836w4xxh5034136/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=14">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Deyuan Qiu,Stefan May,Andreas Nuchter,dqiu2s@smail.inf.h-brs.de,stefan_may@arcor.de,andreas@nuechti.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>95724514-4e0b-41fc-b92d-e2c41be2c895</GUID>
        <Name>An Efficient Pre-filtering Mechanism for Parallel Intrusion Detection Based on Many-Core GPU </Name>
        <ShortDescription>Multi-pattern search is a time-consuming task in Network Intrusion Detection Systems (NIDS). The processing ability of NIDS cannot keep up with the rapid growth of network bandwidth. One intuitive idea is to use pre-filtering to reduce the workload of NIDS. Our goal is to design a novel method for pre-filtering that is suited to an efficient implementation on many-core GPU. Through statistical analysis, we propose a rudimentary method that uses 2B ASCII sub patterns as the filter keywords. To reduce the size of the filter keyword set, we use Binary Integer Linear Programming (BILP) for optimization. The number of filter keywords is reduced from 4824 to 362, which is also much smaller than with the prefix-based and suffix-based methods. We argue that our method can make good use of the computation power of GPU. Experiments demonstrate that our pre-filter can achieve a good filter ratio, thus alleviating the burden of NIDS. 
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/777_Untitledsecuritytechnology_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/777_Untitledsecuritytechnology_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National University of Defense Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>11/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="chengkun_wu@nudt.edu.cn">Chengkun Wu</Author>
           <Author email="jpyin@nudt.edu.cn">Jianping Yin</Author>
           <Author email="zpcai@nudt.edu.cn">Zhiping Cai</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/pp43w171p752678v/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=11">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chengkun Wu,Jianping Yin,Zhiping Cai,chengkun_wu@nudt.edu.cn,jpyin@nudt.edu.cn,zpcai@nudt.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>21d4bbfd-5dd3-4016-982d-d55bab9285ed</GUID>
        <Name>GPU-based Acceleration of System-level Design Tasks </Name>
        <ShortDescription>Many system-level design tasks (e.g., high-level timing analysis, hardware/software partitioning and design space exploration) involve computational kernels that are intractable (usually NP-hard). As a result, they involve high running times even for mid-sized problems. In this paper we explore the possibility of using commodity graphics processing units (GPUs) to accelerate such tasks that commonly arise in the electronic design automation (EDA) domain. We demonstrate this idea via two detailed case studies. The first explores the possibility of using GPUs to speed up standard schedulability analysis problems. The second proposes a GPU-based engine for a general hardware/software design space exploration problem. Not only do these problems commonly arise in the embedded systems domain, but their computational kernels also turn out to be variants of a combinatorial optimization problem, viz. the knapsack problem, that lies at the heart of several EDA applications. Experimental results show that our GPU-based implementations offer very attractive speedups for the computational kernels (up to 100x), and speedups of up to 17x for the full problem. In contrast to ASIC/FPGA-based accelerators, our solution involves no extra hardware cost, given that even low-end desktop and notebook computers are now equipped with GPUs. Although recent research has shown the benefits of using GPUs for a variety of non-graphics applications (e.g., in databases and bioinformatics), harnessing the parallelism of GPUs to accelerate problems from the EDA domain has not been sufficiently explored so far. We believe that our results and the generality of the core problem that we address will motivate researchers from this community to explore the possibility of using GPUs for a wider variety of problems from the EDA domain. 
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/776_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/776_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Munich</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>01/15/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Unmesh D. Bordoloi</Author>
           <Author email="">Samarjit Chakraborty</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/44324g8n140646u8/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=10">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Unmesh D. Bordoloi,Samarjit Chakraborty</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d6f04307-3afc-40ed-9c71-ad0bc9456cec</GUID>
        <Name>A generic library for structured real-time computations: GPU implementation applied to retinal and cortical vision processes </Name>
        <ShortDescription>Most graphics cards in standard personal computers are now equipped with several pixel pipelines running shader programs. Taking advantage of this technology by transferring parallel computations from the CPU side to the GPU side increases the overall computational power, even in non-graphical applications, by freeing the main processor from a heavy workload. A generic library is presented to show how anyone can benefit from modern hardware by combining various techniques with little hardware-specific programming skill. Its shader implementation is applied to retinal and cortical simulation. The purpose of this sample application is not to provide a correct approximation of real center-surround ganglion or middle temporal cells, but to illustrate how easily intertwined spatiotemporal filters can be applied to raw input pictures in real time. Requirements and interconnection complexity depend heavily on the vision framework adopted; therefore, various hypotheses that may benefit from such a library are introduced. 
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/775_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/775_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Toulo