<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="applications.xsl"?>
  <Applications>	

     <Application>
        <GUID>9afbcfda-88d9-44a4-9cf3-8c2e3c2ec1d9</GUID>
        <Name>Real-time virtual environment signal extraction and denoising using programmable graphics hardware</Name>
        <ShortDescription>The sense of being within a three-dimensional (3D) space and interacting with virtual 3D objects in a computer-generated virtual environment (VE) often requires essential image, vision and sensor signal processing techniques such as differentiating and denoising. This paper describes novel implementations of Gaussian filtering for characteristic signal extraction and of wavelet-based image denoising algorithms that run on the graphics processing unit (GPU). Exploiting the data parallelism provided by modern programmable graphics hardware yields significant acceleration over standard CPU implementations and frees the CPU to run other computations, such as artificial intelligence (AI) and physics, more efficiently. The proposed GPU-based Gaussian filtering can extract surface information from a real object and provide its material features for rendering and illumination. The wavelet-based signal denoising for large digital images realized in this project provides better realism for VE visualization without sacrificing the real-time, interactive performance of an application.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/835_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/835_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Huddersfield, Queensgate, Huddersfield</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>10/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="y.su@hud.ac.uk">Yang Su</Author>
           <Author email="">Zhi-Jie Xu</Author>
           <Author email="">Xiang-Qian Jiang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/w40113672g700213/?p=60db4e60c7714e7087d810ae6a83dbf5">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yang Su,Zhi-Jie Xu,Xiang-Qian Jiang,y.su@hud.ac.uk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>578067f7-ac4b-47f2-950b-1f9ed61408e5</GUID>
        <Name>Extracting Curve Skeletons from Gray Value Images for Virtual Endoscopy</Name>
        <ShortDescription>The extraction of curve skeletons from tubular networks is a necessary prerequisite for virtual endoscopy applications. We present an approach for curve skeleton extraction directly from gray value images that supersedes the need to deal with segmentations and skeletonizations. The approach uses properties of the Gradient Vector Flow to derive a tube-likeliness measure and a medialness measure. Their combination allows the detection of tubular structures and an extraction of their medial curves that stays centered also in cases where the structures are not tubular such as junctions or severe stenoses. We present results on clinical datasets and compare them to curve skeletons derived with different skeletonization approaches from high quality segmentations. Our approach achieves a high centerline accuracy and is computationally efficient by making use of a GPU based implementation of the Gradient Vector Flow.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/834_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/834_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Graz University of Technology, Austria</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>07/15/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="cbauer@icg.tu-graz.ac.at">Christian Bauer</Author>
           <Author email="bischof@icg.tu-graz.ac.at">Horst Bischof</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/69323487n4u31002/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Christian Bauer,Horst Bischof,cbauer@icg.tu-graz.ac.at,bischof@icg.tu-graz.ac.at</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bb093c2d-d78d-4eb7-81b1-af6e59587e17</GUID>
        <Name>Evaluating the Jaccard-Tanimoto Index on Multi-core Architectures</Name>
        <ShortDescription>The Jaccard/Tanimoto coefficient is an important workload, used in a large variety of problems including drug design fingerprinting, clustering analysis, similarity web searching and image segmentation. This paper evaluates the Jaccard coefficient on three platforms: the Cell Broadband Engine processor, an Intel Xeon dual-core platform and an NVIDIA 8800 GTX GPU. In our work, we have developed a novel parallel algorithm specially suited to the Cell/B.E. architecture for all-to-all Jaccard comparisons that minimizes DMA transfers and reuses data in the local store. We show that our implementation on Cell/B.E. outperforms the implementations on comparable Intel platforms by 6-20X with full accuracy and by 10-50X in reduced accuracy mode, depending on the size of the data, and by more than 60X compared to the NVIDIA 8800 GTX. In addition to performance, we discuss in detail our efforts to optimize our workload on these architectures and explain how the avenues for optimization differ from one architecture to another. Our work shows that the algorithms or kernels employed for the Jaccard coefficient calculation are heavily dependent on the traits of the target hardware.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/833_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/833_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Technologies Design Center, Indianapolis</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="vsachde@us.ibm.com">Vipin Sachdeva</Author>
           <Author email="dmfreim@us.ibm.com">Douglas M. Freimuth</Author>
           <Author email="chemuell@cs.indiana.edu">Chris Mueller</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/542w421534284x43/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vipin Sachdeva,Douglas M. Freimuth,Chris Mueller,vsachde@us.ibm.com,dmfreim@us.ibm.com,chemuell@cs.indiana.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7a0a2a4f-3fa3-4ed0-8e1c-f3dd9f2835e7</GUID>
        <Name>Focused Volumetric Visual Hull with Color Extraction</Name>
        <ShortDescription>This paper introduces a new approach for volumetric visual hull reconstruction, using a voxel grid that focuses on the moving target object. This grid is continuously updated as a function of object location, orientation, and size. The benefit is a reduced number of voxels that have to be evaluated or allocated towards capturing the target at higher resolution. This technique particularly improves reconstructions where the total reconstruction space is larger than the moving reconstruction target. The higher resolution of the voxel grid also reduces the computational cost per voxel reprojection since a one voxel to one input pixel reprojection ratio is approximated. In addition, the appropriate view-independent color of the surface voxels is computed, allowing for realistic visual hull texturing. All color calculations are performed locally, based on approximated surface voxel normals and the input images. A color outlier detection approach is introduced, which reduces the influence of occlusions in the color evaluation. The parallel nature of the presented focused visual hull reconstruction technique lends itself to hardware acceleration, allowing interactive rates to be achieved by performing most computations on the GPU. A set of case studies is provided for well-defined static and dynamic data sets.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/832_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/832_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California, San Diego</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>11/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Daniel Knoblauch</Author>
           <Author email="">Falko Kuester</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/3652g2h32150v271/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Daniel Knoblauch,Falko Kuester</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>89f67b35-e35e-4e8c-8b22-14c848a66f32</GUID>
        <Name>Fourier Volume Rendering on GPGPU</Name>
        <ShortDescription>Fourier Volume Rendering (FVR) is a volume rendering technique with a lower computational complexity of O(N^2 log N) for an N^3 data array. A new FVR algorithm is proposed by extending the Fourier Projection-Slice Theorem to higher dimensions and mapping the pipeline entirely onto the GPU. A windowed-sinc function is used as the reconstruction filter to implement higher-order interpolation, and the reduction of samples is executed on the GPU in parallel, which suits heterogeneous multi-core architectures. Rendering is accelerated by a factor of 7 when the resolution of the rendered image is larger than 512x512.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/831_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/831_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Hunan University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>05/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Degui Xiao</Author>
           <Author email="">Yi Liu</Author>
           <Author email="">Lei Yang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/5548u3274r1517u7/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Degui Xiao,Yi Liu,Lei Yang</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e7dc92ba-1736-4c6f-92ea-6ac559d565f7</GUID>
        <Name>Practical Random Linear Network Coding on GPUs</Name>
        <ShortDescription>Recently, random linear network coding has been widely applied in peer-to-peer network applications. Instead of sharing the raw data with each other, peers in the network produce and send encoded data to each other. As a result, the communication protocols have been greatly simplified, and the applications experience higher end-to-end throughput and better robustness to network churn. Since it is difficult to verify the integrity of the encoded data, such systems can suffer from the well-known pollution attack, in which a malicious node sends bad encoded blocks that consist of bogus data. Consequently, the bogus data will be propagated into the whole network at an exponential rate. Homomorphic hash functions (HHFs) have been designed to defend systems from such pollution attacks, but with a new challenge: HHFs require that network coding be performed in GF(q), where q is a very large prime number. This greatly increases the computational cost of network coding, in addition to the already computationally expensive HHFs. This paper exploits the computing power of Graphics Processing Units (GPUs) to reduce the computational cost of network coding and homomorphic hashing. With our network coding and HHF implementation on GPU, we observed significant computational speedup in comparison with the best CPU implementation. This implementation can lead to a practical solution for defending against pollution attacks in distributed systems.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/830_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/830_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Hong Kong Baptist University / University of Calgary, Alberta, Canada</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>05/07/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="chxw@comp.hkbu.edu.hk">Xiaowen Chu</Author>
           <Author email="kyzhao@comp.hkbu.edu.hk">Kaiyong Zhao</Author>
           <Author email="meawang@ucalgary.ca">Mea Wang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/12r8mj83g5655542/?p=fb5eb2594736451689b09cf51d6886b">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Xiaowen Chu,Kaiyong Zhao,Mea Wang,chxw@comp.hkbu.edu.hk,kyzhao@comp.hkbu.edu.hk,meawang@ucalgary.ca</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>207cd764-e884-47aa-b0c3-b5505bedfbe4</GUID>
        <Name>Fast Conjugate Gradients with Multiple GPUs</Name>
        <ShortDescription>The limiting factor for the efficiency of sparse linear solvers is memory bandwidth. In this work, we describe a fast Conjugate Gradient solver for unstructured problems, which runs on multiple GPUs installed on a single mainboard. The solver achieves double precision accuracy with single precision GPUs, using a mixed precision iterative refinement algorithm. To achieve high computation speed, we propose a fast sparse matrix-vector multiplication algorithm, which is the core operation of iterative solvers. The proposed multiplication algorithm efficiently utilizes GPU resources via caching, coalesced memory accesses and load balancing between running threads. Experiments on a wide range of matrices show that our matrix-vector multiplication algorithm achieves up to 11.6 Gflops on a single GeForce 8800 GTS card and that the CG implementation achieves up to 24.6 Gflops with four GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/829_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/829_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Tokyo Institute of Technology / National Institute of Informatics, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ali@matsulab.is.titech.ac.jp">Ali Cevahir</Author>
           <Author email="nukada@matsulab.is.titech.ac.jp">Akira Nukada</Author>
           <Author email="matsu@is.titech.ac.jp">Satoshi Matsuoka</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/9m742203qp7802m7/?p=fb5eb2594736451689b09cf51d6886b">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ali Cevahir,Akira Nukada,Satoshi Matsuoka,ali@matsulab.is.titech.ac.jp,nukada@matsulab.is.titech.ac.jp,matsu@is.titech.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>307d80ab-1016-4ba9-9fda-be6f1e85a18f</GUID>
        <Name>Applying the Stream-Based Computing Model to Design Hardware Accelerators: A Case Study</Name>
        <ShortDescription>To facilitate the design of hardware accelerators, we propose in this paper the adoption of the stream-based computing model and the use of Graphics Processing Units (GPUs) as prototyping platforms. This model exposes the maximum data parallelism available in the applications and decouples computation from memory accesses. The design and implementation procedures, including the programming of GPUs, are illustrated with the widely used MrBayes bioinformatics application. Experimental results show that a straightforward mapping of the stream-based program for the GPU into hardware structures leads to improvements in performance, scalability and cost. Moreover, it is shown that a set of simple optimization techniques can be applied to reduce the cost and the power consumption of hardware solutions.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/828_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/828_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>INESC-ID, Rua Alves Redol</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="fcpp@inesc-id.pt">Frederico Pratas</Author>
           <Author email="las@inesc-id.pt">Leonel Sousa</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/6720653366867q70/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Frederico Pratas,Leonel Sousa,fcpp@inesc-id.pt,las@inesc-id.pt</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>24d9dfbe-430a-4065-b835-69d1728e3a2b</GUID>
        <Name>Parallel Calculating of the Goal Function in Metaheuristics Using GPU</Name>
        <ShortDescription>We consider a metaheuristic optimization algorithm which uses a single process (thread) to guide the search through the solution space. The thread iteratively performs two main tasks: goal function evaluation for a single solution or a set of solutions, and management (solution filtering and selection, collection of history, updating). The latter task typically takes 1-3% of the total iteration time, so we do not attempt to accelerate it. The former task can be accelerated in parallel environments in various ways. We propose a parallel small-grain calculation model that provides a cost-optimal method. Then, we carry out an experiment using a Graphics Processing Unit (GPU) to confirm our theoretical results.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/827_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/827_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Wrocław University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wojciech.bozejko@pwr.wroc.pl">Wojciech Bozejko</Author>
           <Author email="czeslaw.smutnicki@pwr.wroc.pl">Czesław Smutnicki</Author>
           <Author email="mariusz.uchronski@pwr.wroc.pl">Mariusz Uchroński</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/fp7l0800u7715872/?p=fb5eb2594736451689b09cf51d6886b8">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wojciech Bozejko,Czesław Smutnicki,Mariusz Uchroński,wojciech.bozejko@pwr.wroc.pl,czeslaw.smutnicki@pwr.wroc.pl,mariusz.uchronski@pwr.wroc.pl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>89537d32-f563-4d80-af24-b3b43058d026</GUID>
        <Name>Accelerating astrophysical particle simulations with programmable hardware (FPGA and GPU)</Name>
        <ShortDescription>In a previous paper we have shown that direct gravitational N-body simulations in astrophysics scale very well for moderately parallel supercomputers (order 10–100 nodes). The best balance between computation and communication is reached if the nodes are accelerated by special-purpose hardware; in this paper we describe the implementation of particle-based astrophysical simulation codes on new types of accelerator hardware (field programmable gate arrays, FPGA, and graphics processing units, GPU). In addition to direct gravitational N-body simulations we also use the algorithmically similar “smoothed particle hydrodynamics” method as a test application; the algorithms are used for astrophysical problems such as the evolution of galactic nuclei with central black holes and gravitational wave generation, and star formation in galaxies and galactic nuclei. We present the code performance on a single node using different kinds of special hardware (traditional GRAPE, FPGA, and GPU) and some implementation aspects (e.g. accuracy). The results show that GPU hardware for real application codes is as fast as GRAPE, but at an order of magnitude lower price, and that FPGA is useful for acceleration of complex sequences of operations (like SPH). We discuss future prospects and new cluster computers built with new generations of FPGA and GPU cards.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/826_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/826_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Heidelberg</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>05/12/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="spurzem@ari.uni-heidelberg.de">R. Spurzem</Author>
           <Author email="berczik@ari.uni-heidelberg.de">P. Berczik</Author>
           <Author email="guillermo.marcus@ziti.uni-heidelberg.de">G. Marcus</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/ew838w1334511061/?p=933dcabf38454d089551d5a476fca08c">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>R. Spurzem,P. Berczik,G. Marcus,spurzem@ari.uni-heidelberg.de,berczik@ari.uni-heidelberg.de,guillermo.marcus@ziti.uni-heidelberg.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a9e38eba-e87f-426b-916a-5c33b9f69177</GUID>
        <Name>A framework for exploring numerical solutions of advection–reaction–diffusion equations using a GPU-based approach</Name>
        <ShortDescription>In this paper we describe a general purpose, graphics processing unit (GP-GPU)-based approach for solving partial differential equations (PDEs) within advection–reaction–diffusion models. The GP-GPU-based approach provides a platform for solving PDEs in parallel and can thus significantly reduce solution times over traditional CPU implementations. This allows for a more efficient exploration of various advection–reaction–diffusion models, as well as the parameters that govern them. Although the GPU does impose limitations on the size and accuracy of computations, the PDEs describing the advection–reaction–diffusion models of interest to us fit comfortably within these constraints. Furthermore, GPU technology continues to rapidly increase in speed, memory, and precision, so applying these techniques to larger systems should be possible in the future. We chose to solve the PDEs using two numerical approaches: for the diffusion, a first-order explicit forward Euler solution and a semi-implicit second-order Crank–Nicolson solution; and, for the advection and reaction, a first-order explicit solution. The goal of this work is to provide motivation and guidance to the application scientist interested in exploring the use of the GP-GPU computational framework in the course of their research. In this paper, we present a rigorous comparison of our GPU-based advection–reaction–diffusion code model with a CPU-based analog, finding that the GPU model outperforms the CPU implementation in one-to-one comparisons.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/825_computedvisualation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/825_computedvisualation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Utah</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>03/04/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="allen@sci.utah.edu">Allen R. Sanderson</Author>
           <Author email="miriah@sci.utah.edu">Miriah D. Meyer</Author>
           <Author email="kirby@sci.utah.edu">Robert M. Kirby</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/t4pj83q74k7h3534/?p=933dcabf38454d089551d5a476fca08c">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Allen R. Sanderson,Miriah D. Meyer,Robert M. Kirby,allen@sci.utah.edu,miriah@sci.utah.edu,kirby@sci.utah.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7d3cc29a-3dac-4791-8478-77dd28708ea8</GUID>
        <Name>Going Forward with GPU Computing</Name>
        <ShortDescription>This article describes why CEA is looking at GPU Computing and how the first experiments are being conducted. We describe a well-defined global strategy which relies on training users and taking advantage of Grand Challenges, involving early access users and system administrators. We also describe some preliminary results and raise questions which need to be addressed in the near future.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/824_highperformancecomputing_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/824_highperformancecomputing_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>CEA, DAM</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>10/07/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Guillaume Colin de Verdiere</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/h71v663783rx85g7/?p=933dcabf38454d089551d5a476fca08c">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Guillaume Colin de Verdiere</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>441d4d84-f548-4465-ac76-eef36ff2a059</GUID>
        <Name>Introduction to Mastering Cell BE and GPU Execution Platforms </Name>
        <ShortDescription>Both Cell BE-type and GPU processors have emerged as multi-processor execution platforms that can outperform general-purpose multi-core computers in certain application domains. The two architectures are quite different, and by no means interchangeable. GPUs are reminiscent of fine-grained systolic array architectures, while the Cell BE is suited to executing a set of co-ordinated coarse-grained tasks. By now, enough applications have been mapped on either of these two processors, mostly by hand, that the pros and cons tables can be filled. The next step is to provide mappings that are based on efficient programming models and methods, in particular methods that minimize communication overheads. The six papers in this special session are attempts to take precisely that route. Three of them take the GPU as the underlying execution platform, the third also taking the Cell BE multicore processor into consideration. The other three papers target the Cell BE processor.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/823_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/823_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Leiden University, the Netherlands</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Ed Deprettere</Author>
           <Author email="">Ana L. Varbanescu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/087047416q2k63k0/?p=617f22391ecf47f89a3da0c82420ae97">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ed Deprettere,Ana L. Varbanescu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8949e7e3-c9b6-487a-894e-75c35f7b8d45</GUID>
        <Name>Development of a GPU-based multithreaded software application to calculate digitally reconstructed radiographs for radiotherapy</Name>
        <ShortDescription>To provide faster calculation of digitally reconstructed radiographs (DRRs) in patient-positioning verification, we developed and evaluated a graphics processing unit (GPU)-based DRR software application and compared it with a central processing unit (CPU)-based application. The evaluation metrics were calculation speed and image quality for various slice thicknesses. The results showed that the GPU-based DRR computation was an average of 50 times faster than the CPU-based methodology, whereas the image quality was very similar. This excellent performance may increase the accuracy of patient positioning and improve the patient treatment throughput time.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/822_radialogics_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/822_radialogics_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>National Institute of Radiological Sciences, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>11/07/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="shinshin@nirs.go.jp">Shinichiro Mori</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v57t4uj446138427/?p=617f22391ecf47f89a3da0c82420ae97">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shinichiro Mori,shinshin@nirs.go.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6884796d-0fa2-4f33-9297-1fde62fcc824</GUID>
        <Name>Lattice Boltzmann based PDE solver on the GPU</Name>
        <ShortDescription>In this paper, we propose a hardware-accelerated PDE (partial differential equation) solver based on the lattice Boltzmann model (LBM). The LBM was initially designed to solve fluid dynamics by constructing simplified microscopic kinetic models. As an explicit numerical scheme with only local operations, it has the advantage of being easy to implement and especially suitable for graphics hardware (GPU) acceleration. Beyond the Navier–Stokes equation of fluid mechanics, a typical LBM can be modified to solve the parabolic diffusion equation, which is further used to solve the elliptic Laplace and Poisson equations with a diffusion process. These PDEs are widely used in modeling and manipulating images, surfaces and volumetric data sets. Therefore, the LBM scheme can be used as a GPU-based numerical solver to provide a fast and convenient alternative to traditional implicit iterative solvers. We apply this method to several examples in volume smoothing, surface fairing and image editing, achieving outstanding performance on contemporary graphics hardware. It has great potential to be used as a general GPU computing framework for efficiently solving PDEs in image processing, computer graphics and visualization.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/821_visualcomputer_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/821_visualcomputer_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Kent State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>12/07/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="zhao@cs.kent.edu">Ye Zhao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/l8x284048269263x/?p=617f22391ecf47f89a3da0c82420ae97">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ye Zhao,zhao@cs.kent.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1acdf9de-8761-4f13-9fea-7b8b02b55719</GUID>
        <Name>Real-Time Online Video Object Silhouette Extraction Using Graph Cuts on the GPU</Name>
        <ShortDescription>Being able to find the silhouette of an object is a very important front-end processing step for many high-level computer vision techniques, such as Shape-from-Silhouette 3D reconstruction methods, object shape tracking, and pose estimation. Graph cuts have been proposed as a method for finding very accurate silhouettes which can be used as input to such high-level techniques, but graph cuts are notoriously computationally intensive and slow. Leading CPU implementations can extract a silhouette from a single QVGA image in 100 milliseconds, with performance dramatically decreasing with increased resolution. Recent GPU implementations have been able to achieve performance of 6 milliseconds per image by exploiting the intrinsic properties of the lattice graphs and the hardware model of the GPU. However, these methods are restricted to a subclass of lattice graphs and are not generally applicable. We propose a novel method for graph cuts on the GPU which places no limits on graph configuration and which is able to achieve comparable real-time performance in online video processing scenarios.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/820_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/820_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Keio University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>08/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="zgarrett@hvrl.ics.keio.ac.jp">Zachary A. Garrett</Author>
           <Author email="saito@hvrl.ics.keio.ac.jp">Hideo Saito</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/928267731044g820/?p=617f22391ecf47f89a3da0c82420ae97">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zachary A. Garrett,Hideo Saito,zgarrett@hvrl.ics.keio.ac.jp,saito@hvrl.ics.keio.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1919e879-ecaa-471f-b6cb-93415638c16a</GUID>
        <Name>Seeded ND medical image segmentation by cellular automaton on GPU</Name>
        <ShortDescription>We present a GPU-based framework to perform organ segmentation in N-dimensional (ND) medical image datasets by computation of weighted distances using the Ford–Bellman algorithm (FBA). Our GPU implementation of FBA gives an alternative and optimized solution to other graph-based segmentation techniques.</ShortDescription>
        <URL>http://springerlink.com/content/v92w2q820w412jj8/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=63</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/819_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/819_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Notre-Dame Hospital</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>07/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="claude.kauffmann@gmail.com">Claude Kauffmann</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v92w2q820w412jj8/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=63">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Claude Kauffmann,claude.kauffmann@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4bd610d3-92f8-4032-9730-02b0e6091d1f</GUID>
        <Name>On GPU's viability as a middleware accelerator </Name>
        <ShortDescription>Today Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly parallel devices to support more generic functionality at the operating system or middleware level. This study starts from the hypothesis that generic middleware-level techniques that improve distributed system reliability or performance (such as content addressing, erasure coding, or data similarity detection) can be significantly accelerated using GPU support. We take a first step towards validating this hypothesis and we design StoreGPU, a library that accelerates a number of hashing-based middleware primitives popular in distributed storage system implementations. Our evaluation shows that StoreGPU enables up to twenty-five-fold performance gains on synthetic benchmarks as well as on a high-level application: the online similarity detection between large data files.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/818_scalable_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/818_scalable_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of British Columbia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>01/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="samera@ece.ubc.ca">Samer Al-Kiswany</Author>
           <Author email="abdullah@ece.ubc.ca">Abdullah Gharaibeh</Author>
           <Author email="elizeus@ece.ubc.ca">Elizeu Santos-Neto</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/8260x51q6440v403/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=62">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Samer Al-Kiswany,Abdullah Gharaibeh,Elizeu Santos-Neto,samera@ece.ubc.ca,abdullah@ece.ubc.ca,elizeus@ece.ubc.ca</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>03286d23-be49-45d1-be2b-790c02badee7</GUID>
        <Name>Implementing Decision Trees and Forests on a GPU</Name>
        <ShortDescription>We describe a method for implementing the evaluation and training of decision trees and forests entirely on a GPU, and show how this method can be used in the context of object recognition. Our strategy for evaluation involves mapping the data structure describing a decision forest to a 2D texture array. We navigate through the forest for each point of the input data in parallel using an efficient, non-branching pixel shader. For training, we compute the responses of the training data to a set of candidate features, and scatter the responses into a suitable histogram using a vertex shader. The histograms thus computed can be used in conjunction with a broad range of tree learning algorithms. 
</ShortDescription>
        <URL>http://springerlink.com/content/y702n504831g232m/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=61</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/817_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/817_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Microsoft Research, Cambridge, UK</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>10/12/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="toby.sharp@microsoft.com">Toby Sharp</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/y702n504831g232m/?p=617f22391ecf47f89a3da0c82420ae97&amp;pi=61">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Toby Sharp,toby.sharp@microsoft.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e4fd34a1-868c-482b-9522-41104b157431</GUID>
        <Name>CUDAMat</Name>
        <ShortDescription>CUDAMat provides a CUDA-based matrix class for Python, making it easy to implement algorithms that are easily expressed in terms of dense linear algebra. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/816_google_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/816_google_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Toronto</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>11/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="vmnih@cs.toronto.edu">Volodymyr Mnih</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/cudamat/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Volodymyr Mnih,vmnih@cs.toronto.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0692da9d-1f32-4819-a7e6-278383b1c438</GUID>
        <Name>Parallelization of a Video Segmentation Algorithm on CUDA–Enabled Graphics Processing Units</Name>
        <ShortDescription>Nowadays, Graphics Processing Units (GPUs) are emerging as SIMD coprocessors for general-purpose computations, especially after the launch of NVIDIA CUDA. Since then, some libraries have been implemented for matrix computation and image processing. However, in real video applications some stages need irregular data distributions and the parallelism is not so inherent. This paper presents the parallelization of a video segmentation application on GPU hardware, which implements an algorithm for abrupt and gradual transition detection. A critical part of the algorithm requires highly intensive computation for video frame feature calculation. Results on three CUDA-enabled GPUs are encouraging, because of the significant speedup achieved. They are also compared with an OpenMP version of the algorithm, running on two platforms with multiple cores.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/815_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/815_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Cordoba, Spain / University of Malaga, Spain</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>08/22/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="el1goluj@uco.es">Juan Gómez-Luna</Author>
           <Author email="gonzalez@ac.uma.es">Jose Maria Gonzalez-Linares</Author>
           <Author email="el1bebej@uco.es">Jose Ignacio Benavides</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/d76622215h42m733/?p=f1707317a6624bd9afd08d7a9739c995&amp;pi=56">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Juan Gómez-Luna,Jose Maria Gonzalez-Linares,Jose Ignacio Benavides,el1goluj@uco.es,gonzalez@ac.uma.es,el1bebej@uco.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ae4da9b0-398e-4b88-ad64-95c879d6e61f</GUID>
        <Name>Fast and automatic object pose estimation for range images on the GPU</Name>
        <ShortDescription>We present a pose estimation method for rigid objects from single range images. Using 3D models of the objects, many pose hypotheses are compared in a data-parallel version of the downhill simplex algorithm with an image-based error function. The pose hypothesis with the lowest error value yields the pose estimation (location and orientation), which is refined using ICP. The algorithm is designed especially for implementation on the GPU. It is completely automatic, fast, robust to occlusion and cluttered scenes, and scales with the number of different object types. We apply the system to bin picking, and evaluate it on cluttered scenes. Comprehensive experiments on challenging synthetic and real-world data demonstrate the effectiveness of our method. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/814_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/814_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Inha University, Korea</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>08/04/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="pik@inha.ac.kr">In Kyu Park</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/q4723815w714n2xr/?p=f1707317a6624bd9afd08d7a9739c995&amp;pi=53">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>In Kyu Park,pik@inha.ac.kr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6a9ef568-5517-4b74-b3d0-0070e8b2ab21</GUID>
        <Name>MinGPU: a minimum GPU library for computer vision</Name>
        <ShortDescription>In the field of computer vision, it is becoming increasingly popular to implement algorithms, in sections or in their entirety, on a graphics processing unit (GPU). This is due to the superior speed GPUs offer compared to CPUs. In this paper, we present a GPU library, MinGPU, which contains all of the necessary functions to convert an existing CPU code to GPU. We have created GPU implementations of several well known computer vision algorithms, including the homography transformation between two 3D views. We provide timing charts and show that our MinGPU implementation of homography transformations performs approximately 600 times faster than its C++ CPU implementation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/813_iss_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/813_iss_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Central Florida</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>05/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="pavelb@cs.ucf.edu">Pavel Babenko</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/1164314511225480/?p=f1707317a6624bd9afd08d7a9739c995&amp;pi=51">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Pavel Babenko,pavelb@cs.ucf.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>38ad061e-364d-40a7-8e42-1233c587d56e</GUID>
        <Name>GPU Accelerated Non-rigid Registration for the Evaluation of Cardiac Function</Name>
        <ShortDescription>We present a method for the fast and efficient tracking of motion in cardiac magnetic resonance (CMR) cines. A GPU-accelerated Levenberg-Marquardt non-linear least squares optimization procedure for finite element non-rigid registration was implemented on an NVIDIA graphics card using the OpenGL environment. Points were tracked from frame to frame using forward and backward incremental registration. The inner (endocardial) and outer (epicardial) borders of the heart were tracked in six short-axis cines with ~25 frames through the cardiac cycle in 36 patients with vascular disease. Contours placed by two independent expert observers using a semi-automatic ventricular analysis program (CIM version 4.6) were used as the gold standard. The method took 0.5 seconds per frame, and the maximum Hausdorff errors were less than 2 mm on average, which was of the same order as the expert inter-observer error. In conclusion, GPU-accelerated Levenberg-Marquardt non-linear optimization enables fast and accurate tracking of cardiac motion in CMR images.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/812_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/812_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Auckland</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>10/30/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="b.li@auckland.ac.nz">Bo Li</Author>
           <Author email="a.young@auckland.ac.nz">Alistair A. Young</Author>
           <Author email="b.cowan@auckland.ac.nz">Brett R. Cowan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/46j8v0r7070470m3/?p=1cbbf7d42868493da8e612a3b97202f9&amp;pi=49">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bo Li,Alistair A. Young,Brett R. Cowan,b.li@auckland.ac.nz,a.young@auckland.ac.nz,b.cowan@auckland.ac.nz</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>22e19a72-ca7f-4e5d-a9d9-fbe3cbb38d5c</GUID>
        <Name>A Hybrid Parallel Signature Matching Model for Network Security Applications Using SIMD GPU</Name>
        <ShortDescription>High performance signature matching against a large dictionary is of great importance in network security applications. The many-core SIMD GPU is a competitive choice for signature matching. In this paper, a hybrid parallel signature matching model (HPSMM) using SIMD GPU is proposed, which uses pattern set partition and input text partition together. Then the problem of load balancing for multiprocessors in the GPU is discussed carefully, and a balanced pattern set partition method (BPSPM) employed in HPSMM is introduced. Experiments demonstrate that using pattern set partition and input text partition together can help achieve a better performance, and the proposed BPSPM-Length works well in load balancing. 
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/811_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/811_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National University of Defense Technology, China</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="chengkun_wu@nudt.edu.cn">Chengkun Wu</Author>
           <Author email="jpyin@nudt.edu.cn">Jianping Yin</Author>
           <Author email="zpcai@nudt.edu.cn">Zhiping Cai</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/k5x363617412j441/?p=1cbbf7d42868493da8e612a3b97202f9&amp;pi=46">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chengkun Wu,Jianping Yin,Zhiping Cai,chengkun_wu@nudt.edu.cn,jpyin@nudt.edu.cn,zpcai@nudt.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>89d9f616-d298-43a4-99e1-3fe1db248cba</GUID>
        <Name>Parallel 3D Image Segmentation of Large Data Sets on a GPU Cluster </Name>
        <ShortDescription>In this paper, we propose an inherently parallel scheme for 3D image segmentation of large volume data on a GPU cluster. This method originates from an extended Lattice Boltzmann Model (LBM), and provides a new numerical solution for solving the level set equation. As a local, explicit and parallel scheme, our method has several favorable features: (1) Very easy to implement, with the core program only requiring a few lines of code; (2) Implicit computation of curvatures; (3) Flexible control of generating smooth segmentation results; (4) Strong amenability to parallel computing, especially on low-cost, powerful graphics hardware (GPU). The parallel computational scheme is well suited for cluster computing, leading to a good solution for segmenting very large data sets.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/810_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/810_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Kent State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>11/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Aaron Hagan</Author>
           <Author email="">Ye Zhao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/dv45r171t1027355/?p=1cbbf7d42868493da8e612a3b97202f9&amp;pi=45">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Aaron Hagan,Ye Zhao</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9e70b216-1271-4886-be56-fe79e2bb7ea9</GUID>
        <Name>Computing the Longest Common Transposition-Invariant Subsequence with GPU</Name>
        <ShortDescription>Finding a longest common transposition-invariant subsequence (LCTS) of two given integer sequences A&#8201;=&#8201;a_1 a_2...a_m and B&#8201;=&#8201;b_1 b_2...b_n (a generalization of the well-known longest common subsequence problem (LCS)) has arisen in the field of music information retrieval. In the LCTS problem, we look for an LCS for the sequences A&#8201;+&#8201;t&#8201;=&#8201;(a_1&#8201;+&#8201;t)(a_2&#8201;+&#8201;t)...(a_m&#8201;+&#8201;t) and B, where t is any integer. Performance of the top graphics processing units (GPUs) outgrew the performance of the top CPUs a few years ago, and there has been a surge of interest in recent years in using GPUs for general processing. We propose and evaluate a bit-parallel algorithm solving the LCTS problem on a GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/809_Untitledsecuritytechnology_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/809_Untitledsecuritytechnology_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Silesian University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>10/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sebastian.deorowicz@polsl.pl">Sebastian Deorowicz</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/e5084324pj884338/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=39">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sebastian Deorowicz,sebastian.deorowicz@polsl.pl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>58db5b29-d3e0-4e9a-975a-d39dfd48e727</GUID>
        <Name>Real-Time GPU-Based Voxel Carving with Systematic Occlusion Handling</Name>
        <ShortDescription>We present an approach to compute the visual hulls of multiple people in real time in the presence of occlusions. We prove that the resulting visual hulls are correct and minimal under occlusions. Our proposed algorithm runs completely on the GPU with frame rates up to 50 fps for multiple people using only one computer equipped with off-the-shelf hardware. We also compare runtimes for different graphics chips and show that our approach scales very well without additional effort. Comparison to other work shows that our algorithm is as fast as state-of-the-art technology. The resulting visual hulls can be the basis for a wide range of algorithms that require a robust voxel representation as input.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/808_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/808_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Fraunhofer IITB Karlsruhe / Universitat Karlsruhe </OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>09/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="alexander.schick@iitb.fraunhofer.de">Alexander Schick</Author>
           <Author email="rainer.stiefelhagen@iitb.fraunhofer.de">Rainer Stiefelhagen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/m2212r130316g534/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=36">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexander Schick,Rainer Stiefelhagen,alexander.schick@iitb.fraunhofer.de,rainer.stiefelhagen@iitb.fraunhofer.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f4504157-17b0-4b17-9476-d48e77994f7f</GUID>
        <Name>Arion Render</Name>
        <ShortDescription>Physically-based unbiased rendering</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/807_arion_cuda_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/807_arion_cuda_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>RandomControl S.L.U.</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="tech@randomcontrol.com">RandomControl</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.randomcontrol.com/arion">Application</ContentType>
           <ContentType url="http://www.randomcontrol.com/arion">Multimedia</ContentType>
           <ContentType url="http://www.randomcontrol.com/arion">Presentation</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Raytracing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>raytracing rendering physically-based unbiased randomcontrol arion fryrender,RandomControl,tech@randomcontrol.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1a492908-0605-4d9e-af4f-085ff724e6cf</GUID>
        <Name>Asymmetric Distributed Shared Memory</Name>
        <ShortDescription>GMAC is a run-time system that implements an Asymmetric Distributed Shared Memory model. This model eases the task of programming CUDA applications by building a unified global address space that includes system and GPU memories. Code executed on the CPU can transparently access data hosted by the GPU memory, but code run on the GPU is constrained to access only the data hosted by its own memory. GMAC removes the need to perform explicit data transfers using cudaMemcpy() calls and handles all data transfers in a transparent and efficient way. Moreover, the unified address space implemented by GMAC allows using CPU pointers in the GPU code.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/806_google_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/806_google_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universitat Politecnica de Catalunya / University of Illinois</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>11/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="igelado@ac.upc.edu">Isaac Gelado</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/adsm/">Application</ContentType>
           <ContentType url="http://code.google.com/p/adsm/">Presentation</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Library</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Isaac Gelado,igelado@ac.upc.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ef51a1b4-1fff-412e-a96d-796a24015f38</GUID>
        <Name>Octane Renderer</Name>
        <ShortDescription>Octane Render is a fully GPU-powered, unbiased and physically based rendering application, with a 10-15X speed increase over unbiased CPU-based renderers.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/806_octane_cuda_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/806_octane_cuda_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Refractive Software LTD</OrganizationName>
        <OrganizationURL>http://www.refractivesoftware.com</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>01/10/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>15</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Refractive Software LTD</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.refractivesoftware.com/purchase.html">Application</ContentType>
           <ContentType url="http://www.refractivesoftware.com/videos.html">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Refractive Software LTD</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>554c3825-b0de-4df9-bd68-f0dba7b2a590</GUID>
        <Name>Textbook: GPU</Name>
        <ShortDescription>Chinese textbook for CUDA programming</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/803_20100202044228595_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/803_20100202044228595_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>www.hpctech.com</OrganizationName>
        <OrganizationURL>http://www.hpctech.com/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>10/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="zhao.kaiyong@gmail.com">Shu Zhang</Author>
           <Author email="">Yanli Chu</Author>
           <Author email="">Kaiyong Zhao</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.hpctech.com/announce/?announceid=2">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>HPC information</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shu Zhang,Yanli Chu,Kaiyong Zhao,zhao.kaiyong@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a62c5428-2955-4cf3-9d0d-0078b395153f</GUID>
        <Name>QView</Name>
        <ShortDescription>Multi-math object viewer. Still under development.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/802_qview_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/802_qview_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>digitker - The digital kernel</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>04/30/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dtsonov@digitker.com">Dimitar Tsonov</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://digitker.com/">Paper</ContentType>
           <ContentType url="http://digitker.com/">Presentation</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Finance</ApplicationType>
           <ApplicationType>Game Physics</ApplicationType>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>math kernel viewer </ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dimitar Tsonov,dtsonov@digitker.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ed7975e2-60da-449a-8a34-febfbd08eebf</GUID>
        <Name>Textbook: Programming Massively Parallel Processors: A Hands-on Approach</Name>
        <ShortDescription>The first textbook of its kind, Programming Massively Parallel Processors: A Hands-on Approach is authored by Dr. David B. Kirk, NVIDIA Fellow and former chief scientist, and Dr. Wen-mei Hwu, who serves at the University of Illinois at Urbana-Champaign as Chair of Electrical and Computer Engineering in the Coordinated Science Laboratory, co-director of the Universal Parallel Computing Research Center and principal investigator of the CUDA Center of Excellence. The textbook, which is 256 pages, is the first aimed at teaching advanced students and professionals the basic concepts of parallel programming and GPU architectures. Published by Morgan Kaufmann, it explores various techniques for constructing parallel programs and reviews numerous case studies. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/801_Kirk-HR_large_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/801_Kirk-HR_large_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>NVIDIA and UIUC</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>01/28/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dkirk@nvidia.com">Dr. David Kirk</Author>
           <Author email="">Dr. Wen-mei Hwu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.nvidia.com/object/io_1264656303008.html">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Programming textbook</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>CUDA, Parallel Processing, NVIDIA, GPU,Dr. David Kirk,Dr. Wen-mei Hwu,dkirk@nvidia.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c8e8ac46-4a7f-47db-b7d6-b79ae238ba7d</GUID>
        <Name>PARRET: Parallel RestoreTools</Name>
        <ShortDescription>PARRET is a Python package for image deblurring on GPUs. By making use of the parallelism on NVIDIA GPU CUDA architecture, the deblurring time is greatly reduced. Besides image deblurring, PARRET can be used to solve linear equations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/800_demo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/800_demo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Emory University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>02/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>15</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="yfan@emory.edu">Ying Wai (Daniel) Fan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.mathcs.emory.edu/~yfan/PARRET/doc/index.html">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>deblurring, Python, linear systems of equations,Ying Wai (Daniel) Fan,yfan@emory.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3192f565-72ab-4885-9348-2b3afd2511d6</GUID>
        <Name>QUDA : A library for QCD on GPUs</Name>
        <ShortDescription>QUDA is a library for performing calculations in lattice QCD on graphics processing units (GPUs) using NVIDIA's C for CUDA API. The current release includes optimized kernels for applying the Wilson Dirac operator and clover-improved Wilson Dirac operator, kernels for performing various BLAS-like operations, and full inverters built on these kernels. Mixed-precision implementations of both CG and BiCGstab are provided, with support for double, single, and half (16-bit fixed-point) precision.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/799_quda_image_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/799_quda_image_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Boston University and Harvard University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>11/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="mikec@seas.harvard.edu">M. A. Clark</Author>
           <Author email="rbabich@bu.edu">R. Babich</Author>
           <Author email="kbarros@gmail.com">K. Barros</Author>
           <Author email="brower@bu.edu">R. Brower</Author>
           <Author email="rebbi@bu.edu">C. Rebbi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://lattice.bu.edu/quda">Application</ContentType>
           <ContentType url="http://arxiv.org/abs/0911.3191">Paper</ContentType>
           <ContentType url="http://lattice.bu.edu/quda">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>QCD, linear solver, mixed precision,Mike Clark,mikec@seas.harvard.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>14721042-0396-4060-8731-199cc53e5bc2</GUID>
        <Name>SCGPSim: A fast SystemC simulator on GPUs</Name>
        <ShortDescription>The main objective of this paper is to speed up the simulation performance of SystemC designs at the RTL abstraction level by exploiting the high degree of parallelism afforded by today's general-purpose graphics processors (GPGPUs). Our approach parallelizes SystemC's discrete-event simulation (DES) on GPGPUs by transforming the model of computation of DES into a model of concurrent threads that synchronize as and when necessary. Our simulation infrastructure is called SCGPSim, and it includes a source-to-source (S2S) translator to transform synthesizable SystemC models into programs that execute in parallel on an NVIDIA GPU. The translator retains the simulation semantics of the original designs by applying semantics-preserving transformations. The resulting transformed models, mapped onto the massively parallel architecture of GPUs, improve simulation efficiency quite substantially. Preliminary experiments with varying-sized examples such as AES, ALU, and FIR have shown simulation speed-ups ranging from 30x to 100x. Considering that our transformations are not yet optimized, we believe that optimizing them will improve the simulation performance even further.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/798_scgp2_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/798_scgp2_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>FERMAT Lab, Virginia Tech, Blacksburg, VA</OrganizationName>
        <OrganizationURL>http://www.fermat.ece.vt.edu/</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>01/19/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="knmahesh@vt.edu">Mahesh Nanjundappa</Author>
           <Author email="">Hiren D Patel</Author>
           <Author email="">Bijoy A Jose</Author>
           <Author email="">Sandeep K Shukla</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://filebox.vt.edu/users/knmahesh/index_files/mahesh_scgpsim.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Electronic Design Automation</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mahesh Nanjundappa,Hiren D Patel,Bijoy A Jose,knmahesh@vt.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>128f6237-5801-4d4f-b825-fc3a01ba1578</GUID>
        <Name>Myocyte Simulation</Name>
        <ShortDescription>The code performs several time-step simulations of a myocyte (heart muscle cell) in parallel, making it possible to obtain results for different sets of inputs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/797_Myocyte_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/797_Myocyte_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL>http://www.virginia.edu</OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>01/31/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="lgs9a@virginia.edu">Lukasz G. Szafaryn</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Myocyte">Application</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Myocyte">Multimedia</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Myocyte">Paper</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Myocyte">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Simulation</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>myocyte, simulation, ode solving, time-step,Lukasz G. Szafaryn,lgs9a@virginia.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ab039cd4-07bd-419e-b6b0-a2e7e7be3fec</GUID>
        <Name>Mutual Information Based Semi-Global Stereo Matching on the GPU </Name>
        <ShortDescription>Real-time stereo matching is necessary for many practical applications, including robotics. There are already many real-time stereo systems, but they typically use local approaches that cause object boundaries to be blurred and small objects to be removed. We have selected the Semi-Global Matching (SGM) method for implementation on graphics hardware, because it can compete with the currently best global stereo methods. At the same time, it is much more efficient than most other methods that produce a similar quality. In contrast to previous work, we have fully implemented SGM including matching with mutual information, which is partly responsible for the high quality of disparity images. Our implementation reaches 4.2 fps on a GeForce 8800 ULTRA with 640 x 480 pixel images and a 128-pixel disparity range, and 13 fps with 320 x 240 pixel images and a 64-pixel disparity range.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/796_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/796_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>German Aerospace Center </OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>12/02/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ines.ernst@dlr.de">Ines Ernst</Author>
           <Author email="heiko.hirschmueller@dlr.de">Heiko Hirschmuller</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/m12112614k7834g4/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=35">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ines Ernst,Heiko Hirschmuller,ines.ernst@dlr.de,heiko.hirschmueller@dlr.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4f1c26e4-bd49-4db3-9e21-65632e62b00d</GUID>
        <Name>Experiences with Cell-BE and GPU for Tomography</Name>
        <ShortDescription>Tomography is a powerful technique for three-dimensional imaging that reconstructs an image from a series of projection images acquired along a range of viewing directions. An important part of any tomography system is the reconstruction algorithm. Iterative reconstruction algorithms have many advantages over non-iterative methods, yet their running time can be prohibitively long. As these algorithms have high potential for parallelization, multi-core architectures such as the Cell-BE and the GPU can possibly alleviate this problem.
In this paper, we describe our experiences in mapping the basic operations of iterative reconstruction algorithms onto these platforms. We argue that for this type of problem, the GPU yields superior performance compared to the Cell-BE. Performance results of our implementation demonstrate a speedup of over 40 for a single GPU, compared to a single-core CPU version. By combining eight GPUs and a quad-core CPU in a single system, similar performance to a large cluster consisting of hundreds of CPU cores has been obtained. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/795_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/795_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Antwerp, Belgium</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>40</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Sander.vanderMaar@ua.ac.be">Sander van der Maar</Author>
           <Author email="Joost.Batenburg@ua.ac.be">Kees Joost Batenburg</Author>
           <Author email="Jan.Sijbers@ua.ac.be">Jan Sijbers</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/9362637125n513j6/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=34">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sander van der Maar,Kees Joost Batenburg,Jan Sijbers,Sander.vanderMaar@ua.ac.be,Joost.Batenburg@ua.ac.be,Jan.Sijbers@ua.ac.be</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9256c867-a33e-4bca-8dd1-f56c21b6047b</GUID>
        <Name>Experiences with Cell-BE and GPU for Tomography</Name>
        <ShortDescription>Tomography is a powerful technique for three-dimensional imaging that reconstructs an image from a series of projection images acquired along a range of viewing directions. An important part of any tomography system is the reconstruction algorithm. Iterative reconstruction algorithms have many advantages over non-iterative methods, yet their running time can be prohibitively long. As these algorithms have high potential for parallelization, multi-core architectures such as the Cell-BE and the GPU can possibly alleviate this problem.
In this paper, we describe our experiences in mapping the basic operations of iterative reconstruction algorithms onto these platforms. We argue that for this type of problem, the GPU yields superior performance compared to the Cell-BE. Performance results of our implementation demonstrate a speedup of over 40 for a single GPU, compared to a single-core CPU version. By combining eight GPUs and a quad-core CPU in a single system, similar performance to a large cluster consisting of hundreds of CPU cores has been obtained. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/793_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/793_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Antwerp, Belgium</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>40</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Sander.vanderMaar@ua.ac.be">Sander van der Maar</Author>
           <Author email="Joost.Batenburg@ua.ac.be">Kees Joost Batenburg</Author>
           <Author email="Jan.Sijbers@ua.ac.be">Jan Sijbers</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/9362637125n513j6/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=34">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sander van der Maar,Kees Joost Batenburg,Jan Sijbers,Sander.vanderMaar@ua.ac.be,Joost.Batenburg@ua.ac.be,Jan.Sijbers@ua.ac.be</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4ad94310-447d-47c8-bd18-1a36ddda8728</GUID>
        <Name>Multi-walk Parallel Pattern Search Approach on a GPU Computing Platform </Name>
        <ShortDescription>This paper studies the efficiency of using Pattern Search (PS) on bound-constrained optimization functions on a Graphics Processing Unit (GPU) computing platform. Pattern Search is a direct search optimization technique that does not require derivative information on non-linear programming problems. It is ideally suited to a GPU computing environment because of its low memory requirement and the absence of communication between threads in a multi-walk setting. To adapt to a GPU environment, traditional Pattern Search is modified to terminate based on iteration count instead of tolerance. This research designed and implemented a multi-walk Pattern Search algorithm on a GPU computing platform. Computational results are promising, with a computing speedup of 100+ compared to a corresponding implementation on a single CPU.
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/792_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/792_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Lamar University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Weihang.Zhu@lamar.edu">Weihang Zhu</Author>
           <Author email="jcurry@my.lamar.edu">James Curry</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/d655105451757237/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=31">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Weihang Zhu,James Curry,Weihang.Zhu@lamar.edu,jcurry@my.lamar.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1ecce826-a4da-4bd6-932e-11130eeee781</GUID>
        <Name>A GPU-Based Simulation of Tsunami Propagation and Inundation</Name>
        <ShortDescription>Tsunami simulation combines fluid dynamics, numerical computation, and visualization techniques. Nonlinear shallow water equations are often used to model tsunami propagation. By adding the friction slope to the conservation of momentum, they can also model tsunami inundation. To solve these equations, we use the second-order finite difference MacCormack method. Because it is a finite difference method, it lends itself to parallelization. We use the parallelism provided by the GPU to speed up the computations. By loading data as textures in GPU memory, the computation steps can be written as shader programs and executed by the GPU in parallel. The results show that, with the help of the GPU, the simulation achieves a significant improvement in execution time for each of the computation steps.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/790_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/790_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National United University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>07/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wyliang@ntut.edu.tw">Wen-Yew Liang</Author>
           <Author email="tjhsieh@ntut.edu.tw">Tung-Ju Hsieh</Author>
           <Author email="t6598056@ntut.edu.tw">Muhammad T. Satria</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v5436m2060436718/?p=f8db6074671c4838bb1501c6d9e20c5d&amp;pi=30">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wen-Yew Liang,Tung-Ju Hsieh,Muhammad T. Satria,wyliang@ntut.edu.tw,tjhsieh@ntut.edu.tw,t6598056@ntut.edu.tw</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4aba234f-c87b-477d-84e4-5ccb3a641313</GUID>
        <Name>GPU-Supported Image Compression for Remote Visualization Realization and Benchmarking</Name>
        <ShortDescription>In this paper we introduce a novel GPU-supported JPEG image compression technique with a focus on its application to remote visualization. Fast, high-quality compression techniques are very important for the remote visualization of interactive simulations and virtual reality applications (IS/VR) on hybrid clusters. The main goals of the design and implementation of this compression technique were therefore low compression times and almost no visible quality loss, while achieving compression rates that allow for 30+ frames per second over 10 Mbit/s networks. To analyze the potential of the technique and its further development needs, and to compare it to existing methods, several benchmarks are conducted and described in this paper. Additionally, a quality assessment is performed to allow statements about the achievable quality of the lossy image compression. The results show that using the GPU not only for rendering but also for image compression is a promising approach for interactive remote rendering.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/789_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/789_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Paderborn</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>12/02/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="slietsch@upb.de">Stefan Lietsch</Author>
           <Author email="plensing@upb.de">Paul Hermann Lensing</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/v1076365gx57665g/?p=98c05d32660143cdad658184818f83ac&amp;pi=28">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Stefan Lietsch,Paul Hermann Lensing,slietsch@upb.de,plensing@upb.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1968f34b-b4e7-4cfe-949e-957ac0b0a242</GUID>
        <Name>GPU-MEME: Using Graphics Hardware to Accelerate Motif Finding in DNA Sequences </Name>
        <ShortDescription>Discovery of motifs that are repeated in groups of biological sequences is a major task in bioinformatics. Iterative methods such as expectation maximization (EM) are used as a common approach to find such patterns. However, corresponding algorithms are highly compute-intensive due to the small size and degenerate nature of biological motifs. Runtime requirements are likely to become even more severe due to the rapid growth of available gene transcription data. In this paper we present a novel approach to accelerate motif discovery based on commodity graphics hardware (GPUs). To derive an efficient mapping onto this type of architecture, we have formulated the compute-intensive parts of the popular MEME tool as streaming algorithms. Our experimental results show that a single GPU allows speedups of one order of magnitude with respect to the sequential MEME implementation. Furthermore, parallelization on a GPU-cluster even improves the speedup to two orders of magnitude.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/788_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/788_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nanyang Technological University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>10/08/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="cchen@ntu.edu.sg">Chen Chen</Author>
           <Author email="asbschmidt@ntu.edu.sg">Bertil Schmidt</Author>
           <Author email="liuweiguo@ntu.edu.sg">Liu Weiguo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/4122nv8469858582/?p=98c05d32660143cdad658184818f83ac&amp;pi=26">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chen Chen,Bertil Schmidt,Liu Weiguo,cchen@ntu.edu.sg,asbschmidt@ntu.edu.sg,liuweiguo@ntu.edu.sg</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9dd0b45a-39ac-46d4-b174-a1e78ecab2a7</GUID>
        <Name>Performance Optimization Strategies of High Performance Computing on GPU </Name>
        <ShortDescription>GPUs have recently become widely used in scientific computing and engineering applications, owing primarily to the evolution of GPU architecture. First, we analyze some key performance characteristics of the GPU in detail, along with the relationships among GPU architecture, programming model, and memory hierarchy. Second, we present three performance optimization strategies: Prefetching, Streamlizing, and Task Division. Experiments have been carried out to characterize the relationships between the different factors and efficiency. Finally, we map the HPL benchmark onto the GPU to verify our strategies and achieve a measurable speedup.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/787_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/787_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National University of Defense Technology, ChangSha</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="anguo.ma@nudt.edu.cn">Anguo Ma</Author>
           <Author email="jing.cai@nudt.edu.cn">Jing Cai</Author>
           <Author email="y.cheng@nudt.edu.cn">Yu Cheng</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/b8g6p02570377572/?p=98c05d32660143cdad658184818f83ac&amp;pi=25">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Anguo Ma,Jing Cai,Yu Cheng,anguo.ma@nudt.edu.cn,jing.cai@nudt.edu.cn,y.cheng@nudt.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>168e001f-d970-4413-90a0-8d6c90fda259</GUID>
        <Name>Bipartite Graph Matching Computation on GPU</Name>
        <ShortDescription>The Bipartite Graph Matching Problem is a well-studied topic in Graph Theory. Such a matching relates pairs of nodes from two distinct sets by selecting a subset of the graph edges connecting them. No selected edge shares an end point with any other edge in the subset. When the graph under consideration has huge sets of nodes and edges, sequential approaches are impractical, especially for applications demanding fast results. In this paper we investigate how to compute such a matching on Graphics Processing Units (GPUs), motivated by their increasing processing power and decreasing cost. We present a new data-parallel approach for computing bipartite graph matching that runs efficiently on today's graphics hardware, and we apply it to solve the correspondence between 3D samples taken over a time interval.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/786_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/786_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Leibniz Universitaet Hannover</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>08/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="crisnv@inf.puc-rio.br">Cristina Nader Vasconcelos</Author>
           <Author email="rosenhahn@tnt.uni-hannover.de">Bodo Rosenhahn</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/m7phr706x6717044/?p=98c05d32660143cdad658184818f83ac&amp;pi=24">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cristina Nader Vasconcelos,Bodo Rosenhahn,crisnv@inf.puc-rio.br,rosenhahn@tnt.uni-hannover.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>124508be-daac-4a5e-8a7d-8bcdae9ea237</GUID>
        <Name>Face Detection Using GPU-Based Convolutional Neural Networks</Name>
        <ShortDescription>In this paper, we consider the problem of face detection under pose variations. Unlike other contributions, this work focuses on an efficient implementation that uses the computational power of modern graphics cards. The proposed system consists of a parallelized implementation of convolutional neural networks (CNNs), with a special emphasis on also parallelizing the detection process. Experimental validation in a smart conference room with 4 active ceiling-mounted cameras shows a dramatic speed gain under real-life conditions.
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/785_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/785_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Dortmund University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>08/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Fabian Nasse</Author>
           <Author email="">Christian Thurau</Author>
           <Author email="">Gernot A. Fink</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/h00np133u6602613/?p=98c05d32660143cdad658184818f83ac&amp;pi=22">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Fabian Nasse,Christian Thurau,Gernot A. Fink</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>cfd6b540-64f5-423f-bc2e-1b7ec1439ba5</GUID>
        <Name>Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures </Name>
        <ShortDescription>Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware with multiple levels of parallelism and a memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to parallelize efficiently for GPUs due to irregular patterns of memory references. In this paper we present a new storage format for sparse matrices that better exploits locality, has a low memory footprint, and enables automatic specialization for various matrices and future devices via parameter tuning. Experimental evaluation demonstrates significant speedups compared to previously published results.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/784_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/784_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Institute for System Programming of RAS</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>01/21/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="amonakov@ispras.ru">Alexander Monakov</Author>
           <Author email="anton@doc.ic.ac.uk">Anton Lokhmotov</Author>
           <Author email="arut@ispras.ru">Arutyun Avetisyan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/n2442u77n2333217/?p=98c05d32660143cdad658184818f83ac&amp;pi=20">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexander Monakov,Anton Lokhmotov,Arutyun Avetisyan,amonakov@ispras.ru,anton@doc.ic.ac.uk,arut@ispras.ru</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e105e7e5-d0ca-4fe1-b6ce-897fd679d5b4</GUID>
        <Name>Searching High-Dimensional Neighbours: CPU-Based Tailored Data-Structures Versus GPU-Based Brute-Force Method</Name>
        <ShortDescription>Many image processing algorithms rely on nearest neighbor (NN) or k-nearest neighbor (kNN) search. Several methods have been proposed to reduce the computation time, for instance using space partitioning. However, these methods are very slow in high-dimensional spaces. In this paper, we propose a fast implementation of the brute-force algorithm using GPU (Graphics Processing Unit) programming. We show that our implementation is up to 150 times faster than the classical approaches on synthetic data, and up to 75 times faster on real image processing algorithms (finding similar patches in images and texture synthesis).
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/783_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/783_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Palaiseau</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>05/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="garciav@lix.polytechnique.fr">Vincent Garcia</Author>
           <Author email="nielsen@lix.polytechnique.fr">Frank Nielsen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/r234309v2280m17g/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=19">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vincent Garcia,Frank Nielsen,garciav@lix.polytechnique.fr,nielsen@lix.polytechnique.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>05b7c411-e33a-4038-b779-b94b67ba0e80</GUID>
        <Name>Belief Propagation Implementation Using CUDA on an NVIDIA GTX 280</Name>
        <ShortDescription>Disparity map generation is a significant component of vision-based driver assistance systems. This paper describes an efficient implementation of a belief propagation algorithm on a graphics card (GPU) using CUDA (Compute Unified Device Architecture) that can speed up stereo image processing by a factor of between 30 and 250. For evaluation purposes, different kinds of images have been used: reference images from the Middlebury stereo website, and real-world stereo sequences recorded with the research vehicle of the .enpeda.. project at The University of Auckland. This paper provides implementation details, primarily concerning the inequality constraints on threads and shared memory that must be satisfied for efficient programming on a GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/780_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/780_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Shandong University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>11/18/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yanyan Xu</Author>
           <Author email="">Hui Chen</Author>
           <Author email="">Reinhard Klette</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/k676421003802h63/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=16">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yanyan Xu,Hui Chen,Reinhard Klette</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1cb185e6-e66e-458f-95d9-0f08f2490b6b</GUID>
        <Name>Lloyd's Algorithm on GPU</Name>
        <ShortDescription>The Centroidal Voronoi Diagram (CVD) is a very versatile structure, well studied in Computational Geometry. It is used as the basis for a number of applications. This paper presents a deterministic algorithm, computed entirely using graphics hardware resources, based on Lloyd's Method for computing CVDs. While the computation of the ordinary Voronoi diagram on the GPU is a well-explored topic, its extension to CVDs presents some challenges that the present study intends to overcome.
</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/779_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/779_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Pontificia Universidade Catolica</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>12/02/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="crisnv@inf.puc-rio.br">Cristina N. Vasconcelos</Author>
           <Author email="asla@tecgraf.puc-rio.br">Asla Sa</Author>
           <Author email="pcezar@impa.br">Paulo Cezar Carvalho</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/qv2685448j202g58/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=15">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cristina N. Vasconcelos,Asla Sa,Paulo Cezar Carvalho,crisnv@inf.puc-rio.br,asla@tecgraf.puc-rio.br,pcezar@impa.br</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>86056069-3857-4e63-8c25-55a234a83edd</GUID>
        <Name>GPU-Accelerated Nearest Neighbor Search for 3D Registration</Name>
        <ShortDescription>Nearest Neighbor Search (NNS) is employed by many computer vision algorithms. The computational complexity is large and constitutes a challenge for real-time capability. The basic problem is in rapidly processing a huge amount of data, which is often addressed by means of highly sophisticated search methods and parallelism. We show that NNS-based vision algorithms like the Iterative Closest Points algorithm (ICP) can achieve real-time capability while preserving compact size and moderate energy consumption, as is needed in robotics and many other domains. The approach exploits the concept of general purpose computation on graphics processing units (GPGPU) and is compared to parallel processing on CPU. We apply this approach to the 3D scan registration problem, for which a speed-up factor of 88 compared to a sequential CPU implementation is reported.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/778_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/778_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Sankt Augustin</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>10/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="dqiu2s@smail.inf.h-brs.de">Deyuan Qiu</Author>
           <Author email="stefan_may@arcor.de">Stefan May</Author>
           <Author email="andreas@nuechti.de">Andreas Nuchter</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/e836w4xxh5034136/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=14">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Deyuan Qiu,Stefan May,Andreas Nuchter,dqiu2s@smail.inf.h-brs.de,stefan_may@arcor.de,andreas@nuechti.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>95724514-4e0b-41fc-b92d-e2c41be2c895</GUID>
        <Name>An Efficient Pre-filtering Mechanism for Parallel Intrusion Detection Based on Many-Core GPU</Name>
        <ShortDescription>Multi-pattern search is a time-consuming task in Network Intrusion Detection Systems (NIDS). The processing ability of NIDS cannot catch up with the rapid development of network bandwidth. One intuitive idea is to use pre-filtering to reduce the workload of NIDS. Our goal is to design a novel method for pre-filtering which will be ready for an efficient implementation on many-core GPU. Through statistical analysis, we propose a rudimentary method to use 2B ASCII sub-patterns as the filter keywords. To reduce the size of the filter keyword set, we use Binary Integer Linear Programming (BILP) for optimization. The number of filter keywords is reduced from 4824 to 362, which is also much smaller than with the prefix-based and suffix-based methods. We argue that our method can well utilize the computation power of GPU. Experiments demonstrate that our pre-filter can achieve a good filter ratio, thus alleviating the burden of NIDS.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/777_Untitledsecuritytechnology_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/777_Untitledsecuritytechnology_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National University of Defense Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>11/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="chengkun_wu@nudt.edu.cn">Chengkun Wu</Author>
           <Author email="jpyin@nudt.edu.cn">Jianping Yin</Author>
           <Author email="zpcai@nudt.edu.cn">Zhiping Cai</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/pp43w171p752678v/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=11">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Chengkun Wu,Jianping Yin,Zhiping Cai,chengkun_wu@nudt.edu.cn,jpyin@nudt.edu.cn,zpcai@nudt.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>21d4bbfd-5dd3-4016-982d-d55bab9285ed</GUID>
        <Name>GPU-based Acceleration of System-level Design Tasks</Name>
        <ShortDescription>Many system-level design tasks (e.g., high-level timing analysis, hardware/software partitioning and design space exploration) involve computational kernels that are intractable (usually NP-hard). As a result, they involve high running times even for mid-sized problems. In this paper we explore the possibility of using commodity graphics processing units (GPUs) to accelerate such tasks that commonly arise in the electronic design automation (EDA) domain. We demonstrate this idea via two detailed case studies. The first explores the possibility of using GPUs to speed up standard schedulability analysis problems. The second proposes a GPU-based engine for a general hardware/software design space exploration problem. Not only do these problems commonly arise in the embedded systems domain, their computational kernels turn out to be variants of a combinatorial optimization problem, viz., the knapsack problem, that lies at the heart of several EDA applications. Experimental results show that our GPU-based implementations offer very attractive speedups for the computational kernels (up to 100x), and speedups of up to 17x for the full problem. In contrast to ASIC/FPGA-based accelerators, and given that even low-end desktop and notebook computers are now equipped with GPUs, our solution involves no extra hardware cost. Although recent research has shown the benefits of using GPUs for a variety of non-graphics applications (e.g., in databases and bioinformatics), harnessing the parallelism of GPUs to accelerate problems from the EDA domain has not been sufficiently explored so far. We believe that our results and the generality of the core problem that we address will motivate researchers from this community to explore the possibility of using GPUs for a wider variety of problems from the EDA domain.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/776_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/776_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Munich</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>01/15/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Unmesh D. Bordoloi</Author>
           <Author email="">Samarjit Chakraborty</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/44324g8n140646u8/?p=f9a785980df7464d938d20ea0d27f629&amp;pi=10">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Unmesh D. Bordoloi,Samarjit Chakraborty</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d6f04307-3afc-40ed-9c71-ad0bc9456cec</GUID>
        <Name>A generic library for structured real-time computations: GPU implementation applied to retinal and cortical vision processes</Name>
        <ShortDescription>Most graphics cards in standard personal computers are now equipped with several pixel pipelines running shader programs. Taking advantage of this technology by transferring parallel computations from the CPU side to the GPU side increases the overall computational power, even in non-graphical applications, by freeing the main processor from heavy work. A generic library is presented to show how anyone can benefit from modern hardware by combining various techniques with little hardware-specific programming skill. Its shader implementation is applied to retinal and cortical simulation. The purpose of this sample application is not to provide a correct approximation of real center-surround ganglion or middle temporal cells, but to illustrate how easily intertwined spatiotemporal filters can be applied on raw input pictures in real time. Requirements and interconnection complexity really depend on the vision framework adopted; therefore, various hypotheses that may benefit from such a library are introduced.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/775_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/775_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Toulouse</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>01/08/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="quinton@n7.fr">Jean-Charles Quinton</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/7r18j54177721672/?p=50a4e541be74482a9189e8d90f128a35&amp;pi=9">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jean-Charles Quinton,quinton@n7.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4844c2e1-42ea-446c-aa46-616f14577bf2</GUID>
        <Name>GPU Accelerated 3D Face Registration / Recognition</Name>
        <ShortDescription>This paper proposes a novel approach to both registration and recognition of faces in three dimensions. The presented method is based on a normal map metric to perform either the alignment of a captured face to a reference template or the comparison between any two faces in a gallery. As the metric involved is highly suited to computation on a vector processor, we propose an implementation of the whole framework on latest-generation graphics boards, to exploit the potential of GPUs applied to large-scale biometric identification applications. This work shows how the use of affordable consumer-grade hardware could allow ultra-rapid comparison between face descriptors through their highly specialized architecture. The approach also addresses facial expression changes by means of subject-specific weighting masks. We include preliminary results of experiments conducted on a proprietary gallery and on a subset of the FRGC database.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/774_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/774_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universita degli Studi di Salerno</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>08/30/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="abate@unisa.it">Andrea Francesco Abate</Author>
           <Author email="mnappi@unisa.it">Michele Nappi</Author>
           <Author email="sricciardi@unisa.it">Stefano Ricciardi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/y76n586210377175/?p=50a4e541be74482a9189e8d90f128a35&amp;pi=7">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Andrea Francesco Abate,Michele Nappi,Stefano Ricciardi,abate@unisa.it,mnappi@unisa.it,sricciardi@unisa.it</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ffcd384a-918f-49f4-a5ad-b21b0988e948</GUID>
        <Name>Implementation of a Lattice Boltzmann method for numerical fluid mechanics using the NVIDIA CUDA technology</Name>
        <ShortDescription>The Lattice Boltzmann method (LBM) is a distribution-function based approach to numerical fluid mechanics. Due to the simple formulation of the underlying algorithm, this method is well suited for parallelization and hardware acceleration using general purpose graphical processing units (GPGPU). Within this work, LBM has been implemented in a new code with multi-GPU support and physically validated for a flow around a sphere. The performance analysis shows a remarkable speed-up of 1840% using 3 GPUs in comparison to a single-socket multi-core CPU calculation. Moreover, the validation for the test case chosen shows excellent agreement with available reference data.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/773_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/773_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Technische Universitat Munchen</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>05/06/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Thomas.Indinger@tum.de">T. Indinger</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/417774437h7h0462/?p=50a4e541be74482a9189e8d90f128a35&amp;pi=6">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>T. Indinger,Thomas.Indinger@tum.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b4a2cd2c-ab54-4f4e-a597-12943d456da4</GUID>
        <Name>GPU-Assisted Surface Reconstruction on Locally-Uniform Samples</Name>
        <ShortDescription>In point-based graphics, surfaces are represented by point clouds without explicit connectivity. If the distribution of the points can be carefully controlled, surface reconstruction becomes a much easier problem. We present a simple, completely local surface reconstruction algorithm for input point distributions that are locally uniform. The locality of the computation lets us handle large point sets using parallel and out-of-core methods. The algorithm can be implemented robustly with floating-point arithmetic. We demonstrate the simplicity, efficiency, and numerical stability of our algorithm with an out-of-core and parallel implementation using graphics hardware. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/772_roundtable_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/772_roundtable_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>23</ReleaseDay>
        <ReleaseDateDisplay>10/23/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="kil@cs.ucdavis.edu">Yong Joo Kil</Author>
           <Author email="amenta@cs.ucdavis.edu">Nina Amenta</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/lr611t0051862418/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yong Joo Kil,Nina Amenta,kil@cs.ucdavis.edu,amenta@cs.ucdavis.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fdd9c3bb-420d-4e67-94a0-60174e2f4534</GUID>
        <Name>GP-GPU Implementation of the Local Rank Differences Image Feature</Name>
        <ShortDescription>A currently popular trend in object detection and pattern recognition is the use of statistical classifiers, namely AdaBoost and its modifications. The speed performance of these classifiers largely depends on the low-level image features they use: both on the amount of information the feature provides and on the processor time of its evaluation. Local Rank Differences (LRD) is an image feature that is an alternative to the commonly used Haar wavelets. It is suitable for implementation in programmable (FPGA) or specialized (ASIC) hardware, but, as this paper shows, it also performs very well on graphics hardware (GPU) used in a general-purpose manner (GPGPU, namely CUDA in this case). The paper discusses the LRD features and their properties, describes an experimental implementation of the LRD in graphics hardware using CUDA, presents its empirical performance measures compared to alternative approaches, suggests several notes on practical usage of LRD and proposes directions for future work.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/771_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/771_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Brno University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>05/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="fherout@fit.vutbr.cz">Adam Herout</Author>
           <Author email="ijosth@fit.vutbr.cz">Radovan Josth</Author>
           <Author email="zemcik@fit.vutbr.cz">Pavel Zemcik</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://springerlink.com/content/266r885ql19154vl/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Adam Herout,Radovan Josth,Pavel Zemcik,fherout@fit.vutbr.cz,ijosth@fit.vutbr.cz,zemcik@fit.vutbr.cz</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>307a1055-9c6c-4df0-bc88-96f461322333</GUID>
        <Name>AES Encryption Implementation and Analysis on Commodity Graphics Processing Units</Name>
        <ShortDescription>Graphics Processing Units (GPUs) present large potential performance gains within stream processing applications over the standard CPU. These performance gains are best realised when high computational intensity is required across large amounts of mostly independent input elements. The GPU's success in general purpose stream processing has been demonstrated in many diverse fields, though attempts to port cryptographic algorithms to the GPU have thus far met with little success. In recent years, GPU architectures have continued to develop a more flexible and uniform programming environment. These developments have overcome many previously encountered restrictions in cipher implementations. We present novel approaches for the implementation of the AES block cipher encryption algorithm on these GPUs. This work also serves as a precursor for future cipher implementations on the most advanced GPU architecture, the recently released Nvidia G80, which now includes integer support and a simplified programming interface.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/770_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/770_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Trinity College Dublin</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>23</ReleaseDay>
        <ReleaseDateDisplay>08/23/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="harrisoo@cs.tcd.ie">Owen Harrison</Author>
           <Author email="john.waldron@cs.tcd.ie">John Waldron</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/v8010ju818508326/?p=5b871216a9454c458c81943a856b862f&amp;pi=89">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Owen Harrison,John Waldron,harrisoo@cs.tcd.ie,john.waldron@cs.tcd.ie</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>819a8581-877a-45c8-9cfb-d63121d5dbe2</GUID>
        <Name>The Future of Volume Graphics in Medical Virtual Reality</Name>
        <ShortDescription>A recent trend in medical virtual reality is to include information from multiple sources, especially about physiology, into one model and one single visualization. Computer graphics must therefore deal with a huge amount of information in real time. The latest developments in computer graphics hardware not only allow direct volume rendering to be implemented on the graphics processing unit (GPU); the emerging compute languages also enable us to address volume rendering problems of arbitrary complexity without being limited to formulating visualization techniques in an awkward fashion to match the GPU execution model. Utilizing these new possibilities, we meet the next generation's demands in medical visualization.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/769_prediction_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/769_prediction_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Graz University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Judith Muehl</Author>
           <Author email="">Bernhard Kainz</Author>
           <Author email="">Alexander Bornik</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/t6n2076l1184u556/?p=5b871216a9454c458c81943a856b862f&amp;pi=87">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Judith Muehl,Bernhard Kainz,Alexander Bornik</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f327d71f-b539-441f-a3d5-fc8b66c264db</GUID>
        <Name>Implementation of a Lattice Boltzmann kernel using the Compute Unified Device Architecture developed by NVIDIA</Name>
        <ShortDescription>In this article, a very efficient implementation of a 2D Lattice Boltzmann kernel using the Compute Unified Device Architecture (CUDA) interface developed by nVIDIA is presented. By exploiting the explicit parallelism exposed in the graphics hardware, we obtain a performance gain of more than one order of magnitude compared to standard CPUs. A non-trivial example, the flow through a generic porous medium, shows the performance of the implementation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/768_bottle_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/768_bottle_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Braunschweig</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>07/24/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="toelke@cab.bau.tu-bs.de">Jonas Tolke</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/80743363113w4w2w/?p=5b871216a9454c458c81943a856b862f&amp;pi=83">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jonas Tolke,toelke@cab.bau.tu-bs.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>36f123f2-0612-42e2-8134-d637453033c5</GUID>
        <Name>GPU in Haptic Rendering of Deformable Objects</Name>
        <ShortDescription>We present some results on using the Graphics Processing Unit (GPU) to compute the deformation of two experimental objects. A suture simulation model with GPU and a 2D deformable cloth model with nVidia CUDA techniques are also proposed. We conducted experimental studies to compare the GPU-based suture models with the CPU implementation. We also experimented with the implicit model of the 2D mesh, which offers computational challenges similar to those associated with any finite-element modeling approach. A method for computing the inverse of a matrix with a truncated Neumann series is also introduced.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/767_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/767_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Simon Fraser University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>06/28/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="fuhans@cs.sfu.ca">Hans Fuhan Shi</Author>
           <Author email="shahram@cs.sfu.ca">Shahram Payandeh</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/76791014h8m67716/?p=5b871216a9454c458c81943a856b862f&amp;pi=81">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Hans Fuhan Shi,Shahram Payandeh,fuhans@cs.sfu.ca,shahram@cs.sfu.ca</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>aa90451a-b028-44e9-98ae-84677865270f</GUID>
        <Name>GP-GPU Implementation of the Local Rank Differences Image Feature</Name>
        <ShortDescription>A currently popular trend in object detection and pattern recognition is the use of statistical classifiers, namely AdaBoost and its modifications. The speed of these classifiers largely depends on the low-level image features they use: both on the amount of information a feature provides and on the processor time needed to evaluate it. Local Rank Differences (LRD) is an image feature that is an alternative to the commonly used Haar wavelets. It is suitable for implementation in programmable (FPGA) or specialized (ASIC) hardware but, as this paper shows, it also performs very well on graphics hardware (GPU) used in a general-purpose manner (GPGPU, namely CUDA in this case). The paper discusses the LRD features and their properties, describes an experimental implementation of LRD in graphics hardware using CUDA, presents empirical performance measurements compared to alternative approaches, offers several notes on practical usage of LRD, and proposes directions for future work.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/766_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/766_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Brno University of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>05/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Adam Herout</Author>
           <Author email="">Radovan Josth</Author>
           <Author email="">Pavel Zemcik</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/266r885ql19154vl/?p=5b871216a9454c458c81943a856b862f&amp;pi=80">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Adam Herout,Radovan Josth,Pavel Zemcik</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>41e47d70-074c-4ef5-a17f-ba467f8e9d78</GUID>
        <Name>Monte Carlo Dose Calculation using GPU-Based parallel processing</Name>
        <ShortDescription>Recently, it has become possible to simulate physical phenomena on the Graphics Processing Unit (GPU), and researchers have begun actively using GPUs to shorten the computing time of Monte Carlo calculations. This report shows how to significantly accelerate 3D dose calculation for photon beams using the GPU. We describe a GPU parallel processing method for dose simulation based on NRCC DOSXYZnrc.</ShortDescription>
        <URL>http://www.springerlink.com/content/r42wtk514k03865j/?p=da8f68ea438f401396ffad66aea4a402&amp;pi=77</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/765_prediction_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/765_prediction_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Tokyo Metropolitan University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Atsushi Myojyoyama</Author>
           <Author email="">Hidetoshi Saitoh</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/r42wtk514k03865j/?p=da8f68ea438f401396ffad66aea4a402&amp;pi=77">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Atsushi Myojyoyama,Hidetoshi Saitoh</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b4c6e882-4f8e-4f2b-87aa-c0667c088ae7</GUID>
        <Name>GpuCV: A GPU-Accelerated Framework for Image Processing and Computer Vision</Name>
        <ShortDescription>This paper briefly presents the state of the art in accelerating image processing with graphics hardware (GPU) and discusses some of its caveats. It then describes GpuCV, an open source multi-platform library for GPU-accelerated image processing and computer vision operators and applications. It is meant for computer vision scientists who are not familiar with GPU technologies. GpuCV is designed to be compatible with the popular OpenCV library by offering GPU-accelerated operators that can be integrated into native OpenCV applications. The GpuCV framework transparently manages hardware capabilities, data synchronization, activation of low-level GLSL and CUDA programs, and on-the-fly benchmarking and switching to the most efficient implementation, and finally offers a set of image processing operators with GPU acceleration available.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/764_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/764_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TELECOM &amp; Management SudParis</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>12/03/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yannick Allusse</Author>
           <Author email="">Patrick Horain</Author>
           <Author email="">Ankit Agarwal</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/n710r02qm8k74458/?p=da8f68ea438f401396ffad66aea4a402&amp;pi=74">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yannick Allusse,Patrick Horain,Ankit Agarwal</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4786c8f5-1af2-4f0b-b323-7dee0cdd4936</GUID>
        <Name>Population Parallel GP on the G80 GPU</Name>
        <ShortDescription>The availability of low-cost, powerful parallel graphics cards has stimulated a trend to port GP to Graphics Processing Units (GPUs). Previous work on GPUs has shown evaluation-phase speedups for large sets of training cases. Using the CUDA language on the G80 GPU, we show it is possible to efficiently interpret several GP programs in parallel, thus also obtaining speedups for small training sets, starting at fewer than 100 training cases. Our scheme was embedded in the well-known ECJ library, providing an easy entry point for owners of G80 GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/762_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/762_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Université du Littoral Côte d'Opale</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>04/03/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="robillia@lil.univ-littoral.fr">Denis Robilliard</Author>
           <Author email="poty@lil.univ-littoral.fr">Virginie MarionPoty</Author>
           <Author email="fonlupt@lil.univ-littoral.fr">Cyril Fonlupt</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/01807t40r5222627/?p=da8f68ea438f401396ffad66aea4a402&amp;pi=73">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Denis Robilliard,Virginie MarionPoty,Cyril Fonlupt,robillia@lil.univ-littoral.fr,poty@lil.univ-littoral.fr,fonlupt@lil.univ-littoral.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2d5f5437-9dcc-4368-ad9a-937cce37e34c</GUID>
        <Name>Medical feature matching and model extraction from MRI/CT based on the Invariant Generalized Hough/Radon Transform</Name>
        <ShortDescription>In this paper we present a variation of the Generalized Hough Transform (GHT) for automatic feature matching and model extraction. We propose a two-dimensional algorithm with a two-reference-point parameterization (Dual-Point GHT) that is invariant to rotation and uniform scaling and exploits specific properties of both the generalized Hough and Radon transforms. The method operates with two-dimensional accumulators, which strongly reduces the required memory size. We implement the algorithm on Graphics Processing Units (GeForce 8800GTX/nVidia CUDA) and apply it to the extraction of cardiac shapes from MRI/CT as an initial step for further medical image segmentation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/761_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/761_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Heidelberg</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>02/04/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">D. Hlindzich</Author>
           <Author email="">R. Maenner</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/w37wv473748113p7/?p=da8f68ea438f401396ffad66aea4a402&amp;pi=72">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>D. Hlindzich,R. Maenner</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4a1c7daa-bbdd-4a89-b774-fafdc8d40477</GUID>
        <Name>Performance Evaluation of the NVIDIA GeForce 8800 GTX GPU for Machine Learning</Name>
        <ShortDescription>NVIDIA have released a new platform (CUDA) for general purpose computing on their graphical processing units (GPU). This paper evaluates use of this platform for statistical machine learning applications. The transfer rates to and from the GPU are measured, as is the performance of matrix vector operations on the GPU. An implementation of a sparse matrix vector product on the GPU is outlined and evaluated. Performance comparisons are made with the host processor.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/760_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/760_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Australian National University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>06/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Ahmed.ElZein@anu.edu.au">Ahmed El Zein</Author>
           <Author email="Eric.McCreath@anu.edu.au">Eric McCreath</Author>
           <Author email="Alistair.Rendell@anu.edu.au">Alistair Rendell</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/d4583830557k837u/?p=da8f68ea438f401396ffad66aea4a402&amp;pi=71">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ahmed El Zein,Eric McCreath,Alistair Rendell,Ahmed.ElZein@anu.edu.au,Eric.McCreath@anu.edu.au,Alistair.Rendell@anu.edu.au</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fcdd7311-d7ab-423b-9790-8fc720230f72</GUID>
        <Name>High-Quality Rendering of Varying Isosurfaces with Cubic Trivariate C1-Continuous Splines</Name>
        <ShortDescription>Smooth trivariate splines on uniform tetrahedral partitions are well suited for high-quality visualization of isosurfaces from scalar volumetric data. We propose a novel rendering approach based on spline patches with low total degree, for which ray-isosurface intersections are computed using efficient root-finding algorithms. Smoothly varying surface normals are directly extracted from the underlying spline representation. Our approach uses a combined CUDA and graphics pipeline and yields two key advantages over previous work. First, we can interactively vary the isovalues since all required processing steps are performed on the GPU. Second, we employ instancing in order to reduce shader complexity and to minimize overall memory usage. In particular, this allows the spline coefficients to be computed on the fly in real time on the GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/759_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/759_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>TU Darmstadt</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>11/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Thomas Kalbe</Author>
           <Author email="">Thomas Koch</Author>
           <Author email="">Michael Goesele</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/kj67k61v67898423/?p=e6efad6c51a246a5a01428810aa2b808&amp;pi=68">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Thomas Kalbe,Thomas Koch,Michael Goesele</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>10eb0dc2-f2cc-4beb-9bfb-d6483731f3a4</GUID>
        <Name>Evaluation of Parallel FFT Implementations on GPU and Multi-core PCs for Magnetic Induction Tomography</Name>
        <ShortDescription>Magnetic Induction Tomography (MIT) is a relatively new non-invasive modality for imaging the electrical properties of materials. It is currently under investigation for a variety of industrial and biomedical applications, in particular the detection and monitoring of cerebral haemorrhage. The speed of the FFT-based phase measurement algorithms employed in some current MIT systems is, however, a major limit on achieving higher data acquisition rates and precision.</ShortDescription>
        <URL>http://www.springerlink.com/content/t5022335826j4052/?p=e6efad6c51a246a5a01428810aa2b808&amp;pi=67</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/758_prediction_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/758_prediction_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Philips Research / University of Glamorgan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>2</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Y. Maimaitijiang</Author>
           <Author email="">H. C. Wee</Author>
           <Author email="">A. Roula</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/t5022335826j4052/?p=e6efad6c51a246a5a01428810aa2b808&amp;pi=67">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Y. Maimaitijiang,H. C. Wee,A. Roula</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>72ad49af-dbf9-4586-9849-1a116402fbcd</GUID>
        <Name>Visualization and GPU-accelerated simulation of medical</Name>
        <ShortDescription>We present a fast GPU-based method for simulation of ultrasound images from volumetric CT scans and their visualization. The method uses a ray-based model of the ultrasound to generate view-dependent ultrasonic effects such as occlusions, large-scale reflections and attenuation combined with speckle patterns derived from pre-processing the CT image using a wave-based model of ultrasound propagation in soft tissue. The main applications of the method are ultrasound training and registration of ultrasound and CT images.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/755_computermethods_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/755_computermethods_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Technische Universität München</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>12/19/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Oliver Kutter</Author>
           <Author email="">Ramtin Shams</Author>
           <Author email="">Nassir Navab</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://users.rsise.anu.edu.au/~ramtin/papers/2009/CMPB_2009.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Oliver Kutter,Ramtin Shams,Nassir Navab</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b717bbc6-50d1-4024-90db-2c891a8c7716</GUID>
        <Name>Parallel Computation of Mutual Information on the GPU with Application to Real-Time Registration of 3D Medical Images</Name>
        <ShortDescription>Due to processing constraints, automatic image-based registration of medical images has been largely used as a pre-operative tool. We propose a novel method named sort and count for efficient parallelization of mutual information (MI) computation designed for massively multiprocessing architectures. Combined with a parallel transformation implementation and an improved optimization algorithm, our method achieves real-time (less than 1 second) rigid registration of 3D medical images using a commodity graphics processing unit (GPU). This represents a more than 50-fold improvement over a standard implementation on a CPU. Real-time registration opens new possibilities for development of improved and interactive intraoperative tools that can be used for enhanced visualization and navigation during an intervention.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/754_graph_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/754_graph_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Australian National University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Ramtin Shams</Author>
           <Author email="">Parastoo Sadeghi</Author>
           <Author email="">Rodney Kennedy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://users.rsise.anu.edu.au/~ramtin/papers/2010/CMPB_2010.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ramtin Shams,Parastoo Sadeghi,Rodney Kennedy</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f6835ea9-7c74-4c76-9fe3-64880944cc7e</GUID>
        <Name>A SURVEY OF MEDICAL IMAGE REGISTRATION ON MULTI-CORE AND THE GPU</Name>
        <ShortDescription>A surgeon is performing a potentially life-saving pancreatectomy on a patient in the early stages of pancreatic cancer. Two small incisions of no more than half an inch allow laparoscopic tools, including a video camera and an ultrasound probe, to be guided inside the abdominal cavity. A third, larger incision is occupied by a hand-access device that facilitates the operation. The surgeon is able to locate the tumor in the ultrasound view with ease. This is largely possible due to a newly installed 3D navigation and visualization system that virtually renders the patient transparent.</ShortDescription>
        <URL>http://users.rsise.anu.edu.au/~ramtin/papers/2010/SPM_2010.pdf</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/753_multicoregpu_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/753_multicoregpu_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Australian National University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>03/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Ramtin Shams</Author>
           <Author email="">Parastoo Sadeghi</Author>
           <Author email="">Rodney A. Kennedy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://users.rsise.anu.edu.au/~ramtin/papers/2010/SPM_2010.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ramtin Shams,Parastoo Sadeghi,Rodney A. Kennedy</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5ded63a6-656c-44f4-a306-7cc45e85ea40</GUID>
        <Name>A GPU Tile-Load-Map architecture for terrain rendering: theory and applications</Name>
        <ShortDescription>This paper describes a robust, modular, complete GPU architecture, the Tile-Load-Map (TLM), designed for the real-time visualization of wide textured terrains created with arbitrary meshes. It extends and completes our previous succinct paper Amara et al. (ISVC 2007, Part 1, Lecture Notes in Computer Science, vol. 4841, pp. 586-597, Springer, Berlin, 2007) by giving further technical and implementation details. It provides new solutions to problems that had been left unresolved, in the context of a joint use of OpenGL and CUDA, optimized on the G80 graphics chip. We explain the crucial components of the shaders, and emphasize the progress we have proposed, while resolving some difficulties. We show that this texturing architecture is well suited to current challenges, and takes into account most of the distinctive aspects of terrain rendering. Finally, we demonstrate how the design of the TLM facilitates the integration of geomatic input-data into procedural selection/rendering tasks on the GPU, and immediate applications to amplification.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/751_visualcomputer_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/751_visualcomputer_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Bab Ezzouar</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>01/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yacine Amara</Author>
           <Author email="">Xavier Marsault</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/npx0837618841g71/?p=e6efad6c51a246a5a01428810aa2b808&amp;pi=64">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yacine Amara,Xavier Marsault</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>af46c6f4-36e8-4672-af52-7cc2741bccb6</GUID>
        <Name>HISTOGRAM COMPUTATION WITH CUDA </Name>
        <ShortDescription>The GPU's higher processing power compared to a standard CPU comes at the cost of reduced data caching and flow control logic, as more transistors have to be devoted to data processing. This imposes certain limitations on how an application may access memory and implement flow control. As a result, implementation of certain algorithms (even trivial ones) on the GPU may be difficult or may not be computationally justified.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/750_8800gtx-128_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/750_8800gtx-128_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Australian National University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>08/01/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">R. Shams</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://users.rsise.anu.edu.au/~ramtin/cuda.htm">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>R. Shams</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>82a5a192-ee7e-42dc-83d9-b24a79656a21</GUID>
        <Name>Parallel Lattice Boltzmann Flow Simulation on Emerging Multi-core Platforms</Name>
        <ShortDescription>A parallel Lattice Boltzmann Method (pLBM), which is based on hierarchical spatial decomposition, is designed to perform large-scale flow simulations. The algorithm uses a critical section-free, dual representation in order to expose maximal concurrency and data locality. The performance of the emerging multi-core platforms PlayStation3 (Cell Broadband Engine) and Compute Unified Device Architecture (CUDA) is tested using the pLBM, which is implemented with multi-thread and message-passing programming. The results show that pLBM achieves good performance improvement, 11.02 for Cell over a traditional Xeon cluster and 8.76 for CUDA graphics processing unit (GPU) over a Sempron central processing unit (CPU). The results provide some insights into application design on future many-core platforms.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/749_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/749_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Southern California</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Liu Peng</Author>
           <Author email="">Ken-ichi Nomura</Author>
           <Author email="">Takehiro Oyakawa</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/n13073t5g025316k/?p=e6efad6c51a246a5a01428810aa2b808&amp;pi=63">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Liu Peng,Ken-ichi Nomura,Takehiro Oyakawa</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>05930d99-8367-4f73-a605-c469a41e6fdb</GUID>
        <Name>Efficient Nonlinear FEM for Soft Tissue Modelling and Its GPU Implementation within the Open Source Framework SOFA</Name>
        <ShortDescription>Accurate biomechanical modelling of soft tissue is a key aspect for achieving realistic surgical simulations. However, because medical simulation is a multi-disciplinary area, researchers do not always have sufficient resources to develop an efficient and physically rigorous model for organ deformation. We address this issue by implementing a CUDA-based nonlinear finite element model into the SOFA open source framework. The proposed model is an anisotropic visco-hyperelastic constitutive formulation implemented on a graphics processing unit (GPU). After presenting results on the model's performance, we illustrate the benefits of its integration within the SOFA framework on a simulation of cataract surgery.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/748_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/748_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The Australian e-Health Research Centre</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>04/07/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Olivier Comas</Author>
           <Author email="">Zeike A. Taylor</Author>
           <Author email="">Jeremie Allard</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/6x812w370666w520/?p=e6efad6c51a246a5a01428810aa2b808&amp;pi=62">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Olivier Comas,Zeike A. Taylor,Jeremie Allard</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bd858d80-d8c8-4000-9df8-2353690a6f98</GUID>
        <Name>Four styles of parallel and net programming</Name>
        <ShortDescription>This paper reviews the programming landscape for parallel and network computing systems, focusing on four styles of concurrent programming models, and example languages/libraries. The four styles correspond to four scales of the targeted systems. At the smallest coprocessor scale, Single Instruction Multiple Thread (SIMT) and Compute Unified Device Architecture (CUDA) are considered. Transactional memory is discussed at the multicore or process scale. The MapReduce style is examined at the datacenter scale. At the Internet scale, Grid Service Markup Language (GSML) is reviewed, which intends to integrate resources distributed across multiple datacenters.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/747_computerscience_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/747_computerscience_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Chinese Academy of Sciences</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="zxu@ict.ac.cn">Zhiwei Xu</Author>
           <Author email="heyongqiang@software.ict.ac.cn">Yongqiang He</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/y616316495r0437w/?p=dda5a3d5bbe64d05ab586356673193d5&amp;pi=59">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zhiwei Xu,Yongqiang He,zxu@ict.ac.cn,heyongqiang@software.ict.ac.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>955ab92b-e28a-4199-bb03-45c14dade318</GUID>
        <Name>Accelerating Image Retrieval Using Factorial Correspondence Analysis on GPU</Name>
        <ShortDescription>We are interested in the intensive use of Factorial Correspondence Analysis (FCA) for large-scale content-based image retrieval. Factorial Correspondence Analysis is a useful method for analyzing textual data, and we adapt it to images using the SIFT local descriptors. FCA is used to reduce dimensions and to limit the number of images to be considered during the search. Graphics Processing Units (GPUs) are fast emerging as inexpensive parallel processors due to their high computation power and low price. The G80 family of Nvidia GPUs provides the CUDA programming model that treats the GPU as a SIMD processor array. We present two very fast algorithms on GPU for image retrieval using FCA: the first one is a parallel incremental algorithm for FCA and the second one is an extension of the filtering algorithm from our previous work for the filtering step. Our implementation is able to speed up the FCA computation by a factor of 30 compared to the CPU version. For retrieval tasks, the parallel version on GPU performs 10 times faster than the one on CPU. Retrieving images in a database of 1 million images is done in about 8 milliseconds.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/746_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/746_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Campus de Beaulieu</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>29</ReleaseDay>
        <ReleaseDateDisplay>08/29/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Nguyen_Khang@irisa.fr">NguyenKhang Pham</Author>
           <Author email="Annie.Morin@irisa.fr">Annie Morin</Author>
           <Author email="Patrick.Gros@inria.fr">Patrick Gros</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/h7602v340w6v12k3/?p=dda5a3d5bbe64d05ab586356673193d5&amp;pi=56">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>NguyenKhang Pham,Annie Morin,Patrick Gros,Nguyen_Khang@irisa.fr,Annie.Morin@irisa.fr,Patrick.Gros@inria.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fdc39d5f-f0d5-4664-809c-9f9c10a35c34</GUID>
        <Name>Experiences with Mapping Non-linear Memory Access Patterns into GPUs</Name>
        <ShortDescription>Modern Graphics Processing Units (GPUs) are very powerful computational systems on a chip. For this reason there is a growing interest in using these units as general purpose hardware accelerators (GPGPU). To facilitate the programming of general purpose applications, NVIDIA introduced the CUDA programming environment. CUDA provides a simplified abstraction of the underlying complex GPU architecture, so a number of critical optimizations must be applied to the code in order to get maximum performance. In this paper we discuss our experience in porting an application kernel to the GPU, and all classes of design decisions we adopted in order to obtain maximum performance.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/745_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/745_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Malaga</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="eladio@uma.es">Eladio Gutierrez</Author>
           <Author email="sromero@uma.es">Sergio Romero</Author>
           <Author email="maria@uma.es">Maria A. Trenas</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/k167147131j83877/?p=dda5a3d5bbe64d05ab586356673193d5&amp;pi=51">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Eladio Gutierrez,Sergio Romero,Maria A. Trenas,eladio@uma.es,sromero@uma.es,maria@uma.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>48d86485-4f4a-4edf-aa9c-9b0900fbf425</GUID>
        <Name>Mean Shift Parallel Tracking on GPU</Name>
        <ShortDescription>We propose a parallel Mean Shift (MS) tracking algorithm on Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA). The traditional MS algorithm uses a color histogram with a large number of bins, typically 16x16x16, which makes parallel implementation infeasible. We thus employ K-Means clustering to partition the object color space, which enables us to represent the color distribution with a much smaller number of bins. Based on this compact histogram, all key components of the MS algorithm are mapped onto the GPU. The resultant parallel algorithm consists of six kernel functions, which primarily involve the parallel computation of the candidate histogram and calculation of the Mean Shift vector. Experiments on publicly available CAVIAR videos show that the proposed parallel tracking algorithm achieves a large speedup and has tracking performance comparable to the traditional serial MS tracking algorithm.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/744_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/744_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Heilongjiang University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>06/09/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="peihualj@hotmail.com">Peihua Li</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/vng0328r61n2r276/?p=5de3c558696843cd89914945e73c84ca&amp;pi=45">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Peihua Li,peihualj@hotmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e606b34c-1f0d-4118-a68a-d0d7286fbab5</GUID>
        <Name>Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU</Name>
        <ShortDescription>A wide class of geometry processing and PDE resolution methods needs to solve a linear system, where the non-zero pattern of the matrix is dictated by the connectivity matrix of the mesh. The advent of GPUs with their ever-growing amount of parallel horsepower makes them a tempting resource for such numerical computations. This can be helped by new APIs (CTM from ATI and CUDA from NVIDIA) which give direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation, but only for dense matrices (CuBLAS). However, existing GPU linear solvers are restricted to specific types of matrices, or use non-optimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a sparse general-purpose linear solver which outperforms leading-edge CPU counterparts (MKL / ACML).</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/743_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/743_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nancy Universite</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>09/08/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="buatois@gocad.org">Luc Buatois</Author>
           <Author email="caumon@gocad.org">Guillaume Caumon</Author>
           <Author email="levy@loria.fr">Bruno Levy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/f1582463x62v5qw4/?p=5de3c558696843cd89914945e73c84ca&amp;pi=44">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Luc Buatois,Guillaume Caumon,Bruno Levy,buatois@gocad.org,caumon@gocad.org,levy@loria.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3c1de2f7-4132-4e99-87fa-8242c3b9d107</GUID>
        <Name>Solving Sparse Linear Systems on NVIDIA Tesla GPUs</Name>
        <ShortDescription>Current many-core GPUs have enormous processing power, and unlocking this power for general-purpose computing is very attractive due to their low cost and efficient power utilization. However, the fine-grained parallelism and the stream-programming model supported by these GPUs require a paradigm shift, especially for algorithm designers. In this paper we present the design of a GPU-based sparse linear solver using the Generalized Minimum RESidual (GMRES) algorithm in the CUDA programming environment. Our implementation achieved a speedup of over 20x on the Tesla T10P based GTX280 GPU card for benchmarks ranging from a few thousand to a few million unknowns.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/742_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/742_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>State University of New Jersey</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Mingliang Wang</Author>
           <Author email="">Hector Klie</Author>
           <Author email="">Manish Parashar</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/118t5q26u318025n/?p=06ad104ef0f24c98bf3a9f0e16f65aee&amp;pi=61">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mingliang Wang,Hector Klie,Manish Parashar</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f97eefd3-0c69-48b8-b703-c569c5afab1e</GUID>
        <Name>Optimizing Monte Carlo radiosity on graphics hardware</Name>
        <ShortDescription>The radiosity method is usually employed for the rendering of highly realistic synthetic images. In this paper we present an implementation of the Monte Carlo radiosity algorithm on the GPU using CUDA. Our proposal is based on the partition of the scene into sub-scenes to be processed in parallel to exploit the graphics card structure. The convex partition method employed permits the exploitation of data locality and the optimization of the ray shooting procedure due to the minimization of the number of objects to be tested in the intersection calculation. The results are good in terms of execution times, increasing the flexibility of previous solutions and demonstrating that the GPU can outperform the CPU results even for non-regular algorithms.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/741_neville_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/741_neville_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Univ. of A Coruna</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>11/06/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="josesan@udc.es">J. R. Sanjurjo</Author>
           <Author email="margamor@udc.es">M. Amor</Author>
           <Author email="montserrat.boo@usc.es">M. Boo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/yv8477l68n832l08/?p=06ad104ef0f24c98bf3a9f0e16f65aee&amp;pi=60">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>J. R. Sanjurjo,M. Amor,M. Boo,josesan@udc.es,margamor@udc.es,montserrat.boo@usc.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>84a7d907-10ce-4d67-ae51-cc83bf5e33ab</GUID>
        <Name>Optimizations and Performance of a Robotics Grasping Algorithm Described in Geometric Algebra</Name>
        <ShortDescription>The usage of Conformal Geometric Algebra leads to algorithms that can be formulated in a very clear and easy to grasp way. But it can also increase the performance of an implementation because of its capabilities to be computed in parallel. In this paper we show how a grasping algorithm for a robotic arm is accelerated using a Conformal Geometric Algebra formulation. The optimized C code is produced by the CGA framework Gaalop automatically. We compare this implementation with a CUDA implementation and an implementation that uses standard vector algebra.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/740_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/740_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Technische Universitat Darmstadt</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>11/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Florian Worsdorfer</Author>
           <Author email="">Florian Stock</Author>
           <Author email="">Eduardo Bayro-Corrochano</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/3514k47841k67614/?p=8a2057a2e83b499ea703c7344b4b48d0&amp;pi=59">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Florian Worsdorfer,Florian Stock,Eduardo Bayro-Corrochano</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>dbeef696-2ea9-4394-bd10-b2f4aea55e81</GUID>
        <Name>Efficient Mapping of Multiresolution Image Filtering Algorithms on Graphics Processors</Name>
        <ShortDescription>In the last decade, there has been a dramatic growth in research and development of massively parallel commodity graphics hardware both in academia and industry. Graphics card architectures provide an optimal platform for parallel execution of many number-crunching loop programs from fields like image processing and linear algebra. However, it is hard to efficiently map such algorithms to the graphics hardware even with detailed insight into the architecture. This paper presents a multiresolution image processing algorithm and shows the efficient mapping of this type of algorithm to the graphics hardware. Furthermore, the impact of execution configuration is illustrated and a method is proposed to determine the best configuration offline in order to use it at run-time. Using CUDA as the programming model, it is demonstrated that the image processing algorithm is significantly accelerated and that a speedup of up to 33x can be achieved on NVIDIA's Tesla C870 compared to a parallelized implementation on a Xeon Quad Core.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/739_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/739_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Erlangen-Nuremberg</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>07/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>33</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="richard.membarth@cs.fau.de">Richard Membarth</Author>
           <Author email="hannig@cs.fau.de">Frank Hannig</Author>
           <Author email="dutta@cs.fau.de">Hritam Dutta</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/f09084w13j4pm73m/?p=8a2057a2e83b499ea703c7344b4b48d0&amp;pi=58">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Richard Membarth,Frank Hannig,Hritam Dutta,richard.membarth@cs.fau.de,hannig@cs.fau.de,dutta@cs.fau.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2ca5d0a9-8e2c-4c94-8c92-ccea0f8f3ede</GUID>
        <Name>Non-rigid Registration for Large Sets of Microscopic Images on Graphics Processors</Name>
        <ShortDescription>Microscopic imaging is an important tool for characterizing tissue morphology and pathology. 3D reconstruction and visualization of large sample tissue structure requires registration of large sets of high-resolution images. However, the scale of this problem presents a challenge for automatic registration methods. In this paper we present a novel method for efficient automatic registration using graphics processing units (GPUs) and parallel programming. Comparing a C++ CPU implementation with Compute Unified Device Architecture (CUDA) libraries and pthreads running on GPU, we achieve a speed-up factor of up to 4.11x with a single GPU and 6.68x with a GPU pair. We present execution times for a benchmark composed of two sets of large-scale images: mouse placenta (16K x 16K pixels) and breast cancer tumors (23K x 62K pixels). It takes more than 12 hours for the genetic case in C++ to register a typical sample composed of 500 consecutive slides, which was reduced to less than 2 hours using two GPUs, in addition to a very promising scalability for extending those gains easily on a large number of GPUs in a distributed system.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/738_hyperspectral_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/738_hyperspectral_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Malaga</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>7</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="aruiz@ac.uma.es">Antonio Ruiz</Author>
           <Author email="ujaldon@ac.uma.es">Manuel Ujaldon</Author>
           <Author email="cooperl@ece.osu.edu">Lee Cooper</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/354824320p62tl1l/?p=8a2057a2e83b499ea703c7344b4b48d0&amp;pi=56">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Antonio Ruiz,Manuel Ujaldon,Lee Cooper,aruiz@ac.uma.es,ujaldon@ac.uma.es,cooperl@ece.osu.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ddb390fb-53bb-46e9-aad0-d5443baf25a4</GUID>
        <Name>Integrated Digital Image Correlation for the Identification of Mechanical Properties</Name>
        <ShortDescription>Digital Image Correlation (DIC) is a powerful technique to provide full-field displacement measurements for mechanical tests of materials and structures. The displacement fields may be further processed as an entry for identification procedures giving access to parameters of constitutive laws. A new implementation of a Finite Element based Integrated Digital Image Correlation (I-DIC) method is presented, where the two stages (image correlation and mechanical identification) are coupled. This coupling allows one to minimize information losses, even in case of low signal-to-noise ratios. A case study for elastic properties of a composite material illustrates the approach, and highlights the accuracy of the results. Implementation on GPUs (using CUDA) leads to high speed performance while preserving the versatility of the methodology.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/737_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/737_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>SpringerLink</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>05/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="hugo.leclerc@lmt.ens-cachan.fr">Hugo Leclerc</Author>
           <Author email="jean-noel.perie@lmt.ens-cachan.fr">Jean-Noel Perie</Author>
           <Author email="stephane.roux@lmt.ens-cachan.fr">Stephane Roux</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/10xg75213727r66x/?p=8a2057a2e83b499ea703c7344b4b48d0&amp;pi=54">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Hugo Leclerc,Jean-Noel Perie,Stephane Roux,hugo.leclerc@lmt.ens-cachan.fr,jean-noel.perie@lmt.ens-cachan.fr,stephane.roux@lmt.ens-cachan.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8b5c1771-af09-43fd-a7ba-8160936587d3</GUID>
        <Name>Multifold Acceleration of Neural Network Computations Using GPU</Name>
        <ShortDescription>With the emergence of the latest generation of graphics processing units (GPUs), it has become possible to perform neural network computations on mass-produced video display adapters. In this study, NVIDIA CUDA technology has been used to implement the standard back-propagation algorithm for training multiple perceptrons simultaneously on a GPU. For the problem considered, the GPU-based implementation (on an NVIDIA GTX 260 GPU) has led to a 50x speed increase compared to a highly optimized CPU-based program, and more than 150x compared to commercially available CPU-based software (NeuroShell 2), both running on an AMD Athlon 64 Dual core 6000+ processor.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/736_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/736_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Lomonosov Moscow State University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>09/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="nop43@rambler.ru">Alexander Guzhva</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/d1272127r236540h/?p=8a2057a2e83b499ea703c7344b4b48d0&amp;pi=53">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexander Guzhva,nop43@rambler.ru</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>dddf07ad-463e-4556-8709-33b2f8f5b204</GUID>
        <Name>Genetic programming on graphics processing units</Name>
        <ShortDescription>The availability of low-cost, powerful parallel graphics cards has stimulated the porting of Genetic Programming (GP) to Graphics Processing Units (GPUs). Our work focuses on the possibilities offered by Nvidia G80 GPUs when programmed in the CUDA language. In earlier work we showed that this setup allows the development of fine-grained parallelization schemes that evaluate several GP programs in parallel, while obtaining speedups for usual training sets and program sizes. Here we present another parallelization scheme, together with optimizations of the program representation and of the use of fast GPU memory. This increases the computation speed about threefold, up to 4 billion GP operations per second. The code has been developed within the well-known ECJ library and is open source.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/735_hybrid_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/735_hybrid_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>SpringerLink</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>13</ReleaseDay>
        <ReleaseDateDisplay>10/13/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="robillia@lil.univ-littoral.fr">Denis Robilliard</Author>
           <Author email="poty@lil.univ-littoral.fr">Virginie Marion-Poty</Author>
           <Author email="onlupt@lil.univ-littoral.fr">Cyril Fonlupt</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/t7368k577p8n437l/?p=8a2057a2e83b499ea703c7344b4b48d0&amp;pi=52">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Denis Robilliard,Virginie Marion-Poty,Cyril Fonlupt,robillia@lil.univ-littoral.fr,poty@lil.univ-littoral.fr,onlupt@lil.univ-littoral.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a20016ac-ca3b-4eb6-b760-3c62fa956a30</GUID>
        <Name>A Particle-Mesh Integrator for Galactic Dynamics Powered by GPGPUs</Name>
        <ShortDescription>We present a particle-mesh N-body integrator running on the GPU using CUDA. Relying on a grid-based description of the gravitational potential, it can simulate the evolution of self-interacting 'stars' in order to model, e.g., galaxies. All the steps of the application have been ported to the GPU, namely (1) a histogramming algorithm with CUDPP, (2) the solution of the Poisson equation by means of FFT with CUFFT and multi-grid relaxation, (3) an optimized finite difference scheme to compute the accelerations of stars and (4) an update procedure for positions and velocities. We present several tests at different resolutions, and reach a speedup of 2 to 50 depending on the resolution and on the test case.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/734_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/734_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universite de Strasbourg</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Dominique Aubert</Author>
           <Author email="">Mehdi Amini</Author>
           <Author email="">Romaric David</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/g502310g830x1412/?p=8a2057a2e83b499ea703c7344b4b48d0&amp;pi=51">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Dominique Aubert,Mehdi Amini,Romaric David</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bee72b5d-162e-4691-a0b7-f6646c239fbf</GUID>
        <Name>Parallel Implementations of Recurrent Neural Network Learning</Name>
        <ShortDescription>Neural networks have proved to be effective in solving a wide range of problems. As problems become more and more demanding, they require larger neural networks, and the time needed for learning grows accordingly. Parallel implementations of learning algorithms are therefore vital for practical applications. The implementation, however, depends strongly on the features of the learning algorithm and on the underlying hardware architecture. For this experimental work a dynamic problem was chosen, which calls for the use of recurrent neural networks and a learning algorithm based on the paradigm of learning automata. Two parallel implementations of the algorithm were developed: one on a computing cluster using the MPI and OpenMP libraries, and one on a graphics processing unit using the CUDA library. The performance of both parallel implementations justifies the development of parallel algorithms.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/733_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/733_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Ljubljana</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>09/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="uros.lotric@fri.uni-lj.si">Uros Lotric</Author>
           <Author email="andrej.dobnikar@fri.uni-lj.si">Andrej Dobnikar</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/3116717v4652k61h/?p=8a2057a2e83b499ea703c7344b4b48d0&amp;pi=50">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Uros Lotric,Andrej Dobnikar,uros.lotric@fri.uni-lj.si,andrej.dobnikar@fri.uni-lj.si</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>91c16088-9260-4f2d-aaea-bdfde57fb25d</GUID>
        <Name>Orders-of-magnitude performance increases in GPU-accelerated correlation of images from the International Space Station</Name>
        <ShortDescription>We implement image correlation, a fundamental component of many real-time imaging and tracking systems, on a graphics processing unit (GPU) using NVIDIA's CUDA platform. We use our code to analyze images of liquid-gas phase separation in a model colloid-polymer system, photographed in the absence of gravity aboard the International Space Station (ISS). Our GPU code is 4,000 times faster than simple MATLAB code performing the same calculation on a central processing unit (CPU), 130 times faster than simple C code, and 30 times faster than optimized C++ code using single-instruction, multiple-data (SIMD) extensions. The speed increases from these parallel algorithms enable us to analyze images downlinked from the ISS in a rapid fashion and send feedback to astronauts on orbit while the experiments are still being run.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/732_iss_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/732_iss_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Harvard University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>10/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>130</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Peter J. Lu</Author>
           <Author email="">Hidekazu Oki</Author>
           <Author email="">Catherine A. Frey</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/u1704254764133t5/?p=c5eead9af73340e58a313d95581cfd40&amp;pi=49">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Peter J. Lu,Hidekazu Oki,Catherine A. Frey</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b3c55a10-7397-4e94-a954-a949e0bc26cd</GUID>
        <Name>A mathematical speedup prediction model for parallel vs. sequential programs</Name>
        <ShortDescription>Data-independent command sequences are part of many algorithms. One way to speed up their execution is to process them on a single instruction, multiple data (SIMD) architecture. However, such an implementation is not necessarily efficient. To predict program acceleration for NVIDIA's compute unified device architecture (CUDA), a parallel computing platform based on graphics boards, a mathematical model is developed. This model extends the common approach to so-called speedup prediction with CUDA hardware and algorithm-specific parameters. The identification of some model parameters is difficult since they depend on hardware-internal parameters. The model is tested on a convolution filter and yields conservative processing time predictions.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/731_prediction_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/731_prediction_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Applied Sciences Gelsenkirchen</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>02</ReleaseMonth>
        <ReleaseDay>04</ReleaseDay>
        <ReleaseDateDisplay>02/04/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="heinrich-martin.overhoff@fh-gelsenkirchen.de">Heinrich Martin Overhoff</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/wt552nu544040717/?p=c5eead9af73340e58a313d95581cfd40&amp;pi=48">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Heinrich Martin Overhoff,heinrich-martin.overhoff@fh-gelsenkirchen.de</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>8a45da2c-3f81-429c-8576-c6be8690765f</GUID>
        <Name>Improving the Performance of Hyperspectral Image and Signal Processing Algorithms Using Parallel, Distributed and Specialized Hardware-Based Systems</Name>
        <ShortDescription>Advances in sensor technology are revolutionizing the way remotely sensed data is collected, managed and analyzed. The incorporation of latest-generation sensors into airborne and satellite platforms is currently producing a nearly continual stream of high-dimensional data, and this explosion in the amount of collected information has rapidly created new processing challenges.</ShortDescription>
        <URL>http://www.springerlink.com/content/hp81u02p11126226/?p=c5eead9af73340e58a313d95581cfd40&amp;pi=47</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/729_hyperspectral_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/729_hyperspectral_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Extremadura</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="aplaza@unex.es">Antonio Plaza</Author>
           <Author email="jplaza@unex.es">Javier Plaza</Author>
           <Author email="hugovegas@fdi.ucm.es">Hugo Vegas</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/hp81u02p11126226/?p=c5eead9af73340e58a313d95581cfd40&amp;pi=47">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Antonio Plaza,Javier Plaza,Hugo Vegas,aplaza@unex.es,jplaza@unex.es,hugovegas@fdi.ucm.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>66a4c5f1-213c-4450-b8d1-ce3745396713</GUID>
        <Name>GCSim: A GPU-Based Trace-Driven Simulator for Multi-level Cache</Name>
        <ShortDescription>We describe the design of a parallel trace-driven cache simulator for evaluating different cache structures. As research goes deeper, traditional simulation methods, which can only execute simulation operations in sequence, are no longer practical due to their long simulation cycles. An obvious way to achieve fast parallel simulation is to simulate the independent sets of a cache concurrently on different compute resources. We considered using a generic GPU to accelerate cache simulation, exploiting set-partitioning as the main source of parallelism. However, we show that this technique is not efficient when simulating just one cache configuration, because of the high correlation of activity between different sets. We therefore develop trace-sort and single-pass multi-configuration simulation techniques, taking advantage of the full programmability offered by the Compute Unified Device Architecture (CUDA) on the GPU. Our experimental results demonstrate that the cache simulator on a GPU-CPU platform achieves a 2.44x performance improvement compared to a traditional sequential algorithm.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/728_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/728_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Beihang University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>3</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="wanhan@les.buaa.edu.cn">Han Wan</Author>
           <Author email="gxp@les.buaa.edu.cn">Xiaopeng Gao</Author>
           <Author email="long@les.buaa.edu.cn">Xiang Long</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/t156p03l56p1t537/?p=c5eead9af73340e58a313d95581cfd40&amp;pi=46">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Han Wan,Xiaopeng Gao,Xiang Long,wanhan@les.buaa.edu.cn,gxp@les.buaa.edu.cn,long@les.buaa.edu.cn</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f8d98fea-3e48-4399-97c6-1cc70bf36e27</GUID>
        <Name>GPU Accelerated RNA Folding Algorithm</Name>
        <ShortDescription>Many bioinformatics studies require the analysis of RNA or DNA structures. More specifically, extensive work has gone into developing efficient algorithms able to predict the 2-D folding structures of RNA or DNA sequences. However, the high computational complexity of these algorithms, combined with the rapid increase in genomic data, creates a need for faster methods. Current approaches focus on parallelizing these algorithms on multiprocessor systems or on clusters, yielding good performance but at a relatively high cost. Here, we explore the use of computer graphics hardware, which in theory provides both high performance and low cost, to speed up these algorithms. We use the CUDA programming language to harness the power of NVIDIA graphics cards for general computation in a C-like environment. On recent graphics cards, we achieve a 17x speed-up.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/727_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/727_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>universitaire de Beaulieu</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>17</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="guillaume.rizk@irisa.fr">Guillaume Rizk</Author>
           <Author email="dominique.lavenier@irisa.fr">Dominique Lavenier</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/xn31pvr5031q3473/?p=c5eead9af73340e58a313d95581cfd40&amp;pi=45">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Guillaume Rizk,Dominique Lavenier,guillaume.rizk@irisa.fr,dominique.lavenier@irisa.fr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b122d8ea-f81f-4796-a3b4-2f519b9b05f2</GUID>
        <Name>Multimedia Mining on Manycore Architectures: The Case for GPUs</Name>
        <ShortDescription>Media mining, the extraction of meaningful knowledge from multimedia content, poses significant computational challenges on today's platforms, particularly in real-time scenarios. In this paper, we show how Graphics Processing Units (GPUs) can be leveraged for compute-intensive media mining applications. Furthermore, we propose a parallel implementation of color visual descriptors (color correlograms and color histograms) commonly used in multimedia content analysis on a CUDA (Compute Unified Device Architecture) enabled GPU (the Nvidia GeForce GTX280 GPU). Through the use of shared memory as a software-managed cache and efficient data partitioning, we reach computation throughputs of over 1.2 Giga Pixels/sec for HSV color histograms and over 100 Mega Pixels/sec for HSV color correlograms. We show that we can achieve better than real-time performance, major speedups compared to high-end multicore CPUs, and performance comparable to known implementations on the Cell B.E. We also study different trade-offs in the size and complexity of the features and their effect on performance.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/726_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/726_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Georgia Institute of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>11/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="mamadou@ece.gatech.edu">Mamadou Diao</Author>
           <Author email="jkim@ece.gatech.edu">Jongman Kim</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/545706369641414r/?p=c5eead9af73340e58a313d95581cfd40&amp;pi=44">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mamadou Diao,Jongman Kim,mamadou@ece.gatech.edu,jkim@ece.gatech.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>73c6f376-cc30-4f50-a53d-0246839f1870</GUID>
        <Name>MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs</Name>
        <ShortDescription>CUDA is a data parallel programming model that supports several key abstractions - thread blocks, hierarchical memory and barrier synchronization - for writing applications. This model has proven effective in programming GPUs. In this paper we describe a framework called MCUDA, which allows CUDA programs to be executed efficiently on shared memory, multi-core CPUs. Our framework consists of a set of source-level compiler transformations and a runtime system for parallel execution. Preserving program semantics, the compiler transforms threaded SPMD functions into explicit loops, performs fission to eliminate barrier synchronizations, and converts scalar references to thread-local data to replicated vector references. We describe an implementation of this framework and demonstrate performance approaching that achievable from manually parallelized and optimized C code. With these results, we argue that CUDA can be an effective data-parallel programming model for more than just GPU architectures.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/725_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/725_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois at Urbana-Champaign</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>11/28/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="stratton@crhc.uiuc.edu">John A. Stratton</Author>
           <Author email="ssstone2@crhc.uiuc.edu">Sam S. Stone</Author>
           <Author email="hwu@crhc.uiuc.edu">Wen-mei W. Hwu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/x361x32j1q840072/?p=274e9be41b7549e999181fac9e6510f1&amp;pi=26">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>John A. Stratton,Sam S. Stone,Wen-mei W. Hwu,stratton@crhc.uiuc.edu,ssstone2@crhc.uiuc.edu,hwu@crhc.uiuc.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1c474439-b236-4f2e-b625-a7e540f06ffa</GUID>
        <Name>A Real-Time Video Illustration Using CUDA</Name>
        <ShortDescription>As video technology advances, demand is growing for a variety of video special effects. Conventional image-transform effects can be applied to video streams, but non-photorealistic rendering effects are not easy to apply. For example, cartoon or illustration effects are computationally expensive in video transformation, which makes them difficult to execute in real time. In this paper, we propose a video transformation system with illustration effects. It is designed to apply the illustration effects to the video stream directly and is implemented to achieve real-time performance using GPU hardware with NVIDIA's CUDA.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/724_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/724_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Electronics and Telecommunications Research Institute</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>08/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ijihyung@etri.re.kr">JiHyung Lee</Author>
           <Author email="ys-choi@etri.re.kr">Yoon-Seok Choi</Author>
           <Author email="bkkoo@etri.re.kr">Bon-Ki Koo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/r67462676811vkvv/?p=274e9be41b7549e999181fac9e6510f1&amp;pi=25">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>JiHyung Lee,Yoon-Seok Choi,Bon-Ki Koo,ijihyung@etri.re.kr,ys-choi@etri.re.kr,bkkoo@etri.re.kr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f3da9a57-398b-4320-adf8-c66fc56e7440</GUID>
        <Name>A Fast and Flexible Sorting Algorithm with CUDA</Name>
        <ShortDescription>In this paper, we propose a fast and flexible sorting algorithm with CUDA. The proposed algorithm is much more practical than previous GPU-based sorting algorithms, as it is able to sort elements represented by integers, floats and structures. Our algorithm is also optimized for the modern GPU architecture to obtain high performance. To make it adaptive, we use different strategies for sorting unordered lists and nearly sorted lists. Extensive experiments demonstrate that our algorithm has higher performance than previous GPU-based sorting algorithms and can support real-time applications.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/723_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/723_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Chinese Academy of Sciences</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>07/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sf.chen@siat.ac.cn">Shifu Chen</Author>
           <Author email="jqin@cse.cuhk.edu.hk">Jing Qin</Author>
           <Author email="ymxie@cse.cuhk.edu.hk">Yongming Xie</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/628628807m601030/?p=274e9be41b7549e999181fac9e6510f1&amp;pi=22">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shifu Chen,Jing Qin,Yongming Xie,sf.chen@siat.ac.cn,jqin@cse.cuhk.edu.hk,ymxie@cse.cuhk.edu.hk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>448d3111-a000-421c-b2da-c00e1509d590</GUID>
        <Name>Exploring Parallel Algorithms for Volumetric Mass-Spring-Damper Models in CUDA</Name>
        <ShortDescription>Since the advent of programmable graphics processors (GPUs), their computational power has been utilized for general-purpose computation, initially by exploiting graphics APIs and more recently through dedicated parallel computation frameworks such as the Compute Unified Device Architecture (CUDA) from Nvidia. This paper investigates multiple implementations of volumetric Mass-Spring-Damper systems in CUDA. The obtained performance is compared to previous implementations utilizing the GPU through the OpenGL graphics API. We find that both performance and optimization strategies differ widely between the OpenGL and CUDA implementations. Specifically, the previous recommendation of using implicitly connected particles is replaced by a recommendation that supports unstructured meshes and run-time topological changes with an insignificant performance reduction.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/722_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/722_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Aarhus</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>07</ReleaseDay>
        <ReleaseDateDisplay>07/07/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Allan Rasmusson</Author>
           <Author email="">Jesper Mosegaard</Author>
           <Author email="">Thomas Sangild</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/f5854234457868m4/?p=274e9be41b7549e999181fac9e6510f1&amp;pi=21">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Allan Rasmusson,Jesper Mosegaard,Thomas Sangild</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8072ce4d-85ec-4bad-8d3f-986eae58cfd2</GUID>
        <Name>Implementation of Parallel Genetic Algorithm Based on CUDA</Name>
        <ShortDescription>The Genetic Algorithm (GA) is a powerful tool for scientific computing, and the Parallel Genetic Algorithm (PGA) further improves computing performance. However, a traditional parallel computing environment is very difficult to set up, not to mention expensive. This has motivated moving compute-intensive work to graphics hardware, which is inexpensive and more powerful. The paper presents a hierarchical parallel genetic algorithm implemented with NVIDIA's Compute Unified Device Architecture (CUDA). By combining the master-slave parallelization method with the multiple-demes parallelization method, this algorithm makes better use of threads and high-speed shared memory in CUDA.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/721_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/721_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>China University of Geosciences</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>09/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Sifa Zhang</Author>
           <Author email="">Zhenming He</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/wh11401458kk6n18/?p=274e9be41b7549e999181fac9e6510f1&amp;pi=20">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sifa Zhang,Zhenming He</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c8b48fed-9a91-483e-bfcd-d80e41dde203</GUID>
        <Name>Memory Locality Exploitation Strategies for FFT on the CUDA Architecture</Name>
        <ShortDescription>Modern graphics processing units (GPUs) are becoming more and more suitable for general purpose computing due to their growing computational power. These commodity processors follow, in general, a parallel SIMD execution model whose efficiency depends on proper exploitation of the explicit memory hierarchy, among other factors. In this paper we analyze the implementation of the Fast Fourier Transform using the programming model of the Compute Unified Device Architecture (CUDA) recently released by NVIDIA for its new graphics platforms. Within this model we propose an FFT implementation that takes into account memory reference locality issues that are crucial for achieving high execution performance. This proposal has been experimentally tested and compared with other well-known approaches such as the manufacturer's FFT library.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/720_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/720_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Malaga</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>12/06/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="eladio@ac.uma.es">Eladio Gutierrez</Author>
           <Author email="sromero@ac.uma.es">Sergio Romero</Author>
           <Author email="maria@ac.uma.es">Maria A. Trenas</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/d28q81u806407565/?p=14c098d15b99440182ea9c774c8dc090&amp;pi=18">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Eladio Gutierrez,Sergio Romero,Maria A. Trenas,eladio@ac.uma.es,sromero@ac.uma.es,maria@ac.uma.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b2b7e513-4ef5-4e1a-8a28-1aa414ba9965</GUID>
        <Name>Parallel Quantum Computer Simulation on the CUDA Architecture</Name>
        <ShortDescription>Due to their increasing computational power, modern graphics processing architectures are becoming more and more popular for general purpose applications with high performance demands. This is the case of quantum computer simulation, a problem with high computational requirements both in memory and processing power. When dealing with such simulations, multiprocessor architectures are an almost obligatory tool. In this paper we explore the use of the new graphics processor architecture NVIDIA CUDA in the simulation of some basic quantum computing operations. This new architecture is oriented towards a more general exploitation of the graphics platform, allowing it to be used as a parallel SIMD multiprocessor. In this direction, some implementation strategies are proposed, showing that the effectiveness of the codes depends on proper exploitation of the underlying memory hierarchy.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/718_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/718_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Malaga</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>06/25/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="eladio@ac.uma.es">Eladio Gutierrez</Author>
           <Author email="sromero@ac.uma.es">Sergio Romero</Author>
           <Author email="maria@ac.uma.es">Maria A. Trenas</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/wq28572x1w181182/?p=14c098d15b99440182ea9c774c8dc090&amp;pi=17">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Eladio Gutierrez,Sergio Romero,Maria A. Trenas,eladio@ac.uma.es,sromero@ac.uma.es,maria@ac.uma.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3a301449-69ff-493b-a459-7f4ff6b973a0</GUID>
        <Name>Implementation of a Lattice-Boltzmann method for numerical fluid mechanics using the NVIDIA CUDA technology</Name>
        <ShortDescription>The Lattice-Boltzmann method (LBM) is a distribution-function-based approach to numerical fluid mechanics. Due to the simple formulation of the underlying algorithm, this method is well suited for parallelization and hardware acceleration using general purpose graphical processing units (GPGPU). Within this work, LBM has been implemented in a new code with multi-GPU support and physically validated for a flow around a sphere. The performance analysis shows a remarkable speed-up of 1840% using 3 GPUs in comparison to a single-socket multi-core CPU calculation. Moreover, the validation for the test case chosen shows excellent agreement with available reference data.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/717_implementation_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/717_implementation_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Technische Universitat Munchen</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>05/06/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>18</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Thomas.Indinger@tum.de">T. Indinger</Author>
           <Author email="">E. Riegel</Author>
           <Author email="">N. A. Adams</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/417774437h7h0462/?p=14c098d15b99440182ea9c774c8dc090&amp;pi=16">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>T. Indinger,E. Riegel,N. A. Adams,Thomas.Indinger@tum.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ade79ba5-8167-479e-837c-4edc6c615cd4</GUID>
        <Name>CUDA-Lite: Reducing GPU Programming Complexity</Name>
        <ShortDescription>The computer industry has transitioned into multi-core and many-core parallel systems. The CUDA programming environment from NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers. However, there are still many burdens placed upon the programmer to maximize performance when using CUDA. One such burden is dealing with the complex memory hierarchy. Efficient and correct usage of the various memories is essential, making a difference of 2-17x in performance. Currently, the task of determining the appropriate memory to use and the coding of data transfer between memories is still left to the programmer. We believe that this task can be better performed by automated tools. We present CUDA-lite, an enhancement to CUDA, as one such tool. We leverage programmer knowledge via annotations to perform transformations and show preliminary results that indicate auto-generated code can have performance comparable to hand coding.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/716_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/716_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois at Urbana-Champaign</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>11/28/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>17</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ueng@crhc.uiuc.edu">Sain-Zee Ueng</Author>
           <Author email="mlathara@crhc.uiuc.edu">Melvin Lathara</Author>
           <Author email="bsadeghi@crhc.uiuc.edu">Sara S. Baghsorkhi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/k728327774k55073/?p=14c098d15b99440182ea9c774c8dc090&amp;pi=15">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Programming Tools</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sain-Zee Ueng,Melvin Lathara,Sara S. Baghsorkhi,ueng@crhc.uiuc.edu,mlathara@crhc.uiuc.edu,bsadeghi@crhc.uiuc.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>99559840-4e04-42b3-aef7-def473ec47e9</GUID>
        <Name>Neville elimination on multi- and many-core systems: OpenMP, MPI and CUDA</Name>
        <ShortDescription>This paper describes several parallel algorithmic variations of Neville elimination. This elimination method solves a system of linear equations by making zeros in a matrix column, adding to each row an adequate multiple of the preceding one. The parallel algorithms are run and compared on different multi- and many-core platforms using parallel programming techniques such as MPI, OpenMP and CUDA.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/715_neville_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/715_neville_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universidad de Oviedo / Universidad Politecnica de Valencia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>11/18/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="palonso@uniovi.es">P. Alonso</Author>
           <Author email="raquel@uniovi.es">R. Cortina</Author>
           <Author email="fjmartin@dcom.upv.es">F. J. Martinez Zaldivar</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/h49626615t707334/?p=14c098d15b99440182ea9c774c8dc090&amp;pi=14">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Neville,Multi core, Many core, OpenMP, MPI,GPU,CUDA,CUBLAS,P. Alonso,R. Cortina,F. J. Martinez Zaldivar,palonso@uniovi.es,raquel@uniovi.es,fjmartin@dcom.upv.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>662a5685-925d-4f0a-9e8e-8cf9681a85a4</GUID>
        <Name>Real-Time Ray Tracing with CUDA</Name>
        <ShortDescription>Graphics processors (GPUs) have recently emerged as a low-cost alternative for parallel programming. Since modern GPUs have great computational power as well as high memory bandwidth, running ray tracing on them has been an active field of research in computer graphics in recent years. Furthermore, the introduction of CUDA, a novel GPGPU architecture, has removed several limitations that traditional GPU-based ray tracing suffered from. In this paper, an implementation of high-performance CUDA ray tracing is demonstrated. We focus on performance and show how our design choices in various optimizations lead to an implementation that outperforms previous works. For reasonably complex scenes with simple shading, our implementation achieves a performance of 30 to 43 million traced rays per second. Our implementation also includes the effects of recursive specular reflection and refraction, which were less discussed in previous GPU-based ray tracing works.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/714_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/714_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>National Tsing Hua University / National Taiwan Normal University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>07/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="min_shih@ibr.cs.nthu.edu.tw">Min Shih</Author>
           <Author email="yfchiu@ibr.cs.nthu.edu.tw">Yung-Feng Chiu</Author>
           <Author email="louis@ibr.cs.nthu.edu.tw">Ying-Chieh Chen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/p48xruu7357w17r2/?p=14c098d15b99440182ea9c774c8dc090&amp;pi=13">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ray Tracing - Programmable Graphics Hardware - GPU Computing - CUDA - Multithreaded Architectures,Min Shih,Yung-Feng Chiu,Ying-Chieh Chen,min_shih@ibr.cs.nthu.edu.tw,yfchiu@ibr.cs.nthu.edu.tw,louis@ibr.cs.nthu.edu.tw</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>21718119-decf-4a85-ad89-c2acf965a3a1</GUID>
        <Name>Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA</Name>
        <ShortDescription>Program development environments have enabled graphics processing units (GPUs) to become an attractive high performance computing platform for the scientific community. A commonly posed problem in computational biology is protein database searching for functional similarities. The most accurate algorithm for sequence alignments is Smith-Waterman (SW). However, due to its computational complexity and rapidly increasing database sizes, the process becomes more and more time consuming, making cluster-based systems more desirable. Therefore, scalable and highly parallel methods are necessary to make SW a viable solution for life science researchers. In this paper we evaluate how SW fits onto the target GPU architecture by exploring ways to map the program architecture onto the processor architecture. We develop new techniques to reduce the memory footprint of the application while exploiting the memory hierarchy of the GPU. With this implementation, GSW, we overcome the on-chip memory size constraint, achieving a 23x speedup compared to a serial implementation. Results show that as the query length increases our speedup remains nearly stable, indicating the solid scalability of our approach. Additionally, this is a first-of-its-kind implementation that runs purely on the GPU instead of in a CPU-GPU integrated environment, making our design suitable for porting onto a cluster of GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/713_scalable_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/713_scalable_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Arizona</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>06/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>23</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="akoglu@email.arizona.edu">Ali Akoglu</Author>
           <Author email="gmstrie@email.arizona.edu">Gregory M. Striemer</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/k248w1x24746043m/?p=14c098d15b99440182ea9c774c8dc090&amp;pi=12">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ali Akoglu,Gregory M. Striemer,akoglu@email.arizona.edu,gmstrie@email.arizona.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>94546634-a42d-4be6-9db8-6e5f34612d9f</GUID>
        <Name>Hybrid of genetic algorithm and local search to solve MAX-SAT problem using nVidia CUDA framework</Name>
        <ShortDescription>General Purpose computing over Graphical Processing Units (GPGPUs) is a huge paradigm shift in parallel computing that promises a dramatic increase in performance. But GPGPUs also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges and design choices involved in parallelizing a hybrid of Genetic Algorithm (GA) and Local Search (LS) to solve the MAXimum SATisfiability (MAX-SAT) problem on a state-of-the-art nVidia Tesla GPU using nVidia Compute Unified Device Architecture (CUDA). MAX-SAT is a problem of practical importance and is often solved by employing metaheuristics-based search methods like GAs and hybrids of GA with LS. Almost all the parallel GAs (pGAs) designed in the last two decades were designed for either clusters or MPPs. Unfortunately, very little research has been done on the implementation of such algorithms over commodity graphics hardware. GAs in their simple form are not suitable for implementation over the Single Instruction Multiple Thread (SIMT) architecture of a GPU, and the same is the case with conventional LS algorithms. In this paper we explore different genetic operators that can be used for an efficient implementation of GAs over nVidia GPUs. We also design and introduce new techniques/operators for an efficient implementation of GAs and LS over such architectures. We use an nVidia Tesla C1060 to perform several numerical tests and performance measurements and show that in the best case we obtain a speedup of 25x. We also discuss the effects of different optimization techniques on the overall execution time.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/712_hybrid_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/712_hybrid_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Hokkaido University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>10/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>25</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="asim@uva.cims.hokudai.ac.jp">Asim Munawar</Author>
           <Author email="wahibium@uva.cims.hokudai.ac.jp">Mohamed Wahib</Author>
           <Author email="munetomo@iic.hokudai.ac.jp">Masaharu Munetomo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/m1272275232xlh06/?p=14c098d15b99440182ea9c774c8dc090&amp;pi=11">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Compute unified device architecture (CUDA) - General-purpose computing on graphics processing unit (GPGPU) - Genetic algorithm (GA) - MAXimum SATisfiability problem (MAX-SAT) - Single instruction multiple data (SIMD) - Single instruction multiple threads (SIMT),Asim Munawar,Mohamed Wahib,Masaharu Munetomo,asim@uva.cims.hokudai.ac.jp,wahibium@uva.cims.hokudai.ac.jp,munetomo@iic.hokudai.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3c39d32e-5e81-4e93-8d02-4c6e2105d2be</GUID>
        <Name>CUDA Solutions for the SSSP Problem</Name>
        <ShortDescription>We present several algorithms that solve the single-source shortest-path problem using CUDA. We have run them on a database composed of hundreds of large graphs represented by adjacency lists and adjacency matrices, achieving high speedups relative to a CPU implementation based on Fibonacci heaps. Concerning correctness, we outline why our solutions work, and show that a previous approach [10] is incorrect.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/711_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/711_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universidad Complutense de Madrid</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>05/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="pjmartin@sip.ucm.es">Pedro J. Martin</Author>
           <Author email="r.torres@fdi.ucm.es">Roberto Torres</Author>
           <Author email="agav@sip.ucm.es">Antonio Gavilanes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/a254839r565r176p/?p=14c098d15b99440182ea9c774c8dc090&amp;pi=10">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shortest path algorithms - GPU - CUDA,Pedro J. Martin,Roberto Torres,Antonio Gavilanes,pjmartin@sip.ucm.es,r.torres@fdi.ucm.es,agav@sip.ucm.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b6723f4e-8b3e-416f-a229-ba8f7fbb2334</GUID>
        <Name>Adaptative Resonance Theory Fuzzy Networks Parallel Computation Using CUDA</Name>
        <ShortDescription>Programming of Graphics Processing Units (GPUs) has evolved so that they can be used to address and speed up the computation of algorithms exemplified by data-parallel models. In this paper, the parallelization of a Fuzzy ART algorithm is described and a detailed explanation of its implementation under CUDA is given. Experimental results show the algorithm runs up to 52 times faster on the GPU than on the CPU for testing and 18 times faster for training under specific conditions.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/710_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/710_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Valladolid</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>06/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">M. Martinez-Zarzuela</Author>
           <Author email="">F. J. Diaz Pernas</Author>
           <Author email="">A. Tejero de Pablos</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/16p881570465517p/?p=7cfbcf0b92034f1086bd018a3aceb6a4&amp;pi=9">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>M. Martinez-Zarzuela,F. J. Diaz Pernas,A. Tejero de Pablos</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>627e05b2-0728-4871-ad2f-108473423236</GUID>
        <Name>Accelerating Large Graph Algorithms on the GPU Using CUDA</Name>
        <ShortDescription>Large graphs involving millions of vertices are common in many practical applications and are challenging to process. Practical-time implementations using high-end computers are reported but are accessible only to a few. Graphics Processing Units (GPUs) of today have high computation power and low price. They have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA programming model. We present a few fundamental algorithms including breadth first search, single source shortest path, and all-pairs shortest path using CUDA on large graphs. We can compute the single source shortest path on a 10 million vertex graph in 1.5 seconds using the Nvidia 8800GTX GPU costing $600. In some cases the optimal sequential algorithm is not the fastest on the GPU architecture. GPUs have great potential as high-performance co-processors.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/709_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/709_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>International Institute of Information Technology Hyderabad</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>01/22/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="harishpk@research.iiit.ac.in">Pawan Harish</Author>
           <Author email="pjn@iiit.ac.in">P. J. Narayanan</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/y4816x2q7475v93n/?p=7cfbcf0b92034f1086bd018a3aceb6a4&amp;pi=6">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Pawan Harish,P. J. Narayanan,harishpk@research.iiit.ac.in,pjn@iiit.ac.in</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>cef5a0e1-9512-4dd7-bf59-e0537bceb8f1</GUID>
        <Name>Molecular Dynamics Simulations on Commodity GPUs with CUDA</Name>
        <ShortDescription>Molecular dynamics simulations are a common and often repeated task in molecular biology. The need for speeding up this treatment comes from the requirement for large system simulations with many atoms and numerous time steps. In this paper we present a new approach to high performance molecular dynamics simulations on graphics processing units. Using modern graphics processing units for high performance computing is facilitated by their enhanced programmability and motivated by their attractive price/performance ratio and incredible growth in speed. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) to design and implement a new parallel algorithm. This results in an implementation with significant runtime savings on an off-the-shelf computer graphics card.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/708_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/708_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nanyang Technological University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>01/22/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="liuweiguo@ntu.edu.sg">Weiguo Liu</Author>
           <Author email="bertil.schmidt@unsw.edu.au">Bertil Schmidt</Author>
           <Author email="asgerrit@ntu.edu.sg">Gerrit Voss</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/p106n8501059l077/?p=7cfbcf0b92034f1086bd018a3aceb6a4&amp;pi=5">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Weiguo Liu,Bertil Schmidt,Gerrit Voss,liuweiguo@ntu.edu.sg,bertil.schmidt@unsw.edu.au,asgerrit@ntu.edu.sg</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f891bba8-5e53-45f2-a171-f2fd2b1b02e2</GUID>
        <Name>Accelerating Cone Beam Reconstruction Using the CUDA-Enabled GPU</Name>
        <ShortDescription>Compute unified device architecture (CUDA) is a software development platform that enables us to write and run general-purpose applications on the graphics processing unit (GPU). This paper presents a fast method for cone beam reconstruction using the CUDA-enabled GPU. The proposed method is accelerated by two techniques: (1) off-chip memory access reduction; and (2) memory latency hiding. We describe how these techniques can be incorporated into CUDA code. Experimental results show that the proposed method runs at 82% of the peak memory bandwidth, taking 5.6 seconds to reconstruct a 512³-voxel volume from 360 512²-pixel projections. This performance is 18% faster than the prior method. Some detailed analyses are also presented to understand how effectively the acceleration techniques increase the reconstruction performance of a naive method.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/707_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/707_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Osaka University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>12/17/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="y-okitu@ist.osaka-u.ac.jp">Yusuke Okitsu</Author>
           <Author email="ino@ist.osaka-u.ac.jp">Fumihiko Ino</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/m56tt40822321470/?p=7cfbcf0b92034f1086bd018a3aceb6a4&amp;pi=4">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yusuke Okitsu,Fumihiko Ino,y-okitu@ist.osaka-u.ac.jp,ino@ist.osaka-u.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3ec2d6eb-a3ca-4435-9e66-c0dbe3e61938</GUID>
        <Name>Parallelization of a Video Segmentation Algorithm on CUDA Enabled Graphics Processing Units</Name>
        <ShortDescription>Graphics Processing Units (GPUs) are emerging as SIMD coprocessors for general-purpose computation, especially since the launch of NVIDIA CUDA. Since then, some libraries have been implemented for matrix computation and image processing. However, in real video applications some stages need irregular data distributions and the parallelism is less inherent. This paper presents the parallelization of a video segmentation application on GPU hardware, which implements an algorithm for detecting abrupt and gradual transitions. A critical part of the algorithm requires highly intensive computation to calculate video frame features. Results on three CUDA-enabled GPUs are encouraging because of the significant speedup achieved. They are also compared with an OpenMP version of the algorithm running on two multi-core platforms.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/706_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/706_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Cordoba / University of Malaga</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>08/22/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="el1goluj@uco.es">Juan Gomez-Luna</Author>
           <Author email="gonzalez@ac.uma.es">Jose Maria Gonzalez-Linares</Author>
           <Author email="el1bebej@uco.es">Jose Ignacio Benavides</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/d76622215h42m733/?p=7cfbcf0b92034f1086bd018a3aceb6a4&amp;pi=3">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Juan Gomez-Luna,Jose Maria Gonzalez-Linares,Jose Ignacio Benavides,el1goluj@uco.es,gonzalez@ac.uma.es,el1bebej@uco.es</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7628110e-6e5b-4952-9a5f-dbe69816046e</GUID>
        <Name>A CUDA-Supported Approach to Remote Rendering</Name>
        <ShortDescription>In this paper we present the utilization of advanced programming techniques on current graphics hardware to improve the performance of remote rendering for interactive applications. We give an overview of existing systems in remote rendering and focus on some general bottlenecks of remote visualization. Afterwards we describe current developments in graphics hardware and software and outline how they can be used to increase the performance of remote graphics systems. Finally we present some results and benchmarks to confirm the validity of our work.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/705_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/705_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Paderborn</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>11/22/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="slietsch@upb.de">Stefan Lietsch</Author>
           <Author email="marquard@upb.de">Oliver Marquardt</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/f556l0l4171q7qu8/?p=7cfbcf0b92034f1086bd018a3aceb6a4&amp;pi=2">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Stefan Lietsch,Oliver Marquardt,slietsch@upb.de,marquard@upb.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>50132356-0063-40d3-922b-8bc54e0ecb18</GUID>
        <Name>JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA</Name>
        <ShortDescription>A recent trend in mainstream desktop systems is the use of general-purpose graphics processor units (GPGPUs) to obtain order-of-magnitude performance improvements. CUDA has emerged as a popular programming model for GPGPUs for use by C/C++ programmers. Given the widespread use of modern object-oriented languages with managed runtimes like Java and C#, it is natural to explore how CUDA-like capabilities can be made accessible to those programmers as well. In this paper, we present a programming interface called JCUDA that can be used by Java programmers to invoke CUDA kernels. Using this interface, programmers can write Java codes that directly call CUDA kernels, and delegate the responsibility of generating the Java-CUDA bridge codes and host-device data transfer calls to the compiler. Our preliminary performance results show that this interface can deliver significant performance improvements to Java programmers. For future work, we plan to use the JCUDA interface as a target language for supporting higher level parallel programming languages like X10 and Habanero-Java.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/704_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/704_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Department of Computer Science</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>08/22/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="yanyh@rice.edu">Yonghong Yan</Author>
           <Author email="jmg3@rice.edu">Max Grossman</Author>
           <Author email="vsarkar@rice.edu">Vivek Sarkar</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/4282177525r45375/?p=7cfbcf0b92034f1086bd018a3aceb6a4&amp;pi=1">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yonghong Yan,Max Grossman,Vivek Sarkar,yanyh@rice.edu,jmg3@rice.edu,vsarkar@rice.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5a46aeed-d703-4e3f-a010-ccf692264df9</GUID>
        <Name>Training Recurrent Neural Network Using Multistream Extended Kalman Filter on Multicore Processor and Cuda Enabled Graphic Processor Unit</Name>
        <ShortDescription>Recurrent neural networks are popular tools for modeling time series. Gradient-based algorithms are commonly used to train them. On the other hand, approaches based on Kalman filtering are considered the most appropriate general-purpose training algorithms with respect to modeling accuracy. Their main drawbacks are high computational requirements and difficult implementation. In this work we first provide a clear description of the training algorithm using a simple pseudo-language. The problem of high computational requirements is addressed by performing the calculation on a multicore processor and a CUDA-enabled graphics processing unit. We show that a significant reduction in execution time can be achieved by performing the computation on a manycore graphics processing unit.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/703_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/703_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Faculty of Informatics and Information Technologies</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>09/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="cernansky@fiit.stuba.sk">Michal Cernansky</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/m345820036657442/?p=7cfbcf0b92034f1086bd018a3aceb6a4&amp;pi=0">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Michal Cernansky,cernansky@fiit.stuba.sk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f9600d38-e1e3-40c9-bd98-fca07f782225</GUID>
        <Name>Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices</Name>
        <ShortDescription>In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation compared with simpler document models such as probabilistic latent semantic indexing. Therefore, we accelerate CVB inference, an efficient deterministic inference method for LDA, with Nvidia CUDA. In the evaluation experiments, we used a set of 50,000 documents and a set of 10,000 images. We could obtain inference results comparable to sequential CVB inference.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/702_cover-medium_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/702_cover-medium_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nagasaki University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>06/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="masada@cis.nagasaki-u.ac.jp">Tomonari Masada</Author>
           <Author email="hamada@cis.nagasaki-u.ac.jp">Tsuyoshi Hamada</Author>
           <Author email="shibata@cis.nagasaki-u.ac.jp">Yuichiro Shibata</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/1618291uv6x82p77/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Tomonari Masada,Tsuyoshi Hamada,Yuichiro Shibata,masada@cis.nagasaki-u.ac.jp,hamada@cis.nagasaki-u.ac.jp,shibata@cis.nagasaki-u.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9396fec3-f4d1-419d-9345-537bb1e70f10</GUID>
        <Name>POSIX Threads and NVIDIA's CUDA</Name>
        <ShortDescription>The current progression of commodity processing architectures exhibits a trend toward increasing parallelism, requiring that undergraduate students in a wide range of technical disciplines gain an understanding of problem solving in massively parallel environments. However, as a small comprehensive college, we cannot currently afford to dedicate an entire semester-long course to the study of parallel computing. To combat this situation, we have integrated the key components of such a course into a 300-level course on modern operating systems. In this paper, we describe a parallel computing unit that is designed to dovetail with the discussion of process and thread management common to operating systems courses. We also describe a set of self-contained projects in which students explore two parallel programming models, POSIX Threads and NVIDIA's Compute Unified Device Architecture, that enable parallel architectures to be utilized effectively. In our experience, this unit can be integrated with traditional operating systems topics quite readily, making parallel computing accessible to undergraduate students without requiring a full course dedicated to these increasingly important topics.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/701_mte_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/701_mte_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>ogf.org</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">ogf.org</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www2.gcc.edu/dept/comp/faculty/gribblecp/research/papers/gribble09introducing.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>ogf.org</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a23ac3b0-45a9-4784-a714-af9f875bd5cc</GUID>
        <Name>Open Inventor by VSG</Name>
        <ShortDescription>Open Inventor by VSG provides application developers with a unique solution that enables interoperability between advanced 3D visualization and powerful GPU-based computing capabilities to perform parallel computation on the fly on a workstation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/700_vsg_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/700_vsg_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>VSG</OrganizationName>
        <OrganizationURL>http://www.vsg3d.com/vsg_prod_openinventor.php</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">VSG</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.vsg3d.com/vsg_prod_openinventor.php">Application</ContentType>
           <ContentType url="http://www.youtube.com/watch?v=wviPeF1XTsk">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Oil &amp; Gas</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>VSG</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>984915a6-7fd1-45fa-8b37-52fd8af92486</GUID>
        <Name>Mental Ray 3.8</Name>
        <ShortDescription>iray introduces a new approach to photorealistic rendering by integrating preview and final-frame rendering in a single interactive process. In addition, the power of the CUDA GPU dramatically shortens processing time, bringing significant cost savings along the rendering pipeline. iray's simplified handling lets professionals focus on their core business while still generating beautiful photorealistic images of their work, without the help of rendering experts and without having to become rendering experts themselves.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/699_logo_header_left_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/699_logo_header_left_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>mental images</OrganizationName>
        <OrganizationURL>http://www.mentalimages.com/index.php</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">mental images</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.mentalimages.com/products/mental-ray.html">Application</ContentType>
           <ContentType url="http://www.mentalimages.com/products/iray/introduction-to-iray.html?no_cache=1&amp;sword_list[0]=mental&amp;sword_list[1]=ray&amp;sword_list[2]=3.8">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>mental images</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0152d7ff-43b6-4f43-b884-582438d94a54</GUID>
        <Name>AxRTM</Name>
        <ShortDescription>Reverse Time Migration (RTM) is the current 'state-of-the-art' in seismic imaging. The strength of RTM stems from the fact that it fully respects the two-way acoustic wave equation, thus improving imaging in areas where complex geology violates the assumptions made in Kirchhoff or one-way wave equation migrations. Until recently, RTM's widespread use was severely hindered by the enormous computing resources required to process the data. This computational bottleneck is now cleared with Acceleware's patent-pending software solution AxRTM.
AxRTM provides the core numerical functionality of Reverse Time Migration as a library that can be integrated into an existing seismic processing framework. AxRTM has a modular architecture supporting a variety of integrator-supplied functionality, and currently supports both optimized multi-core CPU and NVIDIA GPU hardware.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/698_Seismic_velocity_model_sml_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/698_Seismic_velocity_model_sml_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Acceleware</OrganizationName>
        <OrganizationURL>http://www.acceleware.com/default</OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Acceleware</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.acceleware.com/default/index.cfm/our-products/oil-and-gas/rtm-solvers/">Application</ContentType>
           <ContentType url="http://www.acceleware.com/tasks/sites/default/assets/pdf/Acceleware_SEG2009_RTMonGPU.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Oil &amp; Gas</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Acceleware</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d9067cbb-646d-4ec7-ae1e-5302f65bed87</GUID>
        <Name>Linear Algebra Solvers and High Performance Computing</Name>
        <ShortDescription>Solving a system of linear equations is a common numerical technique applied in many fields including fluid dynamics, thermal analysis, mechanical simulations, and economics. As simulations and models increase in complexity, organizations require high performance software to meet their growing computational needs. Several widely available optimized versions of BLAS and LAPACK libraries have been written to take advantage of CPU architectures. Recently, graphics processing units (GPUs) have shown potential to offer substantial performance gains when solving data-intensive calculations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/697_Engine_Block_sml_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/697_Engine_Block_sml_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Acceleware</OrganizationName>
        <OrganizationURL>http://www.acceleware.com/default/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>05/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Acceleware</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.acceleware.com/default/index.cfm/solutions/matrix-solvers/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computer Aided Engineering</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Acceleware</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>16cf695b-a4de-4cd9-9aca-10b8ad8eff95</GUID>
        <Name>Unipro UGENE</Name>
        <ShortDescription>UGENE is a free cross-platform bioinformatics toolkit. It works on Windows, Linux, and Mac OS and has out-of-the-box support for modern GPUs, including NVIDIA CUDA. UGENE focuses on integrating highly optimized versions of the most popular bioinformatics algorithms (Smith-Waterman, HMMER, MUSCLE, PHYLIP, etc.) within a single flexible visual interface.</ShortDescription>
        <URL>http://ugene.unipro.ru</URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/680_ss_mac_h1n1_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/680_ss_mac_h1n1_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Unipro</OrganizationName>
        <OrganizationURL>http://unipro.ru/eng/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>07/15/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="ugene@unipro.ru">Unipro UGENE team</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ugene.unipro.ru/">Application</ContentType>
           <ContentType url="http://ugene.unipro.ru/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Unipro UGENE team,ugene@unipro.ru</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8b638261-433f-4fc9-ad7b-96c2bd1a6599</GUID>
        <Name>Movavi Video Suite</Name>
        <ShortDescription>Movavi Video Suite is a complete collection of eight powerful yet easy-to-use tools to suit your video processing needs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/694_0000217475_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/694_0000217475_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Movavi</OrganizationName>
        <OrganizationURL>http://www.movavi.com/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>11/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Movavi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.movavi.com/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Movavi</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fa0fb82b-edb4-406b-bdcb-5b7d5e3eea51</GUID>
        <Name>Movavi Video Converter</Name>
        <ShortDescription>Movavi Video Converter is a leading video converter you can use to convert video &amp; audio, save files for portable devices, and rip &amp; burn DVDs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/695_vc9box_jr_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/695_vc9box_jr_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Movavi</OrganizationName>
        <OrganizationURL>http://www.movavi.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>11/24/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Movavi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.movavi.com/videoconverter/index13.html?gclid=CIXdgOnTwZ8CFSkZawodezPv0A">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Movavi</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a6e01b50-0329-4a90-85b3-597c398b8a63</GUID>
        <Name>PowerProducer 5</Name>
        <ShortDescription>PowerProducer connects your HDV camcorder to your creative side, with a complete range of Blu-ray Disc and DVD authoring features for producing discs of your videos.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/693_2eqevch_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/693_2eqevch_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Cyberlink</OrganizationName>
        <OrganizationURL>http://www.cyberlink.com/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>11/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>5</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Cyberlink</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cyberlink.com/products/powerproducer/overview_en_US.html">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cyberlink</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b4bfbf0c-7c02-49f3-9114-1bb621f4b3c7</GUID>
        <Name>HD NVR</Name>
        <ShortDescription>The HD NVR series network video recorder sets new standards for IP camera recorders, featuring full 1080p HD video output with dual-monitor capability and hardware video acceleration via NVIDIA CUDA. It also features low power consumption with green hard drives of up to 2TB, perfect for megapixel HD cameras. The wireless HD NVR is suitable for up to 16 network cameras and can be used in the home, office, or for professional applications.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/691_nvr-header-2009_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/691_nvr-header-2009_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>BiKal IP CCTV</OrganizationName>
        <OrganizationURL>http://www.bikal.co.uk/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>10/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">BiKal IP CCTV</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.bikal.co.uk/network-video-recorder/nvr-pro.html">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>BiKal IP CCTV</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>54670403-5c1a-4f0a-b226-3c1ca3dd071d</GUID>
        <Name>EyeSoft</Name>
        <ShortDescription>EyeSoft is compatible with IP cameras and USB video devices from many different manufacturers, including analogue video capture cards, alarm boxes and PTZ keyboards. EyeSoft has an open source architecture allowing the integration of many hardware and software platforms, and its compatibility increases with each release.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/690_eyesoft-header2-2009_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/690_eyesoft-header2-2009_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>BiKal IP CCTV</OrganizationName>
        <OrganizationURL>http://www.bikal.co.uk/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>10/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">BiKal IP CCTV</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.bikal.co.uk/network-surveillance/compatibility.html">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>EyeSoft</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9ae6cb89-6d4c-4480-8021-f35668d44724</GUID>
        <Name>Loilo Touch</Name>
        <ShortDescription>Now enjoy video editing by simply touching the screen. Use your fingers directly on your videos, pictures, and music with your friends and family. Extreme 10X output is made possible by NVIDIA CUDA technology, which lets the GPU take over for ultra-fast video encoding.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/689_touch_05_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/689_touch_05_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Loilo</OrganizationName>
        <OrganizationURL>http://loilo.tv</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>23</ReleaseDay>
        <ReleaseDateDisplay>10/23/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Loilo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://loilo.tv/product/5/desc">Application</ContentType>
           <ContentType url="http://loilo.tv/product/5/desc">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Loilo</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4db08198-2f0d-49dd-b9b2-6087c1c8b368</GUID>
        <Name>Mirics FlexiTV</Name>
        <ShortDescription>Mirics FlexiTV™ is a multi-standard broadcast TV receiver for netbooks, notebooks and desktop PCs. Using NVIDIA's CUDA™ GPU acceleration technology for critical TV signal processing, FlexiTV can receive global TV and radio. The result is a single hardware design for worldwide terrestrial TV and radio reception.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/688_mirics_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/688_mirics_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Mirics</OrganizationName>
        <OrganizationURL>http://www.mirics.com/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>10/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Mirics</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.nzone.com/object/nzone_flexitv_home.html">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mirics</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b7689d00-714b-455f-af55-9d1714974140</GUID>
        <Name>WinDVD 2010</Name>
        <ShortDescription>Kick it up a notch with HD! WinDVD Pro is a Blu-ray player that supports AVCHD and even upscales standard DVDs to near-HD quality for more intense movies and music. Includes everything in the Standard version, plus:
NVIDIA GPU-accelerated upscaling for smoother playback of your DVD-Video on a high-definition display. Upscale DVD-Video to fit your HD display, regardless of the platform!</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/687_images_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/687_images_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Corel</OrganizationName>
        <OrganizationURL>http://www.corel.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>09/10/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Corel</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.corel.com/servlet/Satellite/us/en/Product/1189528458632#versionTabview=tab1&amp;tabview=tab0">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Corel</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>7572a818-d387-4d19-b437-e2244aca5398</GUID>
        <Name>MilkyWay@home</Name>
        <ShortDescription>The goal of MilkyWay@home is to use the BOINC platform to harness volunteered computing resources in creating a highly accurate three-dimensional model of the Milky Way galaxy using data gathered by the Sloan Digital Sky Survey. This project enables research in both astroinformatics and computer science.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/686_feed-248_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/686_feed-248_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>MilkyWay@home</OrganizationName>
        <OrganizationURL>http://milkyway.cs.rpi.edu/milkyway/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>08/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">MilkyWay@home</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://milkyway.cs.rpi.edu/milkyway/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>MilkyWay@home</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a4cd88a8-6691-4923-83b0-a03b9a3f6e2b</GUID>
        <Name>Roxio Creator 2010</Name>
        <ShortDescription>With Creator 2010, you can render and encode your video 5 times faster thanks to NVIDIA CUDA technology.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/685_creator2010-box-lg_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/685_creator2010-box-lg_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Roxio</OrganizationName>
        <OrganizationURL>http://www.roxio.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>08/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>5</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Roxio</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.roxio.com/enu/products/creator/suite/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Roxio</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>bd09a954-61de-4510-bf7d-ea219b830f78</GUID>
        <Name>DivideFrame GPU Decoder</Name>
        <ShortDescription>Hardware-accelerated decoding of AVCHD/QuickTime H.264 files for NLEs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/683_logo_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/683_logo_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>DivideFrame</OrganizationName>
        <OrganizationURL>http://www.divideframe.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>07/31/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">DivideFrame</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.divideframe.com/?p=gpudecoder">Application</ContentType>
           <ContentType url="http://www.divideframe.com/?p=gpudecoder">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>DivideFrame</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>135ad008-3035-4f9c-943a-af31b3302a2f</GUID>
        <Name>Nero Moveit</Name>
        <ShortDescription>Nero Move it lets you convert and transfer all your multimedia files to the most popular portable and mobile devices. Easily transfer your MP3, WMA, and other audio and video files to your choice of device: PC, mobile phone, digital camera and more. Move it converts files quickly and hassle-free from any supported source, including online communities, and easily moves them to iPod, iPhone, PSP, BlackBerry, LG, Xbox and other devices, or to online communities such as YouTube. Integrated NVIDIA CUDA technology in Nero Move it lets users with compatible NVIDIA graphics cards convert their favorite videos faster and more efficiently.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/682_featured-product-moveit-eng_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/682_featured-product-moveit-eng_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Nero</OrganizationName>
        <OrganizationURL>http://www.nero.com/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>04/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Nero</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.nero.com/enu/moveit-introduction.html">Application</ContentType>
           <ContentType url="http://www.nero.com/enu/moveit-video-demo.html">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Nero</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>cb39210c-23de-464c-ac39-a27c3ea748d6</GUID>
        <Name>Elcomsoft Wireless Security Auditor</Name>
        <ShortDescription>Elcomsoft Wireless Security Auditor allows network administrators to verify how secure a company's wireless network is by executing an audit of accessible wireless networks. Featuring patent-pending cost-efficient GPU acceleration technologies, Elcomsoft Wireless Security Auditor attempts to recover the original WPA/WPA2-PSK text passwords in order to test how secure your wireless environment is.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/681_ewsa_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/681_ewsa_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Elcomsoft</OrganizationName>
        <OrganizationURL>http://www.elcomsoft.com/ewsa.html</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="">Elcomsoft</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.elcomsoft.com/ewsa.html">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wireless Security,Elcomsoft</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0465f26d-a230-406e-8c08-9cb481f02fab</GUID>
        <Name>Wave Tomography</Name>
        <ShortDescription>2D time-domain waveform tomography reconstruction algorithm using GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/680_cuda_website_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/680_cuda_website_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>EPFL</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>01/20/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="olivier.roy@usense.org">Olivier Roy</Author>
           <Author email="olivier.roy@usense.org">Ivana Jovanovic</Author>
           <Author email="olivier.roy@usense.org">Reza Parhizkar</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.usense.org/software/wavetomography">Paper</ContentType>
           <ContentType url="http://www.usense.org/software/wavetomography">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>acoustic wave equation, inverse problems, waveform tomography,Olivier Roy,olivier.roy@usense.org</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>069ab305-5868-4d1b-8713-abf1f7dfd1ef</GUID>
        <Name>A performance study of general-purpose applications on graphics processors using CUDA</Name>
        <ShortDescription>Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of general-purpose applications compared to contemporary general-purpose processors (CPUs). This paper uses NVIDIA's C-like CUDA language and an engineering sample of their recently introduced GTX 260 GPU to explore the effectiveness of GPUs for a variety of application types, and describes some specific coding idioms that improve their performance on the GPU. GPU performance is compared to both single-core and multicore CPU performance, with multicore CPU implementations written using OpenMP. The paper also discusses advantages and inefficiencies of the CUDA programming model and some desirable features that might allow for greater ease of use and also more readily support a larger body of applications.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/679_pyramid_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/679_pyramid_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia, Department of Computer Science, Charlottesville, VA</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>03/02/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>6</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sc5nf@cs.virginia.edu">Shuai Che</Author>
           <Author email="jm6dg@cs.virginia.edu">Michael Boyer</Author>
           <Author email="jws9c@cs.virginia.edu">Jiayuan Meng</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cs.virginia.edu/~jm6dg/papers/GPGPU_workshop_CUDA.pdf=10&amp;md5=f62cf33884f5ec62c707d884bcb81608">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shuai Che,Michael Boyer,Jiayuan Meng,sc5nf@cs.virginia.edu,jm6dg@cs.virginia.edu,jws9c@cs.virginia.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>254535ef-fa6c-46f1-a198-5a49c6deecbb</GUID>
        <Name>Fast N-Body Simulation with CUDA</Name>
        <ShortDescription>An N-body simulation numerically approximates the evolution of a system of bodies in which each body continuously interacts with every other body. A familiar example is an astrophysical simulation in which each body represents a galaxy or an individual star, and the bodies attract each other through the gravitational force, as in Figure 31-1. N-body simulation arises in many other computational science problems as well. For example, protein folding is studied using N-body simulation to calculate electrostatic and van der Waals forces. Turbulent fluid flow simulation and global illumination computation in computer graphics are other examples of problems that use N-body simulation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/678_n-body_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/678_n-body_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>NVIDIA Corporation / University of North Carolina at Chapel Hill</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Lars Nyland</Author>
           <Author email="">Mark Harris</Author>
           <Author email="">Jan Prins</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://wwwx.cs.unc.edu/~prins/Classes/633/Readings/nbody_gems3_ch31.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Lars Nyland,Mark Harris,Jan Prins</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5a75ea20-e19b-4d31-8ee4-cb07bd0cca4d</GUID>
        <Name>Optimization principles and application performance evaluation of a multithreaded GPU using CUDA</Name>
        <ShortDescription>GPUs have recently attracted the attention of many application developers as commodity data-parallel coprocessors. The newest generations of GPU architecture provide easier programmability and increased generality while maintaining the tremendous memory bandwidth and computational power of traditional GPUs. This opportunity should redirect efforts in GPGPU research from ad hoc porting of applications to establishing principles and strategies that allow efficient mapping of computation to graphics hardware. In this work we discuss the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies. Key to performance on this platform is using massive multithreading to utilize the large number of cores and hide global memory latency. To achieve this, developers face the challenge of striking the right balance between each thread's resource usage and the number of simultaneously active threads. The resources to manage include the number of registers and the amount of on-chip memory used per thread, the number of threads per multiprocessor, and global memory bandwidth. We also obtain increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations, and we apply classical optimizations to reduce the number of executed operations. We apply these strategies across a variety of applications and domains and achieve between 10.5X and 457X speedup in kernel codes and between 1.16X and 431X total application speedup.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/677_cover_thumb_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/677_cover_thumb_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois at Urbana-Champaign / NVIDIA Corporation</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>457</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Shane Ryoo</Author>
           <Author email="">Christopher I. Rodrigues</Author>
           <Author email="">Sara S. Baghsorkhi</Author>
           <Author email="">Sam S. Stone</Author>
           <Author email="">Wen-mei W. Hwu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1345220">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shane Ryoo,Christopher I. Rodrigues,Sara S. Baghsorkhi, Sam S. Stone, Wen-mei W. Hwu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>bb1d56b5-acb4-41ec-a177-b2c2ab424e0f</GUID>
        <Name>CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment</Name>
        <ShortDescription>Searching for similarities in protein and DNA databases has become a routine procedure in Molecular Biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm unrealistic for searching for similarities in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/676_1471-2105-9-S2-S10-1_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/676_1471-2105-9-S2-S10-1_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>CRIBI, University of Padova / Elaide, Srl, Padova</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>03/26/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>30</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Svetlin A Manavski</Author>
           <Author email="">Giorgio Valle</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.biomedcentral.com/1471-2105/9/S2/S10">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Svetlin A Manavski,Giorgio Valle</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>66e597cc-b024-471c-a2ae-ab091eb6f738</GUID>
        <Name>Speeding up Mutual Information Computation Using NVIDIA CUDA Hardware</Name>
        <ShortDescription>We present an efficient method for mutual information (MI) computation between images (2D or 3D) for NVIDIA's CUDA-compatible devices. Efficient parallelization of MI is particularly challenging on a GPU due to the need for histogram-based calculation of joint and marginal probability mass functions (pmfs) with a large number of bins. The data-dependent (unpredictable) nature of the updates to the histogram, together with hardware limitations of the GPU (lack of synchronization primitives and limited memory caching mechanisms), can make GPU-based computation inefficient. To overcome these limitations, we approximate the pmfs using a down-sampled version of the joint histogram, which avoids memory update problems. Our CUDA implementation improves the efficiency of MI calculations by a factor of 25 compared to a standard CPU-based implementation and can be used in MI-based image registration applications.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/675_comparison_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/675_comparison_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The Australian National University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>25</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ramtin.shams@anu.edu.au">Ramtin Shams</Author>
           <Author email="nick.barnes@nicta.com.au">Nick Barnes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://users.rsise.anu.edu.au/~nmb/papers/DICTACUDA2007.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ramtin Shams,Nick Barnes,ramtin.shams@anu.edu.au,nick.barnes@nicta.com.au</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d3a7926e-9a32-4fbf-abe5-9c2d26d38adc</GUID>
        <Name>Efficient Histogram Algorithms for NVIDIA CUDA Compatible Devices</Name>
        <ShortDescription>We present two efficient histogram algorithms designed for NVIDIA's Compute Unified Device Architecture (CUDA) compatible graphics processor units (GPUs). Our algorithms can be used for parallel computation of histograms on large data sets and for thousands of bins. Traditionally, histogram computation has been difficult and inefficient on the GPU. This often means that GPU-based implementations of algorithms that require histogram calculation as part of their computation must transfer data between the GPU and the host memory, which can be a significant bottleneck. Our algorithms remove the need for such costly data transfers by allowing efficient histogram calculation on the GPU. We show that the speed of histogram calculations can be improved by up to 30 times compared to a CPU-based implementation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/674_ParaviewHistogram_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/674_ParaviewHistogram_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>The Australian National University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2007</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>12/01/2007</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>30</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Ramtin Shams</Author>
           <Author email="">A. Kennedy</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://users.rsise.anu.edu.au/~ramtin/papers/2007/ICSPCS_2007.pdf">Paper</ContentType>
           <ContentType url="http://users.rsise.anu.edu.au/~ramtin/cuda.htm">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ramtin Shams,A. Kennedy</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>62eb8c85-57f7-4ab2-b7db-6b5e0aab23f9</GUID>
        <Name>gpuCuller</Name>
        <ShortDescription>gpuCuller is a software library implementing parallel computation of view frustum culling for multiple view frustums and multiple entities (for now, AABBs). Its main application is to compute visible elements for autonomous agents in VR simulation platforms. The library builds up a BVH from the universe entities, which is parsed during culling operations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/673_325px-View_frustum_culling.svg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/673_325px-View_frustum_culling.svg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>UTBM</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>01/14/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="nicolas.said@gmail.com">Nicolas Said</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/gpufrustum">Application</ContentType>
           <ContentType url="http://www.youtube.com/watch?v=CftiJWt_R0M">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Nicolas Said,nicolas.said@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>00bed0a7-7d59-4f06-b773-fbcb33358272</GUID>
        <Name>Flow visualization and flow cytometry with holographic video microscopy</Name>
        <ShortDescription>CUDA-accelerated analysis of holographic images yields the three-dimensional position of colloidal spheres with nanometer resolution, and simultaneously yields each sphere's radius and complex refractive index with part-per-thousand resolution.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/672_img19_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/672_img19_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>New York University</OrganizationName>
        <OrganizationURL>http://physics.nyu.edu/grierlab/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>07/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="david.grier@nyu.edu">David G. Grier</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-17-15-13071">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>David G. Grier,david.grier@nyu.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b9d646ab-1b8e-42dc-b7c0-5033a2c01fe8</GUID>
        <Name>GPU acceleration of object classification algorithms using NVIDIA CUDA </Name>
        <ShortDescription>The field of computer vision has become an important part of today's society, supporting crucial applications in the medical, manufacturing, military intelligence and surveillance domains. Many computer vision tasks can be divided into fundamental steps: image acquisition, pre-processing, feature extraction, detection or segmentation, and high-level processing. This work focuses on classification and object detection, specifically k-Nearest Neighbors, Support Vector Machine classification, and Viola &amp; Jones object detection. Object detection and classification algorithms are computationally intensive, which makes it difficult to perform classification tasks in real-time. This thesis aims to overcome the processing limitations of the above classification algorithms by offloading computation to the graphics processing unit (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). The primary focus of this work is the implementation of the Viola and Jones object detector in CUDA. A multi-GPU implementation provides a speedup ranging from 1x to 6.5x over optimized OpenCV code for image sizes of 300 x 300 pixels up to 2900 x 1600 pixels while having comparable detection results. The second part of this thesis is the implementation of a multi-GPU multi-class SVM classifier. The classifier had the same accuracy as an identical implementation using LIBSVM with a speedup ranging from 89x to 263x on the tested datasets. The final part of this thesis was the extension of a previous CUDA k-Nearest Neighbor implementation by exploiting additional levels of parallelism. These extensions provided a speedup of 1.24x and 2.35x over the previous CUDA implementation. As an end result of this work, a library of these three CUDA classifiers has been compiled for use by future researchers.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/671_grouping_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/671_grouping_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Rochester Institute of Technology</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>09/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>263</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Jesse Patrick Harvey</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="https://ritdml.rit.edu/bitstream/handle/1850/10894/35445_pdf_00B0B24A-DFD8-11DE-9A30-D21AD352ABB1.pdf?sequence=1">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jesse Patrick Harvey</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>59686a98-5e65-42d9-9044-02706ad3d148</GUID>
        <Name>Motion estimation for H.264/AVC on multiple GPUs using NVIDIA CUDA</Name>
        <ShortDescription>To achieve the high coding efficiency the H.264/AVC standard offers, the encoding process quickly becomes computationally demanding. One of the most intensive encoding phases is motion estimation. Even modern CPUs struggle to process high-definition video sequences in real-time. While personal computers are typically equipped with powerful Graphics Processing Units (GPUs) to accelerate graphics operations, these GPUs lie dormant when encoding a video sequence. Furthermore, recent developments show more and more computer configurations come with multiple GPUs. However, no existing GPU-enabled motion estimation architectures target multiple GPUs. In addition, these architectures provide no early-out behavior nor can they enforce a specific processing order. We developed a motion search architecture, capable of executing motion estimation and partitioning for an H.264/AVC sequence entirely on the GPU using the NVIDIA CUDA (Compute Unified Device Architecture) platform. This paper describes our architecture and presents a novel job scheduling system we designed, making it possible to control the GPU in a flexible way. This job scheduling system can enforce real-time demands of the video encoder by prioritizing calculations and providing an early-out mode. Furthermore, the job scheduling system allows the use of multiple GPUs in one computer system and efficient load balancing of the motion search over these GPUs. This paper focuses on the execution speed of the novel job scheduling system on both single and multi-GPU systems. Initial results show that real-time full motion search of 720p high-definition content is possible with a 32 by 32 search window running on a system with four GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/670_h264_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/670_h264_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>The International Society for Optical Engineering</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>09/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Bart Pieters</Author>
           <Author email="">Charles F. Hollemeersch</Author>
           <Author email="">Peter Lambert</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://spiedl.aip.org/getabs/servlet/GetabsServlet?prog=normal&amp;id=PSISDG00744300000174430X000001&amp;idtype=cvips&amp;gifs=yes&amp;ref=no">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Bart Pieters,Charles F. Hollemeersch,Peter Lambert</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3f523b1a-9c95-4ffa-a003-90314020aede</GUID>
        <Name>Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation</Name>
        <ShortDescription>In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing, etc. Therefore, we accelerate CVB inference, an efficient deterministic inference method for LDA, with Nvidia CUDA. In the evaluation experiments, we used a set of 50,000 documents and a set of 10,000 images. We could obtain inference results comparable to sequential CVB inference.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/669_lncs_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/669_lncs_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nagasaki University, Bunkyo-machi, Nagasaki, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>26</ReleaseDay>
        <ReleaseDateDisplay>06/26/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Tomonari Masada</Author>
           <Author email="">Tsuyoshi Hamada</Author>
           <Author email="">Yuichiro Shibata</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/1618291uv6x82p77/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Tomonari Masada,Tsuyoshi Hamada,Yuichiro Shibata</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>54d853fe-6790-40a4-84a1-d7bfadeaa979</GUID>
        <Name>Real-time 2D parallel windowed Fourier transform for fringe pattern analysis using GPUs</Name>
        <ShortDescription>In optical interferometers, fringe projection systems, and synthetic aperture radars, fringe patterns are common outcomes and are usually degraded by unavoidable noise. The presence of noise makes phase extraction and phase unwrapping challenging. Windowed Fourier transform (WFT) based algorithms have been proven to be effective for fringe pattern analysis in various applications. However, the WFT-based algorithms are computationally expensive, which has kept them out of real-time applications. In this paper, we propose a fast parallel WFT-based library using graphics processing units and compute unified device architecture. Real-time WFT-based processing is achieved at 4 frames per second on 256x256 fringe patterns. Up to 132x speedup is obtained for WFT-based algorithms on an NVIDIA GTX295 graphics card compared with sequential C on a quad-core 2.5GHz Intel(R) Xeon(R) CPU E5420.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/668_rt2d_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/668_rt2d_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Nanyang Technological University, Singapore</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>12/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>132</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="mkmqian@ntu.edu.sg">Wenjing Gao</Author>
           <Author email="">Nguyen Thi</Author>
           <Author email="">Ho Sy Loi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.opticsinfobase.org/abstract.cfm?URI=oe-17-25-23147">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Wenjing Gao,Nguyen Thi,Ho Sy Loi,mkmqian@ntu.edu.sg</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>1ac32591-b05b-4a98-9570-fbdd6347751c</GUID>
        <Name>Solve MAX-SAT problem using nVidia CUDA framework</Name>
        <ShortDescription>General Purpose computing over Graphical Processing Units (GPGPUs) is a huge shift of paradigm in parallel computing that promises a dramatic increase in performance. But GPGPUs also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges and design choices involved in parallelizing a hybrid of Genetic Algorithm (GA) and Local Search (LS) to solve the MAXimum SATisfiability (MAX-SAT) problem on a state-of-the-art nVidia Tesla GPU using nVidia Compute Unified Device Architecture (CUDA). MAX-SAT is a problem of practical importance and is often solved by employing metaheuristic-based search methods like GAs and hybrids of GA with LS. Almost all the parallel GAs (pGAs) designed in the last two decades were designed for either clusters or MPPs. Unfortunately, very little research has been done on the implementation of such algorithms over commodity graphics hardware. GAs in their simple form are not suitable for implementation over the Single Instruction Multiple Thread (SIMT) architecture of a GPU, and the same is the case with conventional LS algorithms. In this paper we explore different genetic operators that can be used for an efficient implementation of GAs over nVidia GPUs. We also design and introduce new techniques/operators for an efficient implementation of GAs and LS over such architectures. We use an nVidia Tesla C1060 to perform several numerical tests and performance measurements and show that in the best case we obtain a speedup of 25x. We also discuss the effects of different optimization techniques on the overall execution time.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/667_cover-medium_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/667_cover-medium_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Hokkaido University, Sapporo, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>10/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="asim@uva.cims.hokudai.ac.jp">Asim Munawar</Author>
           <Author email="wahibium@uva.cims.hokudai.ac.jp">Mohamed Wahib</Author>
           <Author email="munetomo@iic.hokudai.ac.jp">Masaharu Munetomo</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.springerlink.com/content/m1272275232xlh06/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Asim Munawar,Mohamed Wahib,Masaharu Munetomo,asim@uva.cims.hokudai.ac.jp,wahibium@uva.cims.hokudai.ac.jp,munetomo@iic.hokudai.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ff558f05-2894-4e4e-ae78-2c0a67925404</GUID>
        <Name>Scalable computation for spatially scalable video coding using NVIDIA CUDA and multi-core CPU</Name>
        <ShortDescription>Scalable video coding (SVC), an extension of H.264/MPEG4-AVC (H.264), was standardized in 2007 by the Joint Video Team (JVT). SVC provides spatial, temporal and SNR scalabilities. To achieve these scalabilities, SVC uses additional coding tools and coding modes based on H.264. The coding tools used by SVC and the variety of coding mode decisions make the coding complexity extremely high, so real-time realization of SVC is nearly impossible using software and a single-core CPU only. One possible solution to generate SVC streams in real-time is to parallelize the whole encoding process. Currently, multi-core CPUs and GPUs are two popular kinds of parallel processing architectures. Not much research has been devoted to realizing parallel SVC encoders based on the cooperation of these two architectures. In this paper, a scalable computation model for spatial SVC using multi-core CPU and GPGPU through NVIDIA CUDA is proposed. On the basis of the proposed computational model, a solution to the challenging data transition problem (detailed in the paper) of this CPU-GPU cooperative architecture is then provided. Simulation results show that, through our work, a significant speedup in spatial SVC encoding can be achieved.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/666_cover_thumb_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/666_cover_thumb_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>ACM</OrganizationName>
        <OrganizationURL>http://www.acm.org/publications</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yen-Lin Huang</Author>
           <Author email="">Yun-Chung Shen</Author>
           <Author email="">Ja-Ling Wu</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1631323">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yen-Lin Huang,Yun-Chung Shen,Ja-Ling Wu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e1f0d8e5-d679-4bd3-a4b7-3a915b9daed5</GUID>
        <Name>Canny edge detection on NVIDIA CUDA</Name>
        <ShortDescription>The Canny edge detector is a very popular and effective edge feature detector that is used as a pre-processing step in many computer vision algorithms. It is a multi-step detector which performs smoothing and filtering, non-maxima suppression, followed by a connected-component analysis stage to detect "true" edges, while suppressing "false" non-edge filter responses. While there have been previous (partial) implementations of the Canny and other edge detectors on GPUs, they have been focussed on the old style GPGPU computing with programming using graphical application layers. Using the more programmer friendly CUDA framework, we are able to implement the entire Canny algorithm. Details are presented along with a comparison with CPU implementations. We also integrate our detector into MATLAB, a popular interactive simulation package often used by researchers. The source code will be made available as open source.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/665_img_0535_sobel_thumb_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/665_img_0535_sobel_thumb_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Perceptual Interfaces &amp; Reality Lab., Maryland, Univ</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>07</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>07/15/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yuancheng Luo</Author>
           <Author email="">Duraiswami, R</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4563088">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yuancheng Luo,Duraiswami, R</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b972ec14-5548-49b5-8366-31d9df19eaf8</GUID>
        <Name>Sugarscape Cuda</Name>
        <ShortDescription>Using emergent programming techniques on the GPU, we have made an implementation of Sugarscape that utilizes the massively parallel architecture of modern GPUs. Agents within the model move optimally within their vision, which is uniformly set between 1 and 10. Multiple agents cannot occupy the same cell. The agents also interact with the sugar patches, each given a metabolism drawn uniformly from [0.1,1). The sugar patches grow at a constant rate of 0.1 per time step until they reach their maximum values, which are determined by two Gaussian functions.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/663_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/663_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>code.google.com</OrganizationName>
        <OrganizationURL>http://code.google.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>08/21/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Devm</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/sugarscape-cuda/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Devm</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ae892094-5c92-4cd2-b782-fc12cb86f174</GUID>
        <Name>Cuda Nash</Name>
        <ShortDescription>Finding Nash equilibria for large games is a computationally difficult task. The goal of this project is to implement a simple algorithm that is well suited to being run in parallel on simple hardware. The algorithm boils down to solving a large system of differential equations until they converge within a given tolerance. We believe that the computational architecture of graphics cards is especially well suited to this type of problem.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/662_defaultlogo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/662_defaultlogo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL>http://code.google.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>06/03/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Aultman Stephen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/cuda-nash/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Aultman Stephen</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a58dcb4d-07ea-415c-8141-760465ce3812</GUID>
        <Name>Electromag with CUDA</Name>
        <ShortDescription>Fun electromagnetism simulation application with CUDA GPGPU acceleration</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/661_defaultlogo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/661_defaultlogo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL>http://code.google.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>05/08/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email=""></Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/electromag-with-cuda/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword></Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e128489b-f5c9-4fb5-9e20-e6f08d8d3cd7</GUID>
        <Name>Hydrazine</Name>
        <ShortDescription>A library of common operations needed for C++ and CUDA development </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/660_defaultlogo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/660_defaultlogo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL>http://code.google.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>13</ReleaseDay>
        <ReleaseDateDisplay>05/13/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Gregory</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/hydrazine/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Gregory</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>6e7508ba-012a-4545-9c73-97e646edae15</GUID>
        <Name>CUDA Grayscale</Name>
        <ShortDescription>This project presents a common technique for converting colored images to their grayscale representation using CUDA-enabled GPUs to speed up processing. This multi-platform implementation uses OpenCV for managing image files, while the conversion algorithm takes into consideration different weighting of the color channels for a more effective representation of the colored image.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/659_defaultlogo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/659_defaultlogo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL>http://code.google.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>11/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Karl Phillip</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/cuda-grayscale/">Application</ContentType>
           <ContentType url="http://code.google.com/p/cuda-grayscale/downloads/list">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Karl Phillip</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d84046cb-3498-46af-a187-293a84bbad65</GUID>
        <Name>CUDA Ndarray</Name>
        <ShortDescription>This project provides a type with an interface as similar as possible to numpy's ndarray whose storage is allocated on a GPU device. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/659_defaultlogo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/659_defaultlogo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL>http://code.google.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>12/18/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">James Bergstra</Author>
           <Author email="">Frederic Bastien</Author>
           <Author email="">Pascal Lamblin</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/cuda-ndarray/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>James Bergstra,Frederic Bastien,Pascal Lamblin</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>ad2ef9c8-ad67-4c51-929b-c36141afa1c7</GUID>
        <Name>multisvm</Name>
        <ShortDescription>The scaling of serial algorithms cannot rely on the improvement of CPUs anymore. The performance of classical Support Vector Machine (SVM) implementations has reached its limit, and the arrival of the multi-core era requires these algorithms to adapt to a new parallel scenario. Graphics Processing Units (GPUs) have arisen as high performance platforms to implement data parallel algorithms. This project describes how a naive implementation of a multiclass classifier based on SVMs can map its inherent degrees of parallelism to the GPU programming model and efficiently use its computational throughput. Empirical results show that the training and classification time of the algorithm can be reduced by an order of magnitude compared to a classical solver, LIBSVM, while guaranteeing the same accuracy.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/657_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/657_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL>http://code.google.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>11/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Sergherr</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/multisvm/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sergherr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9b79ff8f-cc43-43bf-8143-37759f811994</GUID>
        <Name>gpuocelot</Name>
        <ShortDescription>Ocelot is a dynamic compilation framework for heterogeneous systems, accomplishing this by providing various backend targets for CUDA programs. Ocelot currently allows CUDA programs to be executed on NVIDIA GPUs and x86-CPUs at full speed without recompilation. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/656_logo_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/656_logo_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>CUDA Developer</OrganizationName>
        <OrganizationURL>http://code.google.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>12/15/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Gregory</Author>
           <Author email="">Arkerr</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/gpuocelot/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Gregory,Arkerr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>500a4c45-722d-4593-9b33-c78d5247013e</GUID>
        <Name>PHENOTYPING RODENT MODELS OF OBESITY USING MAGNETIC RESONANCE IMAGING</Name>
        <ShortDescription>The emergence of dedicated, small animal imaging systems provides an excellent opportunity to study obesity using the rat and mouse models which will be critical to increasing our basic knowledge as well as deriving new treatments. MRI is well suited for quantifying fat depots (e.g., visceral, subcutaneous, hepatic, muscular) and for helping to determine the role of genetic, environmental, and therapeutic factors on lipid accumulation, metabolism, and disease. Assessment of lipid depots is important because of the linkage of visceral and ectopic depots to insulin resistance, vascular disease, etc. The importance of making reproducible imaging measurements cannot be overstated when conducting a study of many animals, and we demonstrated that ratio imaging enables reliable quantification even on a human clinical 1.5T MRI scanner. Scan-rescan variability and intra-operator variability were each reduced to a 2% coefficient of variation or less when the semi-automatic ratio image analysis was used. Receiver coil signal intensity inhomogeneity of over 200% across the field of view was flattened to less than 3% variation by ratio imaging. Using the SHR/SHROB rat model of dietary and genetic obesity, we found a novel image phenotype which showed that visceral adipose tissue depots are increased in both genetic and dietary obesity, but subcutaneous adipose tissue is uniquely linked to dietary obesity, at least in this model.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/655_rat_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/655_rat_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Department of Biomedical Engineering Case Western Reserve University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>21</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">David Herbert Johnson</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://etd.ohiolink.edu/etd/send-pdf.cgi/Johnson%20David%20Herbert.pdf?acc_num=case1250086728">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>DAVID HERBERT JOHNSON</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9ae40a45-facb-4779-9512-9bbf25f875c4</GUID>
        <Name>Model-driven Autotuning of Sparse Matrix-Vector Multiply on GPUs</Name>
        <ShortDescription>We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing units (GPU). Our study consists of two parts. First, we describe several carefully hand-tuned SpMV implementations for GPUs, identifying key GPU-specific performance limitations, enhancements, and tuning opportunities. These implementations, which include variants on classical blocked compressed sparse row (BCSR) and blocked ELLPACK (BELLPACK) storage formats, match or exceed state-of-the-art implementations. For instance, our best BELLPACK implementation achieves up to 29.0 Gflop/s in single precision and 15.7 Gflop/s in double precision on the NVIDIA T10P multiprocessor (C1060), enhancing prior state-of-the-art unblocked implementations (Bell and Garland, 2009) by up to 1.8x and 1.5x for single and double precision, respectively.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/654_threadblock_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/654_threadblock_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Georgia Institute of Technology / Indian Institute of Technology Roorkee</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2010</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>01/01/2010</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="jee@ece.gatech.edu">Jee W. Choi</Author>
           <Author email="amiksuec@iitr.ernet.in">Amik Singh</Author>
           <Author email="richie@cc.gatech.edu">Richard W. Vuduc</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://vuduc.org/pubs/choi2010-gpu-spmv.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jee W. Choi,Amik Singh,Richard W. Vuduc,jee@ece.gatech.edu,amiksuec@iitr.ernet.in,richie@cc.gatech.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d660b5b0-780e-42d3-9fc6-b4bb78acefde</GUID>
        <Name>Real-time display on Fourier domain optical coherence tomography system</Name>
        <ShortDescription>Fourier domain optical coherence tomography (FD-OCT) requires resampling of spectrally resolved depth information from wavelength to wave number, and the subsequent application of the inverse Fourier transform. The display rates of OCT images are much slower than the image acquisition rates due to processing speed limitations on most computers. We demonstrate a real-time display of processed OCT images using a linear-in-wave-number (linear-k) spectrometer and a graphics processing unit (GPU). We use the linear-k spectrometer with the combination of a diffractive grating with 1200 lines/mm and an F2 equilateral prism in the 840-nm spectral region to avoid calculating the resampling process. The calculations of the fast Fourier transform (FFT) are accelerated by the GPU with many stream processors, which realizes highly parallel processing. A display rate of 27.9 frames/sec for processed images (2048 FFT size x 1000 lateral A-scans) is achieved in our OCT system using a line scan CCD camera operated at 27.9 kHz.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/653_060506_1-V1_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/653_060506_1-V1_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Graduate School of Science and Engineering, Yamagata University</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>12/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ywata@yz.yamagata-u.ac.jp">Yuuki Watanabe</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://dx.doi.org/10.1117/1.3275463.1">Multimedia</ContentType>
           <ContentType url="http://www.octnews.org/articles/1737801/real-time-display-on-fourier-domain-optical-cohere/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Life Sciences</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yuuki Watanabe,ywata@yz.yamagata-u.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>83c76f9b-4deb-4017-9e78-42e911ff01ed</GUID>
        <Name>muvee Reveal version 8</Name>
        <ShortDescription>muvee Reveal lets you create and share personalized, professional looking home movies in a few quick steps. With automatic motion and face detection, your photos and video are synced to the beat of your favorite music.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/681_160x90_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/681_160x90_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>muvee Technologies Pte. Ltd.</OrganizationName>
        <OrganizationURL>http://www.muvee.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>11/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>8</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="contact@muvee.com">muvee Technologies</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.nzone.com/object/nzone_muveereveal_home.html">Application</ContentType>
           <ContentType url="http://www.nzone.com/object/nzone_muveereveal_home.html">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Digital Content Creation</ApplicationType>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mafrudy bin Rubani,mafrudy@muvee.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>376fa043-5c61-42ff-a8e9-9c5dbcee6c9e</GUID>
        <Name>Multiple Back-Propagation source code</Name>
        <ShortDescription>Multiple Back-Propagation is an open source software application for training neural networks with the backpropagation and the multiple back propagation algorithms. Currently this project is hosted at http://code.google.com/p/multiplebackpropagation and http://sourceforge.net/projects/mbp/</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/651_mbpTop_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/651_mbpTop_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>IPG</OrganizationName>
        <OrganizationURL>http://dit.ipg.pt/MBP/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>11</ReleaseDay>
        <ReleaseDateDisplay>12/11/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>179</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="noel@ipg.pt">Noel Lopes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/mbp/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Noel Lopes,noel@ipg.pt</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e3fe2d97-e009-470c-9d85-6ee65c25cd43</GUID>
        <Name>ClusterTech Financial Library in GPU</Name>
        <ShortDescription>CLUSTERTECH Finance Library includes a BGM Interest Path Generator and a Trinomial Tree-based Options Pricing Model. In the BGM model, each forward rate is modeled by a lognormal process. The volatility vector function is also defined in our implementation. Numerous interest-rate paths are then generated by Monte Carlo simulation. The library also includes a trinomial recombining tree-based options-pricing model, which allows for greater flexibility in the movement of rates or prices compared to the binomial counterpart.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/650_ct-fl-ad_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/650_ct-fl-ad_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Cluster Technology Limited</OrganizationName>
        <OrganizationURL>http://www.clustertech.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>11/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>30</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="hkbd@clustertech.com">Cluster Technology Limited</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.clustertech.com/products/ct-fl.jsp">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Finance</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cluster Technology Limited,hkbd@clustertech.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>224bcdd0-6e4a-422e-bd1a-4c367e094441</GUID>
        <Name>ClusterTech Parallel Random Number Generator</Name>
        <ShortDescription>The ClusterTech Parallel Random Number Generator is based on the Mersenne Twister, which has a period of 2^19937-1. It generates multiple independent streams simultaneously across a cluster of CPUs and GPUs, with a jump-ahead feature to guarantee the quality of the output.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/649_ct-prng-ad_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/649_ct-prng-ad_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Cluster Technology Limited</OrganizationName>
        <OrganizationURL>http://www.clustertech.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>11/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>30</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="hkbd@clustertech.com">Cluster Technology Limited</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.clustertech.com/products/ct-prng.jsp">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Cluster Technology Limited,hkbd@clustertech.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>df4a0b13-0479-40f6-8405-ae810286a06b</GUID>
        <Name>GPU computing with Kaczmarz's and other iterative algorithms for linear systems</Name>
        <ShortDescription>The graphics processing unit (GPU) is used to solve large linear systems derived from partial differential equations. The differential equations studied are strongly convection-dominated, of various sizes, and common to many fields, including computational fluid dynamics, heat transfer, and structural mechanics. The paper presents comparisons between GPU and CPU implementations of several well-known iterative methods, including Kaczmarz's, Cimmino's, component averaging, conjugate gradient normal residual (CGNR), symmetric successive overrelaxation-preconditioned conjugate gradient, and conjugate-gradient-accelerated component-averaged row projections (CARP-CG). Computations are performed with dense as well as general banded systems. The results demonstrate that our GPU implementation outperforms CPU implementations of these algorithms, as well as previously studied parallel implementations on Linux clusters and shared memory systems. While the CGNR method had begun to fall out of favor for solving such problems, for the problems studied in this paper, the CGNR method implemented on the GPU performed better than the other methods, including a cluster implementation of the CARP-CG method.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/648_graph_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/648_graph_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Illinois Urbana-Champaign</OrganizationName>
        <OrganizationURL>http://www.uiuc.edu</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>12/22/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="elble@uiuc.edu">J. Elble</Author>
           <Author email="elble@uiuc.edu">N. Sahinidis</Author>
           <Author email="elble@uiuc.edu">P. Vouzis</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.andrew.cmu.edu/user/pvouzis/papers/Elble-Sahinidis-Vouzis_JournalOnParallelComputing_2009.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Joseph Elble,elble@uiuc.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>3b6e6752-173e-45e4-a22f-5e7eccae9b7f</GUID>
        <Name>Acceleration of a Finite-Difference WENO Scheme for Large-Scale Simulations on Many-Core Architectures</Name>
        <ShortDescription>This is a highly accelerated implementation of the finite-difference weighted essentially non-oscillatory (WENO) scheme. This method is suitable for direct numerical simulations (DNS) and large eddy simulations (LES) of compressible turbulence, and requires large computing resources in order to achieve high Reynolds numbers. Our implementation utilizes a multi-GPU environment.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/647_rayleigh-taylor_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/647_rayleigh-taylor_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>PDS Group - University of Patras</OrganizationName>
        <OrganizationURL>http://pdsgroup.hpclab.ceid.upatras.gr</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>12/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>50</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="karantas@ceid.upatras.gr">Konstantinos Karantasis</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://pdsgroup.hpclab.ceid.upatras.gr/pubs/aiaa-0525.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Konstantinos Karantasis,karantas@ceid.upatras.gr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>39f2db06-bc42-4ecf-9f6a-235d25b002e6</GUID>
        <Name>GPU Accelerated Pathfinding</Name>
        <ShortDescription>In the past few years the graphics programmable processor (GPU) has evolved into an increasingly convincing computational resource for non-graphics applications. The GPU is especially well suited to address problem sets expressed as data-parallel computation, with the same program executed on many data elements concurrently. In pursuing a scalable navigation planning approach for many thousands of agents in crowded game scenes, developers became more attracted to decomposable movement algorithms that lend themselves to explicit parallelism. Pathfinding is one key computational intelligence action in games that is typified by intense search over sparse graph data structures. This paper describes an efficient GPU implementation of parallel global pathfinding using the CUDA programming environment, and demonstrates the GPU's performance scaling advantage in executing an inherently irregular and divergent algorithm.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/646_GPUAcceleratedPathfinding_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/646_GPUAcceleratedPathfinding_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>NVIDIA Corporation</OrganizationName>
        <OrganizationURL>http://www.nvidia.com</OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>06/20/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ableiweiss@nvidia.com">Avi Bleiweiss</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.graphicshardware.org/presentations/bleiweiss-GPU_accelerated_pathfinding.pdf">Presentation</ContentType>
           <ContentType url="http://portal.acm.org/citation.cfm?id=1413968">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Artificial Intelligence</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Avi Bleiweiss,ableiweiss@nvidia.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9b19fa0b-58e2-4937-a1bb-e6532bb7522a</GUID>
        <Name>Scalable Multi Agent Simulation on the GPU</Name>
        <ShortDescription>We present a unique and elegant graphics hardware realization of multi-agent simulation. Specifically, we adapted Velocity Obstacles, which is well suited to parallel computation on single instruction, multiple thread (SIMT) architectures. We explore hash-based nearest-neighbor search to considerably optimize the algorithm when mapped onto the GPU. Moreover, to alleviate inefficiencies of agent-level concurrency, primarily exposed in small agent count (&#60;32) scenarios, we exploit nested data parallelism in unrolling the inner velocity iteration, demonstrating an appreciable performance gain. Simulation of ten thousand agents created with our system runs on current hardware at a real-time rate of eighteen frames per second. Our software implementation builds on NVIDIA's CUDA.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/645_aicuda_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/645_aicuda_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>NVIDIA Corporation</OrganizationName>
        <OrganizationURL>http://www.nvidia.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>02</ReleaseDay>
        <ReleaseDateDisplay>11/02/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ableiweiss@nvidia.com">Avi Bleiweiss</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://developer.download.nvidia.com/compute/cuda/docs/MultiAgentGPU-RA09.pdf">Presentation</ContentType>
           <ContentType url="http://www.actapress.com/Abstract.aspx?paperId=36298">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Artificial Intelligence</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Avi Bleiweiss,ableiweiss@nvidia.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>25c4c0ea-21c5-449e-8a40-f8cdfa40d539</GUID>
        <Name>NVIDIA Nexus - Visual Studio-based GPU Development</Name>
        <ShortDescription>Our new GPU developer tools, code-named Nexus, bring GPU computing into Visual Studio 2008. Debug, profile, and analyze GPU code using standard workflows and tools. Nexus supports CUDA C, OpenCL, DirectCompute, Direct3D, and OpenGL.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/644_64_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/644_64_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NVIDIA</OrganizationName>
        <OrganizationURL>http://www.nvidia.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>12/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="cuda@nvidia.com">NVIDIA</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://developer.nvidia.com/object/nexus.html">Application</ContentType>
           <ContentType url="http://developer.nvidia.com/object/nexus.html">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>nexus,NVIDIA,cuda@nvidia.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>4fd199bf-5cce-4533-af21-5db250922ae6</GUID>
        <Name>Recursive APSP on the GPU</Name>
        <ShortDescription>We consider the computation of shortest paths on Graphics Processing Units (GPUs). The blocked recursive elimination strategy we use is applicable to a class of algorithms (such as all-pairs shortest-paths, transitive closure, and LU decomposition without pivoting) having similar data access patterns. Using the all-pairs shortest-paths problem as an example, we uncover potential gains over this class of algorithms. The impressive computational power and memory bandwidth of the GPU make it an attractive platform to run such computationally intensive algorithms. Although improvements over CPU implementations have previously been achieved for those algorithms in terms of raw speed, the utilization of the underlying computational resources was quite low. We implemented a recursively partitioned all-pairs shortest-paths algorithm that harnesses the power of GPUs better than existing implementations. The alternate schedule of path computations allowed us to cast almost all operations into matrix-matrix multiplications on a semiring. Since matrix-matrix multiplication is highly optimized and has a high ratio of computation to communication, our implementation does not suffer from the premature saturation of bandwidth resources as iterative algorithms do. By increasing temporal locality, our implementation runs more than two orders of magnitude faster on an NVIDIA 8800 GPU than on an Opteron. Our work provides evidence that programmers should rethink algorithms instead of directly porting them to GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/643_apsp-timings-small_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/643_apsp-timings-small_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>UC Santa Barbara</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>11/30/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>480</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="aydin@cs.ucsb.edu">Aydin Buluc</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://gauss.cs.ucsb.edu/~aydin/apsp_cuda.html">Paper</ContentType>
           <ContentType url="http://gauss.cs.ucsb.edu/~aydin/apsp_cuda.html">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Aydin Buluc,aydin@cs.ucsb.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>b474b1ae-94b1-4ac4-ba86-a16506460ba4</GUID>
        <Name>Multiphase flow in porous media</Name>
        <ShortDescription>The movie shows fractional flow of oil and water in a generic porous medium (glass beads, water wet). The glass beads are visualized by a transparent material, the water is invisible, and the oil phase is shown by a color-encoded surface. The color represents the pressure distribution, where red is high and blue is low pressure. The porous medium is resolved by 250^3 grid points. Ingrain's digital rock physics lab computes the physical properties and fluid flow characteristics of oil and gas reservoir rocks. Our technology leads the industry in measuring shales, carbonates, tight gas sands and oil sands. Ingrain uses advanced lattice Boltzmann methods to simulate multiphase flow in the rocks (porous media). The simulation engine uses a sparse data structure to represent the grid. Using GPUs and CUDA technology, the simulations are accelerated by two orders of magnitude compared to a state-of-the-art multicore desktop computer. On a single Tesla GPU with 4GB memory we are able to simulate grids up to 800^3/600^3 for 5% porosity and up to 500^3/400^3 for 40% porosity for single/multiphase flow. For larger grids, multiple GPUs are used in parallel.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/642_movBlackLogo.0400_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/642_movBlackLogo.0400_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Ingrain</OrganizationName>
        <OrganizationURL>http://www.ingrainrocks.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>12/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="toelke@ingrainrocks.com">Jonas Toelke</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=cNDUKylb4Ds">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jonas Toelke,toelke@ingrainrocks.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c0d931f3-fe2d-42cf-aa7c-981392258c99</GUID>
        <Name>FastFractal256</Name>
        <ShortDescription>Mandelbrot fractal renderer. Uses software integers at 256-bit precision, run on the GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/641_baby-mandelbrot_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/641_baby-mandelbrot_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Imaginary Software, LLC</OrganizationName>
        <OrganizationURL>http://www.fastfractal.com/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>11/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>10</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="contact@fastfractal.com">Imaginary Software, LLC</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.fastfractal.com/">Application</ContentType>
           <ContentType url="http://www.youtube.com/watch?v=NuutXOIeX5o">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Imaginary Software, LLC,contact@fastfractal.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>75775576-7f1b-4338-950d-57508d71eb11</GUID>
        <Name>Digital Breast Tomosynthesis Reconstruction</Name>
        <ShortDescription>Reconstruction of Digital Breast Tomosynthesis volumes. The CUDA version gave a minimum 25x speedup over a multi-threaded implementation on an Intel Core i7 quad-core CPU. The application is also scalable to multiple GPUs for further acceleration.
This work was done courtesy of Massachusetts General Hospital with additional support from the Bernard M. Gordon Center for Subsurface Sensing and Imaging Systems (Gordon-CenSSIS). Individual and institutional contributors include: Professor David Kaeli, Daniel B. Kopans M.D., Micha Moffie PhD., Richard H Moore, Diego Rivera, Dana Schaa, Juemin Zhang PhD., Brandeis University, and Dexela, Ltd.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/640_tomo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/640_tomo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Massachusetts General Hospital</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>11/03/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>85</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="bcbrown@partners.org">Benjamin C Brown</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ece.neu.edu/~dschaa/files/tomo.avi">Multimedia</ContentType>
           <ContentType url="http://www.ece.neu.edu/~dschaa/files/tomo_isbi.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Life Sciences</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>GTX285 vs. Intel Core i7 940 quad-core 2.93 GHz.,Benjamin C Brown,bcbrown@partners.org</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5a4a9940-b346-454f-926b-fc21e5e9995b</GUID>
        <Name>Needleman-Wunsch Sequence Alignment</Name>
        <ShortDescription>The Needleman-Wunsch Sequence Alignment using CUDA</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/639_nw_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/639_nw_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>03</ReleaseDay>
        <ReleaseDateDisplay>12/03/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>8</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sc5nf@virginia.edu">Shuai Che</Author>
           <Author email="">Kevin Skadron</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Downloads">Application</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Downloads">Multimedia</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Downloads">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Shuai Che,Kevin Skadron,sc5nf@virginia.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>984a48ef-ecb4-4649-85e3-fb42ceff5269</GUID>
        <Name>CUDAEASY</Name>
        <ShortDescription>We present a graphics processing unit (GPU) accelerated program that solves the evolution of interacting scalar fields in an expanding universe, written in NVIDIA's Compute Unified Device Architecture (CUDA). In chaotic inflation models we report speedups between one and two orders of magnitude, depending on the hardware and software used, while achieving small errors in single precision. Simulations that used to take roughly one day to compute can now be done in hours, and this difference is expected to increase in the future. The program has been written in the spirit of LATTICEEASY, and users of that program should find it relatively easy to start using CUDAEASY in lattice simulations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/638_logo_big_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/638_logo_big_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Turku / Department of Physics and Astronomy</OrganizationName>
        <OrganizationURL>http://www.physics.utu.fi/en/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>12/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="jani.sainio@utu.fi">Jani Sainio</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.physics.utu.fi/theory/particlecosmology/cudaeasy/">Application</ContentType>
           <ContentType url="http://vanha.physics.utu.fi/theory/particlecosmology/cudaeasy/media.html">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jani Sainio,jani.sainio@utu.fi</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e563fe70-8e81-434a-81a1-4d1ca78c77a4</GUID>
        <Name>TeraChem</Name>
        <ShortDescription>General-purpose software for quantum chemistry calculations, designed specifically for NVIDIA GPUs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/637_CoverArtDNANew_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/637_CoverArtDNANew_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>PetaChem, LLC</OrganizationName>
        <OrganizationURL>http://www.petachem.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>24</ReleaseDay>
        <ReleaseDateDisplay>11/24/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>650</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="i.ufimtsev@gmail.com">Ivan Ufimtsev</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.petachem.com/betaversion.html">Application</ContentType>
           <ContentType url="http://petachem.com/demo.html">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ivan Ufimtsev,i.ufimtsev@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>51876181-0577-4305-8961-455fe9f22ce9</GUID>
        <Name>Monte Carlo eXtreme (MCX)</Name>
        <ShortDescription>Monte Carlo eXtreme, or MCX, is Monte Carlo simulation software for photon migration in 3D turbid media. It uses massively parallel computing techniques based on Graphics Processing Units (GPUs) and is extremely fast compared to traditional CPU-based simulations. Using an nVidia 8800GT graphics card (14MP/114Cores), the acceleration is about 300x~400x with over 1700 parallel threads; this ratio can be as high as 700x on a high-end GTX 295 GPU (multiply by another 2x if both GPUs on the GTX 295 are used).</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/636_mcx_logo_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/636_mcx_logo_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Massachusetts General Hospital, Harvard Medical School</OrganizationName>
        <OrganizationURL>http://nmr.mgh.harvard.edu/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>22</ReleaseDay>
        <ReleaseDateDisplay>10/22/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>300</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="fangqq@gmail.com">Qianqian Fang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://mcx.sourceforge.net/">Application</ContentType>
           <ContentType url="http://www.opticsinfobase.org/oe/abstract.cfm?uri=oe-17-22-20178">Paper</ContentType>
           <ContentType url="https://orbit.nmr.mgh.harvard.edu/plugins/scmsvn/viewcvs.php/mcextreme_cuda/trunk/?root=mcextreme">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>3D Photon Migration,Qianqian Fang,fangqq@gmail.com</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>ff3b0870-5be3-48ff-b14a-1e3b54c3320f</GUID>
        <Name>AIRWC</Name>
        <ShortDescription>Accelerated Image Registration with CUDA. Fast medical image registration using affine and B-spline transformations.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/634_image002_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/634_image002_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Cambridge, Dept of Physics</OrganizationName>
        <OrganizationURL>http://www.phy.cam.ac.uk/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>11/15/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="rea1@cam.ac.uk">Richard Ansorge</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.bss.phy.cam.ac.uk/~rea1/AIRWC.html">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Richard Ansorge,rea1@cam.ac.uk</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>a5d1af40-470d-4088-b087-30a5e7a408d3</GUID>
        <Name>Task and Data Parallel Framework for GPU Computing</Name>
        <ShortDescription>MIT Lincoln Laboratory is developing PVTOL, a high-performance, portable signal and image processing library. The goals of PVTOL are to provide a portable framework for high-performance embedded computing, to support data and task parallelism, and to reduce the complexity and increase the speed of developing applications.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/628_pvtol_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/628_pvtol_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>MIT Lincoln Laboratory</OrganizationName>
        <OrganizationURL>http://www.ll.mit.edu</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>12</ReleaseDay>
        <ReleaseDateDisplay>11/12/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="brock.j@neu.edu">James Brock</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.mit.edu/~kepner/PVTOL/">Multimedia</ContentType>
           <ContentType url="http://www.mit.edu/~kepner/PVTOL/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>James Brock,brock.j@neu.edu</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>c47ca00a-cfa4-4bfc-9e05-9aa325fcf26c</GUID>
        <Name>TMPGEnc KARMA..Plus</Name>
        <ShortDescription>TMPGEnc KARMA..Plus makes it easy to take control of your ever-growing digital video library. Sort, search, classify, play, and even compare your digital videos with easy-to-use tools and controls. It also supports NVIDIA CUDA technology for filter processing, decoding, and H.264/AVC file output.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/625_tmkp_main_quickview_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/625_tmkp_main_quickview_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>Pegasys Inc.</OrganizationName>
        <OrganizationURL>http://www.pegasys-inc.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>11/10/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>9</SpeedUp>
        <SoftwareLicenseType>Commercial</SoftwareLicenseType>
        <Authors>
           <Author email="saito@pegasys-inc.com">Zakk saito</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://tmpgenc.pegasys-inc.com/en/product/tmkp.html">Application</ContentType>
           <ContentType url="http://www.youtube.com/watch?v=PAufsaXqLLs&amp;feature=player_embedded">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>CUDA H.264 Decode Player Manage TMPG TMPGEnc Pegasys,Zakk saito,saito@pegasys-inc.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f1d52d6a-a875-4d32-90ba-c1c23aa4f6a0</GUID>
        <Name>Mersenne Twister for Graphic Processors (MTGP)</Name>
        <ShortDescription>MTGP is a new variant of Mersenne Twister (MT) introduced by Mutsuo Saito and Makoto Matsumoto in 2009. MTGP is designed around features of graphics processors, such as parallel execution and high-speed constant reference. It supports 32-bit and 64-bit integers, as well as single- and double-precision floating point, as output. The periods of the generated sequences are 2^11213-1, 2^23209-1 and 2^44497-1 for the 32-bit version, and 2^23209-1, 2^44497-1 and 2^110503-1 for the 64-bit version. It supports 128 parameter sets for each period; in other words, it can generate 128 independent pseudorandom number sequences for each period. We are now developing Dynamic Creator for MTGP, which generates more parameter sets.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/624_mtgp_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/624_mtgp_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Department of Mathematics, Hiroshima University, Japan</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>11/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="saito@math.sci.hiroshima-u.ac.jp">Mutsuo Saito</Author>
           <Author email="">Makoto Matsumoto</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/index.html">Paper</ContentType>
           <ContentType url="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/index.html">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Finance</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Mutsuo Saito,Makoto Matsumoto,saito@math.sci.hiroshima-u.ac.jp</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5a730964-d49a-4305-b5a8-3c5d75ecf73b</GUID>
        <Name>Eudyptula</Name>
        <ShortDescription>Eudyptula is a portable graphics engine that provides advanced support for NVIDIA's CUDA tools. Its core purpose is to be used in the development of scientific applications.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/622_eudyptula_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/622_eudyptula_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>06/25/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Georgios Paraskevas</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/eudyptula/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Georgios Paraskevas</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>9ca281be-34d8-4b10-9f7c-cd1853ad715c</GUID>
        <Name>High performance sequence alignment</Name>
        <ShortDescription>A fast Smith-Waterman algorithm, implemented on CUDA </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/620_protein_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/620_protein_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>09/19/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Vahid Noormofidi</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/cudaalignment/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Vahid Noormofidi</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>082d85de-353e-4a4d-9613-2513309d4b09</GUID>
        <Name>aeth.drive</Name>
        <ShortDescription>A fast, parallel, versatile QED modelling framework that uses Geometric Calculus and CUDA. The algorithm supports complex phenomena including turbulence, quantum effects, and relativistic gravitational precession.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/619_aeth_small.jpeg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/619_aeth_large.jpeg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>15</ReleaseDay>
        <ReleaseDateDisplay>11/15/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Kevin Daley</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/aethdrive/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Kevin Daley</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2d481ec6-3138-4970-9d92-0abf7e82d639</GUID>
        <Name>BlazeSim</Name>
        <ShortDescription>Project SHERIF is the hardware acceleration of the Fire Dynamics Simulator (FDS) using CUDA on NVIDIA graphics cards.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/618_blazesim_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/618_blazesim_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>21</ReleaseDay>
        <ReleaseDateDisplay>05/21/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">mastermemorex</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/blazesim/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>mastermemorex</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>91df274b-6c8d-470a-956d-8e6ff1d8c053</GUID>
        <Name>jacuzzi</Name>
        <ShortDescription>This project aims to provide Java bindings to the CUDA numeric environment. CUDA is an extension to the C/C++ programming language by NVIDIA.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/617_jacuzzi_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/617_jacuzzi_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>03/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Alexander Heusel</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/jacuzzi/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Alexander Heusel</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>551bb282-5e25-4ff5-92fc-a0fc675d32bc</GUID>
        <Name>cuda cagen</Name>
        <ShortDescription>CUDA-based rule 30 cellular automaton generator for nVidia GPUs</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/616_CellularAutomata_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/616_CellularAutomata_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>09/17/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Yuri Parfenov</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/cudacagen/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yuri Parfenov</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>60d005b8-e3c7-47a5-8fec-ab8aef9f2031</GUID>
        <Name>Fast parallel Particle-To-Grid interpolation for plasma PIC simulations on the GPU</Name>
        <ShortDescription>Particle-in-Cell (PIC) methods have been widely used for plasma physics simulations in the past three decades. To ensure an acceptable level of statistical accuracy, relatively large numbers of particles are needed. State-of-the-art Graphics Processing Units (GPUs), with their high memory bandwidth, hundreds of SPMD processors, and half-a-teraflop performance potential, offer a viable alternative to distributed memory parallel computers for running medium-scale PIC plasma simulations on inexpensive commodity hardware. In this paper, we present an overview of a typical plasma PIC code and discuss its GPU implementation. In particular, we focus on fast algorithms for the performance bottleneck operation of particle-to-grid interpolation.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/615_ptg_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/615_ptg_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Maryland</OrganizationName>
        <OrganizationURL>http://www.umd.edu/</OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>10/01/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>20</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="gogo@umd.edu">George Stantchev</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.cscamm.umd.edu/publications/yjpdc2543-4_CS-08-35.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>George Stantchev,gogo@umd.edu </Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>276b1bef-214e-4528-85e7-c08792f09988</GUID>
        <Name>cudacluster</Name>
        <ShortDescription>The CUDA Cluster allows you to organize a cluster of CUDA-enabled Peer-To-Peer nodes, allowing for execution of tasks with extreme performance, by harnessing the combined power of multiple such GPU hosts. Sample jobs are provided. C#.Net/Mono with C. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/614_cudacluster_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/614_cudacluster_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>08</ReleaseMonth>
        <ReleaseDay>06</ReleaseDay>
        <ReleaseDateDisplay>08/06/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Nikolaos Tountas</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/cudacluster/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Nikolaos Tountas</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>2be843df-918d-4f4f-94ec-6c1b99e58760</GUID>
        <Name>MP3 Encoder</Name>
        <ShortDescription>MP3 encoder that runs on CUDA compatible hardware. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/613_cudamp3_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/613_cudamp3_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType></OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>03</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>03/19/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Research</SoftwareLicenseType>
        <Authors>
           <Author email="">biggestpos </Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/cudamp3encoder/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Video &amp; Audio</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>biggestpos </Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9a4aea49-e96f-487a-b6e3-ab50c134a049</GUID>
        <Name>cesql</Name>
        <ShortDescription>Database server based on NVIDIA CUDA technology. CUDA makes it possible to use the GPU and its performance for parallel data computing. A classic SQL server uses only about 15 GFlops instead of the more than 500 GFlops that could be used by cesql.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/612_cesql_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/612_cesql_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>06/08/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Arash_Mahini@users.sourceforge.net">Arash Mahini</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/cesql/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Arash Mahini,Arash_Mahini@users.sourceforge.net</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>436a1f19-e066-438d-9769-afd6b612b52e</GUID>
        <Name>cehttp</Name>
        <ShortDescription>Web server based on NVIDIA CUDA technology. CUDA makes it possible to use the GPU and its performance for parallel data computing. A classic web server uses only about 15 GFlops instead of the more than 500 GFlops that could be used by cehttp.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/611_cehttp_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/611_cehttp_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>06</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>06/08/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Arash_Mahini@users.sourceforge.net">Arash Mahini</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/cehttp/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType></ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Arash Mahini,Arash_Mahini@users.sourceforge.net</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>faba717d-f830-457b-94a4-a8ca1d709890</GUID>
        <Name>The CUDA Files</Name>
        <ShortDescription>Implementations of various algorithms using CUDA. </ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/610_thecudafiles_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/610_thecudafiles_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>08</ReleaseDay>
        <ReleaseDateDisplay>01/08/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sashang@users.sourceforge.net">sashang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/thecudafiles/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>sashang,sashang@users.sourceforge.net</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>8ffabfbd-cad9-4fa1-81ee-f61d4bc4cc76</GUID>
        <Name>FreeSWITCH-CUDA</Name>
        <ShortDescription>The goal of this project is to produce and maintain a branch of the FreeSWITCH telephony platform that utilizes CUDA (NVIDIA's GPGPU toolkit) to offload CPU-intensive transcoding tasks to the (NVIDIA) GPU.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/609_freeswitch_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/609_freeswitch_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>04/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="Zac_Wolfe@users.sourceforge.net">Zac Wolfe</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/freeswitch-cuda/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Zac Wolfe,Zac_Wolfe@users.sourceforge.net</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>9b5c77ca-f014-4173-83cd-3bc3da09039b</GUID>
        <Name>tokaspt</Name>
        <ShortDescription>The Once Known as SmallPT is a cheap, editable, realtime derivation of smallpt (http://kevinbeason.com/smallpt/). By way of the marketing department, some outrageously insignificant numbers: on a Quadro FX 5800, on the default scene at default resolution and configuration, 768x512x(2x2)x118fps = 185.6M 4-bounce rays are traced per second (alternatively, a maximum of 742.4M bounces are generated). Requires CUDA 2.1 to compile and run.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/608_img_ui_bloated_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/608_img_ui_bloated_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>http://ompf.org</OrganizationName>
        <OrganizationURL>http://ompf.org</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>01</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>01/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="tbptbp@gmail.com">Thierry Berger-Perrin</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/tokaspt/">Application</ContentType>
           <ContentType url="http://code.google.com/p/tokaspt/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Thierry Berger-Perrin,tbptbp@gmail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>e143112b-a0c0-4f45-8360-6afe7687f68e</GUID>
        <Name>A framework for efficient and scalable execution of domain-specific templates on GPUs</Name>
        <ShortDescription>Graphics Processing Units (GPUs) have emerged as important players in the transition of the computing industry from sequential to multi- and many-core computing. We propose a software framework for execution of domain-specific parallel templates on GPUs, which simultaneously raises the abstraction level of GPU programming and ensures efficient execution with forward scalability to large data sizes and new GPU platforms. To achieve scalable and efficient GPU execution, our framework focuses on two critical problems that have been largely ignored in previous efforts: processing large data sets that do not fit within the GPU memory, and minimizing data transfers between the host and GPU. Our framework takes domain-specific parallel programming templates that are expressed as parallel operator graphs, and performs operator splitting, offload unit identification, and scheduling of off-loaded computations and data transfers between the host and the GPU, to generate a highly optimized execution plan. Finally, a code generator produces a hybrid CPU/GPU program, in accordance with the derived execution plan, that uses lower-level frameworks such as CUDA. We have applied the proposed framework to templates from the recognition domain, specifically edge detection kernels and convolutional neural networks that are commonly used in image and video analysis. We present results on two different GPU platforms from NVIDIA (a Tesla C870 GPU computing card and a GeForce 8800 graphics card) that demonstrate 1.7-7.8X performance improvements over already accelerated baseline GPU implementations. We also demonstrate scalability to input data sets and application memory footprints of 6GB and 17GB, respectively, on GPU platforms with only 768MB and 1.5GB of memory.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/607_ipdp_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/607_ipdp_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NEC Labs, Berkeley, Purdue</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>01</ReleaseDay>
        <ReleaseDateDisplay>05/01/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>8</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="narayans@eecs.berkeley.edu">Narayanan Sundaram</Author>
           <Author email="">Anand Raghunathan</Author>
           <Author email="">Srimat T. Chakradhar</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.eecs.berkeley.edu/~narayans/Publications_files/ipdps2009.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>machine learning</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>edge detection, convolution neural network, out-of-core,Narayanan Sundaram,Anand Raghunathan,Srimat T. Chakradhar,narayans@eecs.berkeley.edu</Keyword>
        </Keywords>
     </Application>

     <Application>
        <GUID>d45f95f7-772b-41f6-a00d-4cb40e53e785</GUID>
        <Name>HyperNEAT4CUDA</Name>
        <ShortDescription>This is a simple C# implementation of HyperNEAT built on NVIDIA's Compute Unified Device Architecture (CUDA).</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/605_hyperneat_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/605_hyperneat_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>OpenSource</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>19</ReleaseDay>
        <ReleaseDateDisplay>05/19/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">K A Lloyd</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://sourceforge.net/projects/hyperneat4cuda/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>K A Lloyd</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>727f6e8e-1cc9-4afc-9d6f-3329a569a712</GUID>
        <Name>Smoke rendering demo</Name>
        <ShortDescription>This application renders a density field of float values. In this particular demo it is a smoke density field, but it could just as well be other sorts of data such as fog, fluids or calculations. The density field is visualized using a ray marching technique and the background is rendered by ray tracing a kd tree.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/604_smoke_sreenshot1_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/604_smoke_sreenshot1_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Alexandra Instituttet</OrganizationName>
        <OrganizationURL>http://www.alexandra.dk/index.htm</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>14</ReleaseDay>
        <ReleaseDateDisplay>05/14/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="peter.trier@alexandra.dk">Peter Trier</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://cg.alexandra.dk/category/software/">Application</ContentType>
           <ContentType url="http://www.youtube.com/watch?v=teEDA9esk-A">Multimedia</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Smoke rendering, ray tracing,Peter Trier,peter.trier@alexandra.dk</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>72067ded-99f3-4176-96ad-9f1551b12c41</GUID>
        <Name>CUJ2K - JPEG2000 Encoder </Name>
        <ShortDescription>CUJ2K is a fast encoder for the image compression standard JPEG2000, which improves on JPEG by providing better compression ratios and supporting lossless compression along with many other features. JPEG2000 is very computation-intensive and therefore benefits greatly from CUDA acceleration. CUJ2K uses streaming to accelerate batch image compression. This program provides command-line, .NET GUI and library interfaces to convert BMP -> JPEG2000. It also supports creation of MJ2 videos.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/603_banner_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/603_banner_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Stuttgart, IPVS</OrganizationName>
        <OrganizationURL>http://www.ipvs.uni-stuttgart.de/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>20</ReleaseDay>
        <ReleaseDateDisplay>09/20/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>4</SpeedUp>
        <SoftwareLicenseType>Open Source</SoftwareLicenseType>
        <Authors>
           <Author email="cuj2k.project@googlemail.com">Norbert Fuerst</Author>
           <Author email="">Armin Weiss</Author>
           <Author email="">Simon Papandreou</Author>
           <Author email="">Martin Heide</Author>
           <Author email="">Ana Balevic</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://cuj2k.sourceforge.net/">Application</ContentType>
           <ContentType url="http://cuj2k.sourceforge.net/">Paper</ContentType>
           <ContentType url="http://cuj2k.sourceforge.net/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Graphics</ApplicationType>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Medical Imaging</ApplicationType>
           <ApplicationType>Libraries</ApplicationType>
           <ApplicationType>Video &amp; Audio</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>JPEG2000, image compression, encoder, codec, JPEG, CUJ2K, image processing, lossless, lossy,Norbert Fuerst,Armin Weiss,Simon Papandreou, Martin Heide, Ana Balevic,cuj2k.project@googlemail.com</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>64528049-540a-4d7f-9cc0-2d4a2ccad4f0</GUID>
        <Name>Parallel Multiclass classification using SVM on GPUs</Name>
        <ShortDescription>The scaling of serial algorithms cannot rely on the improvement of CPUs anymore. The performance of classical Support Vector Machine (SVM) implementations has reached its limit, and the arrival of the multi-core era requires these algorithms to adapt to a new parallel scenario. Graphics Processing Units (GPUs) have arisen as high-performance platforms to implement data-parallel algorithms. This paper describes how a native implementation of a multiclass classifier based on SVMs can map its inherent degrees of parallelism to the GPU programming model and efficiently use its computational throughput. Empirical results show that the training and classification time of the algorithm can be reduced by an order of magnitude compared to a classical solver, LIBSVM, while guaranteeing the same accuracy.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/602_multisvm_small.gif</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/602_multisvm_large.gif</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>MIT</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>112</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sherrero@mit.edu">Sergio Herrero-Lopez</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/multisvm/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Sergio Herrero-Lopez,sherrero@mit.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>f7874e4b-ba49-44f9-b736-6a3341519f41</GUID>
        <Name>Fast pattern classification of ventricular arrhythmias using graphics processing units</Name>
        <ShortDescription>Graphics Processing Units (GPUs) can provide remarkable performance gains when compared to CPUs for computationally-intensive applications. In the biomedical area, most of the previous studies are focused on using Neural Networks (NNs) for pattern recognition of biomedical signals. However, the long training times prevent them from being used in real time. This is critical for the fast detection of Ventricular Arrhythmias (VAs), which may cause cardiac arrest and sudden death. In this paper, we present a parallel implementation of the Back-Propagation (BP) and the Multiple Back-Propagation (MBP) algorithms, which allowed significant training speedups. In our proposal, we explicitly specify data parallel computations by defining special functions (kernels); therefore, we can use a fast evaluation strategy for reducing the computational cost without wasting memory resources. The performance of the pattern classification implementation is compared against other reported algorithms.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/600_mbpTop_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/600_mbpTop_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>IPG</OrganizationName>
        <OrganizationURL>http://www.ipg.pt</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>09</ReleaseDay>
        <ReleaseDateDisplay>11/09/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>53</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="noel@ipg.pt">Noel Lopes</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://dit.ipg.pt/MBP/papers.aspx">Application</ContentType>
           <ContentType url="http://dit.ipg.pt/MBP/papers.aspx">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>medicine</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Neural Networks,Noel Lopes,noel@ipg.pt</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c8a33001-387c-474f-a477-63571429ab6f</GUID>
        <Name>Heart Wall Tracking</Name>
        <ShortDescription>Tracking of mouse heart walls through a series of ultrasound images.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/599_heartwall_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/599_heartwall_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Virginia</OrganizationName>
        <OrganizationURL>http://www.virginia.edu</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>11</ReleaseMonth>
        <ReleaseDay>05</ReleaseDay>
        <ReleaseDateDisplay>11/05/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>15</SpeedUp>
        <SoftwareLicenseType>Open Source</SoftwareLicenseType>
        <Authors>
           <Author email="lgs9a@virginia.edu">Lukasz G. Szafaryn</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Heart_Wall_Tracking">Application</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Heart_Wall_Tracking">Multimedia</ContentType>
           <ContentType url="https://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Heart_Wall_Tracking">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Medical Imaging</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Image Processing, Feature Detection, Ultrasound,Lukasz G. Szafaryn,lgs9a@virginia.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>d29736de-ffee-4b0a-b7ec-8d041259c195</GUID>
        <Name>Towards a multi-GPU solver for the three-dimensional two-phase incompressible Navier-Stokes equations</Name>
        <ShortDescription>We have ported parts of our parallel level-set based two-phase solver for the three-dimensional Navier-Stokes equations to the GPU. To our knowledge, this is the first time that a two-phase fluid solver profits from the performance boost of several GPUs. A multi-GPU double-precision solver for the pressure Poisson equation based on the Jacobi preconditioned conjugate gradient method was implemented using CUDA and MPI. With it, we obtain a major speedup factor of 31.1 for the Poisson solver on four GPUs of our NVIDIA Tesla S1070, compared to a single CPU. Consequently, our overall fluid solver shows an impressive speedup factor of 16.6.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/598_logo_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/598_logo_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Institute for Numerical Simulation - University of Bonn, Germany</OrganizationName>
        <OrganizationURL>http://www.ins.uni-bonn.de</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>30</ReleaseDay>
        <ReleaseDateDisplay>09/30/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>16</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="zaspel@ins.uni-bonn.de">Peter Zaspel</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://wissrech.ins.uni-bonn.de/people/zaspel/poster_GPU2009.pdf">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>CFD, multi-GPU, Navier-Stokes, multi-phase,Peter Zaspel,zaspel@ins.uni-bonn.de</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>5bd7b280-5a27-49e5-be83-c95099ac3a3c</GUID>
        <Name>String Matching on a Multicore GPU Using CUDA</Name>
        <ShortDescription>Graphics Processing Units (GPUs) have evolved over the past few years from dedicated graphics rendering devices to powerful parallel processors, outperforming traditional Central Processing Units (CPUs) in many areas of scientific computing. The use of GPUs as processing elements was very limited until recently, when the concept of General-Purpose computing on Graphics Processing Units (GPGPU) was introduced. GPGPU made it possible to exploit the processing power and the memory bandwidth of GPUs through APIs that hide the GPU hardware from programmers. This paper presents experimental results on the parallel processing of some well-known on-line string matching algorithms using one such GPU abstraction API, the Compute Unified Device Architecture (CUDA).</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/597_cuda1o_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/597_cuda1o_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Macedonia</OrganizationName>
        <OrganizationURL>http://www.uom.gr</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>10</ReleaseDay>
        <ReleaseDateDisplay>09/10/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>24</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ckouz@uom.gr">C. S. Kouzinopoulos</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.computer.org/portal/web/csdl/doi/10.1109/PCI.2009.47">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>String matching</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>string matching, algorithms, CUDA, GPGPU, parallel,C. S. Kouzinopoulos,ckouz@uom.gr</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>c3242f2b-7ede-43d1-87b7-c462eae24c94</GUID>
        <Name>Fast Tridiagonal Solvers on the GPU</Name>
        <ShortDescription>We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) and recursive doubling (RD). We develop an approach to measure, analyze, and optimize the performance of GPU programs in terms of memory access, computation, and control overhead. We find that CR enjoys linear algorithm complexity but suffers from more algorithmic steps and bank conflicts, while PCR and RD have fewer algorithmic steps but do more work each step. To combine the benefits of the basic algorithms, we propose hybrid CR+PCR and CR+RD algorithms, which improve the performance of PCR, RD and CR by 21%, 31% and 61% respectively. Our GPU solvers achieve up to a 28x speedup over a sequential LAPACK solver, and a 12x speedup over a multi-threaded CPU solver.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/596_idav_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/596_idav_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of California, Davis</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>28</ReleaseDay>
        <ReleaseDateDisplay>10/28/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>12</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="yaozhang@ucdavis.edu">Yao Zhang</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://graphics.cs.ucdavis.edu/publications/print_pub?pub_id=978">Application</ContentType>
           <ContentType url="http://graphics.cs.ucdavis.edu/publications/print_pub?pub_id=978">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Numerics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Yao Zhang,yaozhang@ucdavis.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>0facea85-946d-47ef-93fd-12b5ae74b4b6</GUID>
        <Name>Accelerating Geo-Science and Engineering System Simulations on Graphics Hardware</Name>
        <ShortDescription>This paper discusses GPU implementations of three example applications from computational fluid dynamics, seismic wave propagation, and rock magnetism. These candidate applications involve important numerical modeling techniques, widely employed in physical system simulations, that are themselves examples of distinct computing classes identified as fundamental to scientific and engineering computing. The presented numerical methods (and respective computing classes they belong to) are: (1) a lattice-Boltzmann code for geofluid dynamics (structured grid class); (2) a spectral-finite-element code for seismic wave propagation simulations (sparse linear algebra class); and (3) a least-squares minimization code for interpreting magnetic force microscopy data (dense linear algebra class). Significant performance increases are seen in all three applications, demonstrating the power of GPU implementations for these types of simulations and their associated computing classes.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/595_stochastic_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/595_stochastic_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>University of Minnesota</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>10/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>30</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="sdcwalsh@umn.edu">Stuart D.C. Walsh</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://dx.doi.org/10.1016/j.cageo.2009.05.001">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Science</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Stuart D.C. Walsh,sdcwalsh@umn.edu</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>786fec9c-472d-4f0e-9985-42ad2050e358</GUID>
        <Name>Sailfish: An Open Source fluid simulation package using the Lattice-Boltzmann method</Name>
        <ShortDescription>Sailfish is a general-purpose fluid dynamics solver optimized for modern multicore processors, especially Graphics Processing Units (GPUs). The solver is based on the Lattice Boltzmann Method and works for both 2D and 3D fluids. Its performance peaks at 950MLUPS with the D2Q9 grid and 750MLUPS with D3Q19 (using CUDA on a single GTX280 video card). The design of Sailfish tries to reconcile ease of use and flexibility with performance. Python, with its powerful modules (sympy for automatic code generation, numpy, pygame, tvtk, etc.), is used as the main language on the host (for I/O, visualization and user interaction), while the actual computations are performed on the GPU using CUDA or OpenCL.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/594_sailfish_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/594_sailfish_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Institute of Physics, University of Silesia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>04</ReleaseMonth>
        <ReleaseDay>17</ReleaseDay>
        <ReleaseDateDisplay>04/17/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>100</SpeedUp>
        <SoftwareLicenseType>Open Source</SoftwareLicenseType>
        <Authors>
           <Author email="mjanusz@us.edu.pl">M. Januszewski</Author>
           <Author email="">M. Kostur</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.youtube.com/watch?v=kx4-VjaJ2eI">Multimedia</ContentType>
           <ContentType url="http://gitorious.org/sailfish">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>M. Januszewski,M. Kostur,mjanusz@us.edu.pl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>111d3757-3e16-4600-bf47-437a832bae86</GUID>
        <Name>GPU-SPHysics</Name>
        <ShortDescription>A GPU-based Smoothed Particle Hydrodynamics model for free surface flows.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/593_boreinboxwhite_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/593_boreinboxwhite_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Istituto Nazionale di Geofisica e Vulcanologia</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2008</ReleaseYear>
        <ReleaseMonth>12</ReleaseMonth>
        <ReleaseDay>31</ReleaseDay>
        <ReleaseDateDisplay>12/31/2008</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>23</SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="">Alexis Herault</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.ce.jhu.edu/dalrymple/GPU/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Computational Fluid Dynamics</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword></Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>dea0e214-213a-4557-9ef4-1e9d5d6f80c9</GUID>
        <Name>Evaluating Multi-Core Platforms for HPC Data-Intensive Kernels</Name>
        <ShortDescription>We present an evaluation of three platform types, namely NVIDIA GPUs, the STI Cell/B.E., and generic multi-core CPUs on convolutional resampling (aka gridding), which is an irregular, data-intensive application from radio astronomy. We evaluate these platforms in terms of performance, programming effort and cost. Although we do not select a clear winner, we do provide a list of guidelines to assist in platform choice and development of similar data-intensive applications.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/592_gridding_fig_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/592_gridding_fig_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Delft University of Technology</OrganizationName>
        <OrganizationURL>http://www.tudelft.nl/</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>05</ReleaseMonth>
        <ReleaseDay>18</ReleaseDay>
        <ReleaseDateDisplay>05/18/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="a.s.vanamesfoort@tudelft.nl">Alexander S. van Amesfoort</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.pds.ewi.tudelft.nl/~afoort/publ/cf09/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Imaging</ApplicationType>
           <ApplicationType>Numerics</ApplicationType>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>data-intensive gridding astronomy,Alexander S. van Amesfoort,a.s.vanamesfoort@tudelft.nl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>02285ada-66ce-4cd5-8809-e459372d9fb8</GUID>
        <Name>An efficient GPU implementation for large scale individual-based simulation of collective behavior</Name>
        <ShortDescription>In this work we describe a GPU implementation of an individual-based model for fish schooling. In this model each fish aligns its position and orientation with an appropriate average of its neighbors' positions and orientations. This carries a very high computational cost in the so-called nearest-neighbors search. By leveraging the processing power of the GPU and the new CUDA programming model, we implement an efficient framework that makes it possible to simulate the collective motion of high-density groups of individuals. In particular, we present as a case study a simulation of the motion of millions of fish. We describe our implementation and present extensive experiments that demonstrate its effectiveness.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/591_HiBi09_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/591_HiBi09_large.png</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Academia</OrganizationType>
        <OrganizationName>Universita di Salerno</OrganizationName>
        <OrganizationURL></OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>10/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType></SoftwareLicenseType>
        <Authors>
           <Author email="ugo.erra@unibas.it">Ugo Erra</Author>
           <Author email="ugo.erra@unibas.it">Bernardino Frola</Author>
           <Author email="ugo.erra@unibas.it">Vittorio Scarano</Author>
           <Author email="ugo.erra@unibas.it">Iain Couzin</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://isis.dia.unisa.it/projects/behavert/">Application</ContentType>
           <ContentType url="http://www.youtube.com/watch?v=eymho1qRqK4&amp;feature=player_embedded">Multimedia</ContentType>
           <ContentType url="http://isis.dia.unisa.it/projects/behavert/">Paper</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Life Sciences</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Ugo Erra,ugo.erra@unibas.it</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>a412a716-04f1-4cf9-a389-8a51d3ea7680</GUID>
        <Name>OpenCurrent</Name>
        <ShortDescription>OpenCurrent is an open source C++ library for solving Partial Differential Equations (PDEs) over regular grids using the CUDA platform from NVIDIA. It breaks down a PDE into three basic objects: Grids, Solvers, and Equations. Grid data structures efficiently implement regular 1D, 2D, and 3D arrays in both double and single precision. Grids support operations like computing linear combinations, managing host-device memory transfers, interpolating values at non-grid points, and performing array-wide reductions. Solvers use these data structures to calculate terms arising from discretizations of PDEs, such as finite-difference based advection and diffusion schemes, and a multigrid solver for Poisson equations. These computational building blocks can be assembled into complete Equation objects that solve time-dependent PDEs. One such Equation solver is an incompressible Navier-Stokes solver that uses a second-order Boussinesq model. This equation solver is fully validated, and has been used to study Rayleigh-Benard convection under a variety of different regimes (citation). Benchmarks show it to perform about 8 times faster than an equivalent Fortran code running on an 8-core Xeon.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/590_opencurrent_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/590_opencurrent_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Commercial</OrganizationType>
        <OrganizationName>NVIDIA</OrganizationName>
        <OrganizationURL>http://www.nvidia.com</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>09</ReleaseMonth>
        <ReleaseDay>25</ReleaseDay>
        <ReleaseDateDisplay>09/25/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp></SpeedUp>
        <SoftwareLicenseType>Open Source</SoftwareLicenseType>
        <Authors>
           <Author email="">Jonathan Cohen</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://code.google.com/p/opencurrent/">Code</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>libraries</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Jonathan Cohen</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>21a1b481-5773-403d-8644-730c1c5f1d58</GUID>
        <Name>Correlating Radio Astronomy Signals</Name>
        <ShortDescription>A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will require on the order of exaflops of computation and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware, to increase flexibility and to reduce development effort.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/589_LBA-field_small.jpg</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDABrowser/assets/images/applications/589_LBA-field_large.jpg</BoxArtImageURLMed>
        <BoxArtImageURLHigh></BoxArtImageURLHigh>
        <OrganizationType>Research</OrganizationType>
        <OrganizationName>Astron</OrganizationName>
        <OrganizationURL>http://www.astron.nl</OrganizationURL>
        <ReleaseYear>2009</ReleaseYear>
        <ReleaseMonth>10</ReleaseMonth>
        <ReleaseDay>16</ReleaseDay>
        <ReleaseDateDisplay>10/16/2009</ReleaseDateDisplay>
        <CompatibleGPU></CompatibleGPU>
        <SpeedUp>6.3</SpeedUp>
        <SoftwareLicenseType>Open source</SoftwareLicenseType>
        <Authors>
           <Author email="nieuwpoort@astron.nl">Rob van Nieuwpoort</Author>
        </Authors>
        <ContentTypes>
           <ContentType url="http://www.astron.nl/~nieuwpoort/papers/ics09-correlator.pdf">Paper</ContentType>
           <ContentType url="http://www.astron.nl/~nieuwpoort/">Code</ContentType>
           <ContentType url="http://www.lofar.org/">Application</ContentType>
        </ContentTypes>
        <ApplicationTypes>
           <ApplicationType>Science</ApplicationType>
           <ApplicationType>Signal Processing</ApplicationType>
        </ApplicationTypes>
        <Keywords>
           <Keyword>Rob van Nieuwpoort,nieuwpoort@astron.nl</Keyword>
        </Keywords>
     </Application>


     <Application>
        <GUID>fb82b05f-0449-485d-8779-b53d28646189</GUID>
        <Name>TUNED AND ASYNCHRONOUS STENCIL KERNELS FOR CPU/GPU SYSTEMS</Name>
        <ShortDescription>We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single and double precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060. Motivated to find a still faster implementation, we further consider wildly asynchronous implementations that can reduce or even eliminate the synchronization bottleneck between iterations. In these versions, which are based on the principle of chaotic relaxation (Chazan and Miranker, 1969), we simply remove or delay synchronization between iterations, thereby potentially trading off more flops (via more iterations to converge) for a higher degree of asynchronous parallelism. Our relaxed-synchronization implementations on a GPU can be 1.2-2.5x faster than our best synchronized GPU implementation while achieving the same accuracy. Looking forward, this result suggests research on similarly fast-and-loose algorithms in the coming era of increasingly massive concurrency and relatively high synchronization or communication costs.</ShortDescription>
        <URL></URL>
        <BoxArtImageURLLow>/content/cudazone/CUDABrowser/assets/images/applications/588_tuned_small.png</BoxArtImageURLLow>
        <BoxArtImageURLMed>/content/cudazone/CUDA