CUDA: Week in Review
Friday, June 18, 2010, Issue #26 - Newsletter Home
Welcome to "CUDA: Week in Review," a weekly newsletter for the worldwide CUDA and GPU Computing community.
– Contact us at:
– Follow us on Twitter:
– See previous issues:

Update on GTC 2010 (GPU Technology Conference)
– Sign up today for GTC 2010! (Special code for "CUDA: Week in Review"
   readers - GMCUDANEWS10)
– Read our new blog post here
The Future of Voice Search - Accelerated by CUDA
We recently read a posting about a "speech indexing" start-up called Nexiwave. We wanted to learn more, so we contacted CEO Ben Jiang. Here's an excerpt from our email interview:

NVIDIA: Ben, what makes speech indexing a compelling application?
Ben: Ninety percent of human communication is through speech. The amount of spoken words that could potentially be indexed and searched is staggering. Skype callers have logged over 100 billion minutes of talk time. Conference call companies are carrying over a billion minutes of calls per month. There are hundreds of millions of podcasts on the web, and 24 hours of video are uploaded to YouTube every minute.

The problem is that today's information retrieval applications, such as internet search, focus on textual content. Information retrieval from speech content still relies primarily on a human's memory. The objective of speech indexing is to enable us to easily extract information from archived audio and video content. Through the Nexiwave system, an end user can easily search the content and locate the exact location of interest, whether it's a word, a phrase or a general topic.
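The core idea — jump straight to the moment a word was spoken — can be illustrated with a toy time-aligned inverted index. This is a minimal sketch, not Nexiwave's actual system; the transcript data and function names are made up for illustration:

```python
from collections import defaultdict

# Toy transcript: (word, start_seconds) pairs, the kind of word-level
# time alignment a speech recognizer produces.
transcript = [
    ("the", 0.0), ("budget", 0.4), ("review", 0.9),
    ("is", 1.3), ("due", 1.5), ("friday", 1.8),
    ("budget", 12.2), ("approved", 12.7),
]

def build_index(words):
    """Time-aligned inverted index: word -> list of start times."""
    index = defaultdict(list)
    for word, start in words:
        index[word].append(start)
    return index

index = build_index(transcript)
print(index["budget"])   # -> [0.4, 12.2]: every moment "budget" was spoken
```

A search then returns timestamps rather than documents, so a player can seek directly to the audio snippet of interest.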

NVIDIA: What are some of the potentially big markets for speech indexing?
Ben: Think about the conference calls that happen 24x7 at companies around the world. We've all had moments where we thought: "Ahh, John said something really useful in the last call. I wish I could remember exactly what he said." In the future, with speech indexing-enabled conference calls, we will be able to easily do that via a quick search to locate the exact audio snippet. Another interesting market is call centers, where the ability to do a deep search (not just time of call and phone number) will enable companies to find out what customers are really telling them. Other markets are e-discovery (in the legal field), recorded educational media, podcasts and audio-centric enterprises.

NVIDIA: What stage is your technology in?
Ben: Nexiwave 1.0 was released in October 2009. Nexiwave 2.0, our NVIDIA GPU-enabled version, was released on June 3, 2010 and is in production. We offer a SaaS (software as a service) and cloud computing solution as well as software licenses.

NVIDIA: What is the connection between Nexiwave and CMU Sphinx, the speech recognition system from Carnegie Mellon?
Ben: CMU Sphinx is a very popular open source speech processing engine. Our system is built on top of it with many of our own proprietary improvements, such as CUDA-based acoustic scoring (a total re-write of the core acoustic scoring code). We are one of the major commercial companies contributing to it through code fixes, developer resources and user forum support.

NVIDIA: Where does the GPU fit into this?
Ben: Speech indexing is computationally intensive and has traditionally been very expensive. Speech indexing can be efficiently processed in parallel, which makes the GPU a perfect fit for it. The GPU will solve the cost issue associated with indexing vast amounts of audio content quickly and accurately.
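Acoustic scoring — the part Nexiwave rewrote in CUDA — shows why the fit is natural: each (audio frame, model state) pair can be scored independently. The sketch below uses plain Python to show the shape of the computation; on a GPU, each pair would map to its own thread. All model values are invented for illustration:

```python
import math

def score_frame(frame, state):
    """Diagonal-covariance Gaussian log-likelihood of one frame under one state."""
    ll = 0.0
    for x, mean, var in zip(frame, state["means"], state["vars"]):
        ll += -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
    return ll

def score_all(frames, states):
    # Every (frame, state) pair is independent -- this doubly nested loop
    # is exactly the kind of work a CUDA kernel parallelizes.
    return [[score_frame(f, s) for s in states] for f in frames]

frames = [[0.1, -0.3], [1.2, 0.8]]           # toy 2-dim feature vectors
states = [
    {"means": [0.0, 0.0], "vars": [1.0, 1.0]},
    {"means": [1.0, 1.0], "vars": [0.5, 0.5]},
]
scores = score_all(frames, states)           # 2 frames x 2 states
```

With millions of frames and thousands of states, this becomes the dominant cost of recognition, which is why it is the natural target for GPU offload.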

NVIDIA: How did you like programming/porting in the CUDA C environment?
Ben: Our experience with programming in CUDA C has been enjoyable. The CUDA Best Practices Guide provided tons of help in performance tuning.

NVIDIA: How does CUDA help you?
Ben: Nexiwave has been able to move 75% of our computing processes (or 11 million computation loops per audio minute) to CUDA C, speeding up our application by more than 75 times. This directly translates into cost reduction (we have released a large number of CPU machines back to our computing provider). The exciting thing about this speedup is that it enables us to move into markets where speech indexing has not been possible before.
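Figures like "75% of the work moved to the GPU" can be related to overall application speedup with Amdahl's law: the gain depends on both the fraction offloaded and how much faster that fraction runs, and the remaining CPU work caps the total. The numbers below are illustrative, not Nexiwave's measurements:

```python
def overall_speedup(offloaded_fraction, kernel_speedup):
    """Amdahl's law: total time = serial part + accelerated part."""
    return 1.0 / ((1.0 - offloaded_fraction)
                  + offloaded_fraction / kernel_speedup)

# If 75% of the runtime is offloaded and that portion runs 75x faster,
# the application as a whole speeds up about 3.85x; the untouched 25%
# caps the gain at 4x no matter how fast the GPU portion gets.
print(overall_speedup(0.75, 75))     # ~3.85
print(overall_speedup(0.75, 1e9))    # approaches 4.0
```

This is why large end-to-end speedups require that the offloaded kernels account for nearly all of the original runtime.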

For more info, see or email
OpenCL v1.1 Drivers and Code Samples Available
OpenCL v1.1 pre-release drivers and SDK code samples are now available to GPU Computing registered developers. Log in or apply for an account to download OpenCL v1.1 today.
CUDA Summer School
NEW! Professor Mike Giles at the University of Oxford has announced a new CUDA course.
– When: July 26-30
– Where: University of Oxford
– Note: A 5-day hands-on course for students, postdocs, academics and others who want
   to learn how to develop applications on NVIDIA GPUs using CUDA C.
– See:

Barcelona Computing Week: Programming and Tuning Massively Parallel Systems
– When: July 5-9
– Where: Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Spain
– Instructors: Wen-Mei W. Hwu, professor of electrical and computer engineering,
   Univ. of Illinois, Urbana-Champaign (UIUC) and David Kirk, NVIDIA fellow
– See:

NEW! Workshop on GPU Programming for Molecular Modeling
– When: Aug. 6-8
– Where: Beckman Institute for Advanced Science & Tech, UIUC
– Instructors: John Stone, Jim Phillips, David Hardy, Kirby Vandivort of UIUC
– Note: Sponsored by the Theoretical and Computational Biophysics Group, NIH Resource
   for Macromolecular Modeling and Bioinformatics. Participants are
   encouraged to attend "Proven Algorithmic Techniques for Many-Core Processors"
   the preceding week.
– See:

Virtual School of Computational Science and Engineering: Proven Algorithmic Techniques for Many-Core Processors
– When: Aug. 2-6
– Where: Choice of onsite locations, or online
– Instructors: Wen-Mei W. Hwu, professor of electrical and computer engineering, UIUC
   and David Kirk, NVIDIA fellow
– See:
Vicodeo: Accelerated Video Decoding Library for .NET, Python, Java
Vicodeo is a managed library for accelerated video decoding. Features include:
– Faster than real-time video decoding of up to 1080p streams (@30 FPS, Full HD: H.264,
   MPEG-2), even on low-end platforms.
– Targets multiple environments and needs, from high-quality/resolution displays to
   streaming content at low-quality and resolution.
– Applications: Security/surveillance, live streaming, video conferencing/processing.
– Pure software implementation, CUDA-based.
Vicodeo is available through Hoopoe, a project that aims to provide cloud services for GPU computing. Hoopoe provides a general environment for running computations, of any kind and model, on GPU hardware. Users may specify the type of hardware they wish to use during the computation or the system will use the available resources to complete the computation as fast as possible. See:
New on CUDA Zone: Design and Performance Evaluation of Image Processing Algorithms on GPUs
Authors: In Kyu Park, Inha Univ., Incheon; Nitin Singhal, Samsung Electronics, Suwon; Man Hee Lee, Inha Univ., Incheon; Sungdae Cho, Samsung Electronics, Suwon; Chris Kim, NVIDIA
Extract: "In this paper, we construe key factors in design and evaluation of image processing algorithms using CUDA. A set of metrics, customized for image processing, are proposed to quantitatively evaluate algorithm characteristics. In addition, we show that a range of image processing algorithms map readily to CUDA using multiview stereo matching, linear feature extraction, JPEG2000 image encoding, and non-photorealistic rendering (NPR) as our example applications…. Analysis is conducted to show the appropriateness of the proposed metrics in predicting the effectiveness of an application for parallel implementation…." Published in IEEE Transactions on Parallel and Distributed Systems, vol. 99, no. 1. See:
CUDA Zone: Have a CUDA-related app or paper? Let us know when you post it on CUDA Zone and we'll send you a CUDA t-shirt!
Cray has an opening for a software engineer to join its Programming Environments team. In this role, you will assist in the packaging and release of third party and open source software related to the Cray Programming Environment, as well as software developed in house.
Requirements: B.S. degree (or equivalent) in Computer Science (or similar degree) with at least two years of directly related experience, as well as a background in shell scripting languages and knowledge of C, C++ or Java. Understanding of the basics of Fortran and knowledge of CUDA are a plus. See:
GPU Computing Webinars (CUDA C, OpenCL, Parallel Nsight and more…)
See June webinar schedule: Upcoming webinars include:

– Rapid Application Development Platform for GPGPUs - Jacket / MATLAB
Tuesday, June 22, 2010, 8:00 a.m. pacific
Presented by AccelerEyes, developer of Jacket for MATLAB

– Intro to MainConcept's CUDA H.264/AVC Encoder
Tuesday, June 29, 2010, 9:00 a.m. pacific
Presented by MainConcept, video and audio codec solutions provider
CUDA Training
– CUDA training from Acceleware
July 26-30, Cambridge, Mass: (with Microsoft)
August 2-6, New York City: (with Microsoft)
Sept. 13-17, Calgary:

– CUDA training from SagivTech
CUDA course: July 12-14, Ra’anana, Israel
GPU/Image Processing course: Aug. 2-4, Ra’anana, Israel

– CUDA training from EMPhotonics
On-site standard and customized training programs
CUDA Research and Certification
NVIDIA has launched new programs for GPU Computing developers. For more info, see:
CUDA and Academia
Over 350 universities are teaching CUDA and GPU Computing courses.
CUDA Center of Excellence Program
The CUDA Center of Excellence (CCOE) Program recognizes universities that are expanding the frontier of massively parallel computing using CUDA. See:
Parallel Execution of Sequential Programs on Multi-Core Architectures
June 20, Saint-Malo, France

GPGPU Briefing on Financial Services (Microsoft/NVIDIA)
June 21, New York, NY

SIFMA Financial Services Tech Expo
June 22-24, New York, NY

High Performance Graphics 2010
June 25-27, Saarbrucken, Germany

GPUs in Chemistry and Materials Science
June 28-30, Univ. of Pittsburgh

Parallel Symbolic Computation 2010 (PASCO)
July 21-23, Grenoble, France

July 25-29, Los Angeles

Virtual School of Comp. Science & Engineering Summer School
Aug. 2-6, choice of onsite locations (Proven Algorithmic Techniques for Many-Core Processors)

Symposium on Chemical Computations on GPGPUs
Aug. 22-26, Boston

Unconventional High Performance Computing 2010 (UCHPC 2010)
Aug. 31-Sept. 1, Italy

GPU Technology Conference (GTC) 2010
Sept. 20-23, San Jose, Calif. (now accepting proposals from industry and academia)

Supercomputing 2010
Nov. 13-19, New Orleans, LA

IEEE International Parallel & Distributed Processing Symposium
May 16-20, 2011, Anchorage, AK

(To list an event, email:

CUDA Articles in Dr. Dobb's
– Supercomputing for the Masses, Part 18:
– Supercomputing for the Masses, Part 17:
– Supercomputing for the Masses, Part 16:
– Supercomputing for the Masses, Part 15:
CUDA Books
– Programming Massively Parallel Processors by D. Kirk, W. Hwu:
– See additional books here:
CUDA Toolkit
Download CUDA Toolkit 3.0:
CUDA Documentation
Download developer guides and documentation:
NVIDIA Parallel Nsight
Download the Parallel Nsight Beta:
Download the Parallel Nsight Beta Release Notes:
– Check out the NVIDIA Research site:
– Read previous issues of CUDA: Week in Review:
– Follow CUDA & GPU Computing on Twitter:
– Network with other developers:
– Stay tuned to GPGPU news and events:
– Learn more about CUDA on CUDA Zone:
– Watch CUDA on YouTube:
About CUDA
CUDA is NVIDIA's parallel computing hardware architecture. NVIDIA provides a complete toolkit for programming on the CUDA architecture, supporting standard computing languages such as C, C++, and Fortran as well as APIs such as OpenCL and DirectCompute.

Send comments and suggestions to:
You are receiving this email because you have previously expressed interest in NVIDIA products and technologies. Click here to opt in specifically to CUDA: Week in Review.

Feel free to forward this email to customers, partners and colleagues.

Copyright © 2010 NVIDIA Corporation. All rights reserved. 2701 San Tomas Expressway, Santa Clara, CA 95050.