CUDA Spotlight: GPU Computing Momentum at Microway
This week's Spotlight is on Stephen Fried, founder of Microway and veteran technology inventor (with a current focus on clusters and InfiniBand fabrics). Steve is a former space scientist and FAA flight examiner who can be found on weekends in his sailplane soaring over the Green Mountains of Vermont.
Steve Fried is seen in this picture in his Schleicher ASH-26 E sailplane, which has a 60-foot wingspan and a 55 HP rotary engine in the fuselage.
We caught up with Steve after learning that BioStack-LS – a CUDA/Tesla-based Microway product – was named a "Best of Show" finalist at the Bio-IT World Conference in Boston.
NVIDIA: Steve, tell us about Microway.
NVIDIA: Where are you seeing the most momentum in GPU computing?
The crucial problem is not memory bandwidth but memory latency. The time it takes to retrieve a piece of data is determined by the latency of the memory, and it is identical for CPUs and GPUs. The trick that GPGPUs employ to get around latency is their ability to queue up a huge number of parallel requests for data. These requests are queued and serviced by the GPU's memory controller. When the data arrives back at the GPU, it is routed to the cores that requested it, which in turn wake the threads waiting on each particular piece of data. Because so many threads issue requests in parallel, data is almost always ready and waiting for each core to consume.
With hundreds to thousands of requests now in its queue, each core is able to run at full speed for many thousands of cycles. This makes it possible to achieve FPU efficiencies that approach or exceed 90%, giving the user the ability to take full advantage of Tesla's 1 teraflop performance.
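The latency-hiding scheme described above can be sketched as a short CUDA program. The kernel name and sizes here are illustrative, not from the interview; the point is that the launch creates far more threads than the GPU has cores, so warps whose loads are still in flight can be swapped out for warps whose data has arrived.

```cuda
#include <cuda_runtime.h>

// Each thread issues its own independent loads of x[i] and y[i]. While one
// warp waits on memory, the SM's scheduler runs other warps whose data has
// already arrived -- this is the queuing mechanism that hides latency.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];   // load, multiply-add, store
}

int main()
{
    const int n = 1 << 20;        // ~1M elements: thousands of warps in flight
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    // Launch far more threads than the GPU has cores: the surplus is what
    // keeps the FPUs busy while memory requests are being serviced.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

With only a handful of threads, every cache miss would stall a core for hundreds of cycles; with a million, the memory controller's queue is never empty.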
We believe that GPUs are ideal for executing the parallel vector applications that dominate much of the bioinformatics world. This week we are at Bio-IT World demonstrating BioStack-LS. BioStack-LS includes seven GPU compute nodes, each with two Tesla C2070s. BioStack-LS represents an innovation for the biomedical community because it's delivered pre-configured for life sciences software, including AMBER, MATLAB, NAMD and VMD.
NVIDIA: Why are people embracing the CUDA parallel programming model?
The user writing code for such an environment had to worry about: 1) reading the kernel code from a file and issuing it to the vector card that held one or more VPUs; 2) signaling the VPUs to start execution; 3) sending and receiving data between the host and the VPUs; and 4) coordinating tasks carried out by the VPUs, using semaphores to guarantee that shared data was never read until every VPU had completed a particular portion of the algorithm.
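For contrast, the four chores above collapse into a few lines in a CUDA host program. This is a minimal sketch with an illustrative kernel (scale) and sizes; the numbered comments map each runtime feature back to the legacy VPU step it replaces.

```cuda
#include <cuda_runtime.h>

// (1) The kernel is compiled into the executable by nvcc -- no reading
//     kernel code from a file and shipping it to the card at run time.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 4096;
    float host[4096];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    // (3) Host <-> device transfers are single runtime calls.
    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // (2) Starting execution is the kernel-launch syntax itself.
    scale<<<n / 256, 256>>>(dev, 2.0f, n);

    // (4) Coordination: __syncthreads() inside a kernel and stream/device
    //     synchronization on the host replace hand-built semaphores.
    cudaDeviceSynchronize();

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return 0;
}
```

Every step the VPU programmer once scripted by hand is either handled by the toolchain or reduced to one runtime call.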
NVIDIA: How did you become interested in this area?
NVIDIA: What are some of the real-world applications of your products?
NVIDIA: As computing becomes faster, what will we be able to do in the future?
Stephen S. Fried's Bio
In 1982, he co-founded Microway with his wife Ann and wrote the first code to employ an Intel math coprocessor in an IBM-PC. During the 1980s he was a frequent contributor to both BYTE and Dr. Dobb's Journal and won several Byte BOM awards. At Microway he designed the first PC accelerator board to allow the use of math coprocessors. This was followed by parallel processing cards that used Inmos Transputers and later, the Intel i860 vector processor.
Fried's recent software products have focused on tools for monitoring, controlling and debugging clusters and InfiniBand fabrics. His most recent peer-reviewed publication, "Loop Heat Pipes for Cooling Systems of Servers," appears in IEEE Transactions on Components and Packaging Technologies. It is co-authored with Professor Yury Maydanik, chairperson of the International Heat Pipe Symposium. He can be reached at email@example.com.