CUDA Spotlight: Paul Richmond

Paul Richmond

GPU-Accelerated Agent-Based Simulation of Complex Systems

This week's Spotlight is on Paul Richmond, a Vice Chancellor's Research Fellow at the University of Sheffield (a CUDA Research Center). Paul's research interests relate to the simulation of complex systems and to parallel computer hardware.

Within his current position he is developing novel software enabling modelers to accelerate large simulations of interacting autonomous individuals (agents) for the purposes of interactive visualization, prediction and intelligent decision support.

This interview is part of the CUDA Spotlight Series.

Q & A with Paul Richmond

NVIDIA: Paul, tell us about the FLAME GPU software which you developed.
Paul: Agent-Based Simulation is a powerful technique used to assess and predict group behavior from a set of simple interaction rules between communicating autonomous individuals (agents). Individuals typically represent some biological entity such as a molecule, cell or organism and can therefore be used to simulate systems at varying biological scales.

The Flexible Large-scale Agent Modelling Environment for the GPU (FLAME GPU) is software that enables high-level descriptions of communicating agents to be automatically translated for execution on GPU hardware. With FLAME GPU, simulation performance is enormously increased over traditional agent-based modeling platforms, and interactive visualization can easily be achieved. The GPU architecture and the underlying software algorithms are abstracted away from users of the FLAME GPU software, ensuring accessibility to users in a wide range of domains and application areas.

NVIDIA: How does FLAME GPU leverage GPU computing?
Paul: Unlike other agent-based simulation frameworks, FLAME GPU is designed from the ground up with parallelism in mind. As such it is possible to ensure that agents and behavior are mapped to the GPU efficiently in a way which minimizes data transfer during simulation.

One of the most exciting aspects of GPU-accelerated simulation is that simulations can often be run faster than real-time. For example, a pedestrian evacuation model can be matched to real-world conditions and an evacuation plan can be simulated, essentially looking into the future for potential danger or problems. This forms the basis of my current work into using such agent simulation techniques for prediction and decision making.

Example of pedestrian evacuation modeling simulation with FLAME GPU

NVIDIA: What challenges did you face?
Paul: GPUs are very good at simulating homogeneous groups of agents where behavior is consistent across a population. In homogeneous cases, behavior can be executed as kernels with very little divergence, which results in high performance. However, as the complexity of agents within a population increases, so too does the heterogeneity, ultimately impacting performance.

The main challenge in parallelizing a generic agent-based model is ensuring that agents can be maintained in homogeneous groups. A state-based representation is the main technique used to ensure homogeneous grouping of agent behaviors.
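A minimal sketch of the grouping idea (hypothetical types, not FLAME GPU's actual API): agents are stored in per-state lists, so each behavior function runs over a uniform group; on the GPU each group maps to a kernel launch with minimal branch divergence.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical agent: a position plus the state it currently occupies.
struct Agent {
    float x;
    std::string state;
};

// Agents are kept in homogeneous per-state groups; each group's behavior
// is one uniform function (on the GPU: one kernel, little divergence).
using StateLists = std::map<std::string, std::vector<Agent>>;

void step(StateLists& lists,
          const std::map<std::string, std::function<void(Agent&)>>& behaviors) {
    for (auto& [state, agents] : lists) {
        const auto& fn = behaviors.at(state);
        for (Agent& a : agents) fn(a);  // uniform work across the group
    }
}
```

Because every agent in a list shares one state, the per-group loop body is branch-free with respect to agent state, which is exactly the property that keeps GPU kernels efficient.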

The creation of sparse lists of agents is another performance challenge. Sparseness can be introduced by agents transitioning between states or by the creation and deletion of agents during simulation. Parallel stream compaction, a process that uses a prefix scan to remove empty gaps in agent lists, is used heavily throughout the FLAME GPU software to prevent sparse data.
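As a rough illustration of the technique (a sequential sketch, not FLAME GPU's code), prefix-scan-based compaction computes each surviving agent's output index with an exclusive scan over occupancy flags and then scatters; on the GPU both phases are data-parallel, e.g. via thrust::exclusive_scan or thrust::copy_if.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical agent record; `alive` marks whether the slot is occupied.
struct Agent {
    int id;
    bool alive;
};

// Stream compaction via prefix scan: an exclusive scan of the alive flags
// yields each survivor's destination index, then a scatter packs them
// contiguously, removing the gaps left by dead or transitioned agents.
std::vector<Agent> compact(const std::vector<Agent>& agents) {
    std::vector<std::size_t> index(agents.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < agents.size(); ++i) {  // exclusive scan
        index[i] = count;
        count += agents[i].alive ? 1 : 0;
    }
    std::vector<Agent> out(count);
    for (std::size_t i = 0; i < agents.size(); ++i) {  // scatter survivors
        if (agents[i].alive) out[index[i]] = agents[i];
    }
    return out;
}
```

The scan-then-scatter structure is what makes the operation parallelizable: every output index is known before any element moves, so all writes can proceed independently.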

NVIDIA: Which CUDA features and GPU libraries do you use?
Paul: Initially FLAME GPU used my own implementations of sorting and stream compaction techniques. However, this functionality has been migrated, initially to CUDPP and more recently to Thrust. It is likely that CUB will be used in the future to gain even more performance.

I am currently re-engineering a large amount of the FLAME GPU code to take advantage of newer GPU features such as concurrency through parallel streams, GPUDirect for multi-GPU architectures and faster sorting methods based on improved atomics performance.

NVIDIA: What is ACRC and how is it using the outputs of FLAME GPU?
Paul: The Advanced Computing Research Centre (ACRC) at the University of Sheffield is a gateway to collaborating with industry, with simulation as one of the key focus areas. FLAME GPU and the related FLAME HPC software (for distributed architectures) are used to provide simulations with an industrial focus, impacting areas such as planning, construction, the environment, medical research and economic policy. It is very exciting to see the impact of high-performance simulation techniques being applied to real industrial needs. So far there has been a great deal of varied interest.

NVIDIA: Tell us about Osteolytica.
Paul: Osteolytica is the result of a training program to promote interdisciplinary research and is a great example of how CUDA is being applied to a wide range of domains within the University of Sheffield CUDA Research Center.

During the training program I met with Dr. Andrew Chantry, a clinical researcher in the School of Medicine. We discussed a particular problem: the lack of a robust method for quantifying the extent of cancer-induced osteolytic lesion damage in bone samples imaged through microCT volumetric scans.

I realized that GPU computing could be used to provide a novel algorithm for localized bone surface reconstruction, effectively providing a metric to measure damage. This software is now used heavily within Dr. Chantry's research to evaluate cancer treatments and the university is exploring commercialization options.

Osteolytica lesion analysis software

NVIDIA: When did you first see the promise of GPU computing?
Paul: My background is in computer graphics, so I was an early convert to GPU computing. Before CUDA arrived I was accelerating general-purpose code using GPGPU techniques that relied on graphics primitives. GPU programming has come a long way since then and GPUs are now far easier to work with.

NVIDIA: What are you looking forward to in the next five to ten years?
Paul: In the immediate future I look forward to the increased performance, memory sizes and scalability (through NVLink) of future architectures, as described at the recent GPU Technology Conference. This will enable agent-based models of increasing scale and complexity to be simulated, with the potential to offer new insights into biological processes.

More generally, I suspect that over the next ten years the limitations in scaling simulations will not be in hardware; instead, software will have to evolve to become increasingly parallel. Platforms such as CUDA are extremely helpful for programming GPU devices. However, in some cases additional algorithms and data-parallel techniques will be required.

I am interested in the idea that biological brain function can offer some insight into how future hardware and software techniques can scale to the extreme levels evident in nature.

Bio for Paul Richmond

Dr. Paul Richmond is a Vice Chancellor's Research Fellow in the Department of Computer Science at the University of Sheffield. He is currently working on the acceleration of complex systems simulations using accelerator architectures such as GPUs. His research interests relate to the software engineering challenges of describing complex systems using high-level or domain-specific tools and of automatically mapping those descriptions to parallel and distributed hardware architectures.

Paul is particularly interested in applying agent-based techniques to cellular biology, computational neuroscience, and pedestrian and transport systems, as well as working with industry through the University of Sheffield's Advanced Computing Research Centre (ACRC), of which he is a member.

Contact Info
Email: p.richmond (at) sheffield [dot] ac [dot] uk