Get started today with this GPU Ready Apps Guide.


SPECFEM3D Globe simulates global and continental scale regional seismic wave propagation. This application enables researchers to analyze the effects of lateral variations in compressional-wave speed, shear-wave speed, density, rotation, and self-gravitation on a 3D crustal model.

Version 7.0 offers GPU support for both OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library.

SPECFEM3D GLOBE runs over 25X faster on a single NVIDIA® Tesla® P100 GPU, helping scientists run their seismic simulations in hours instead of weeks.

SPECFEM3D Globe Delivers 25X Speedup with GPUs


System Requirements

SPECFEM3D_GLOBE should work on any platform with MPI support. For the purpose of this guide, we suggest the following configuration:

  • Operating system: CentOS 7 or newer
  • CUDA 8.0 for NVIDIA Pascal™ GPUs
  • A modern C and Fortran compiler
  • NVIDIA GPU supporting compute capability 3.0 or higher
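Before building, it can save time to confirm that the toolchain listed above is actually reachable from your shell. The following is a minimal sketch, not part of SPECFEM3D_GLOBE: the function name check_tools is hypothetical, and the tool names in the usage comment (nvcc, gcc, gfortran, mpirun) are the ones this guide uses, so substitute your own compilers or MPI launcher if they differ.

```shell
#!/bin/sh
# Report which of the given tools can be found on PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (tool names taken from this guide; adjust to your toolchain):
#   check_tools nvcc gcc gfortran mpirun
```

A non-zero return status makes the function easy to use as a guard at the top of a build script.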

Download and Installation Instructions

   1. Download the SPECFEM3D_Globe 7.0.0 source code

   2. Extract the Package

tar xf SPECFEM3D_GLOBE_V7.0.0.tar.gz

3. Required Compilers

Make sure that nvcc, gcc and gfortran are available on your system and in your path. If they are not available, please contact your system administrator.

4. Configure the Package

./configure --with-cuda=cuda5

Note that the cuda5 option configures the build to use the more modern CUDA bindings available in CUDA 5.0 and later.

  5. Build the Program

make

Note: in testing Tesla P100 GPUs with specfem3d_globe-7.0.0, it has been observed that the default configure and make steps described in this guide build a binary with the CUDA nvcc flags -gencode=arch=compute_35,code=\"sm_35,compute_35\". To run on P100 GPUs, a user might normally edit the makefile generated by configure (or change the settings before running configure) so that sm_60 is selected instead of sm_35. With specfem3d_globe-7.0.0, however, the resulting sm_60 binary is actually slightly slower on P100 than the default sm_35 binary. This is unexpected and under investigation.
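To see which architecture a build will target, you can search the generated makefile for the sm_XX names in its -gencode flags. The sketch below is illustrative only: the helper names list_gencode_archs and swap_gencode_arch are hypothetical, and it assumes the flags appear literally in the file you point it at. The swap helper writes a modified copy rather than editing in place, so the sm_35 and sm_60 variants can be compared side by side.

```shell
#!/bin/sh
# List the distinct GPU architectures (sm_XX) named in a file's -gencode flags.
list_gencode_archs() {
    # $1 = file to inspect (for example the makefile generated by configure)
    grep -o 'sm_[0-9][0-9]*' "$1" | sort -u
}

# Write a copy of a file with one architecture number swapped for another.
# For example 35 -> 60 rewrites both compute_35 and sm_35.
# The input file is left untouched.
swap_gencode_arch() {
    # $1 = input file, $2 = old arch number, $3 = new arch number, $4 = output file
    sed -e "s/compute_$2/compute_$3/g" -e "s/sm_$2/sm_$3/g" "$1" > "$4"
}

# Example:
#   list_gencode_archs Makefile
#   swap_gencode_arch Makefile 35 60 Makefile.sm60
```

Given the observation above that the sm_60 binary ran slightly slower, it is worth timing both variants rather than assuming the newer architecture is faster.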

Running Jobs

Running SPECFEM3D Globe is a two-stage process. First, the mesh used to solve the final problem must be created. Then the solver can be run to generate the final solution.

Note that because some arrays are statically sized, the program must be recompiled whenever DATA/Par_file (which sets some parameters) is changed.

1. Update the DATA/Par_file and remaining data files.
To use GPUs, the GPU_MODE parameter in DATA/Par_file must be set to .true.

2. Rebuild the program. Because the program uses static array sizes, any change to the Par_file requires the program to be rebuilt.

make clean
make

3. Run the mesher (in this case with 4 MPI tasks on 4 GPUs)

mpirun -np 4 bin/xmeshfem3D

4. Run the main program

mpirun -np 4 bin/xspecfem3D
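Because a run with GPU_MODE left at .false. silently uses the CPU path, it can be useful to check the parameter before rebuilding. The following is a minimal sketch under stated assumptions: the helper names get_par and set_par are hypothetical, and the code assumes the Par_file holds one "NAME = value" pair per line with '#' starting a comment.

```shell
#!/bin/sh
# Print the value of a parameter from a "NAME = value" style file.
# Assumes one parameter per line and '#' starting a trailing comment.
get_par() {
    # $1 = file, $2 = parameter name
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$1" | head -n 1
}

# Replace the value of a parameter, keeping any trailing comment.
# Writes through a temporary file instead of relying on sed -i.
set_par() {
    # $1 = file, $2 = parameter name, $3 = new value
    sed "s/^\([[:space:]]*$2[[:space:]]*=[[:space:]]*\)[^#[:space:]]*/\1$3/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Example:
#   get_par DATA/Par_file GPU_MODE
#   set_par DATA/Par_file GPU_MODE .true.
```

As with any other change to the Par_file, the program has to be rebuilt after the value is edited.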


This section walks through the process of testing the performance on your system and shows performance on a typical configuration.

1. Download and install the program as described in the Installation section

2. Download the tar file with the required input files

3. Change to the root SPECFEM directory

4. Extract the various Par_files from the tar file.

tar -xf parfiles.tar

This extracts the data files into the DATA directory of your SPECFEM installation and updates the STATIONS and CMTSOLUTION files. Files are included for 1 GPU, 2 GPUs, 4 GPUs, and 36 CPU cores. The example is a modified version of the “global_s362ani_shakemovie” example distributed with SPECFEM3D Globe, with the problem size reduced so that it fits more easily on one node. There are also restrictions on the number of MPI tasks that can be used: the NPROC variables change depending on whether the run uses 1 GPU (1x1), 2 GPUs (2x1), 4 GPUs (2x2), or 36 CPU cores (6x6).

Key changes made to the original Par_file are as follows:

NPROC_XI=2      # note 2x2 corresponds to the 4 GPU case
GPU_MODE=.true.      # note .false. for the CPU-only case

These changes reduce the size of the simulation, enable or disable GPU acceleration, and adjust the number of MPI processes (1 per GPU) needed for each of the test cases. More information can be found in the Par_file.

5. Change to the DATA directory

cd DATA

6. Copy the file corresponding to the number of GPUs you want to run on to Par_file, e.g. for the four-GPU version:

cp Par_file_96x96_100_4GPU Par_file

7. Change back to the base directory

cd ..

8. Rebuild the program and run the mesher. Because the program uses static array sizes, it must be recompiled every time the Par_file is changed.

make clean
make
mpirun -np 4 bin/xmeshfem3D

9. Run the solver

mpirun -np 4 bin/xspecfem3D

10. Look at the end of the solver output to check how long the program took to run

tail OUTPUT_FILES/output_solver.txt
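When comparing several runs, it can be easier to pull the run time out of the log than to read it by eye. The sketch below assumes the solver log contains a line of the form "Total elapsed time in seconds = <number>". That line format is an assumption about your build's output, not something stated in this guide, so check it against your own output_solver.txt first. The function name elapsed_seconds is hypothetical.

```shell
#!/bin/sh
# Print the number from the first log line that looks like
#   Total elapsed time in seconds = <number>
# ASSUMPTION: the solver log uses this wording; adjust the pattern if yours differs.
elapsed_seconds() {
    # $1 = path to the solver log
    awk -F= '/Total elapsed time in seconds/ { print $2 + 0; exit }' "$1"
}

# Example:
#   elapsed_seconds OUTPUT_FILES/output_solver.txt
```

If the pattern is not found, the function prints nothing, which makes a mismatch between the assumed and actual log format easy to spot.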

Note that the mesher step is not GPU accelerated and, in practice, is done once, after which its output may be reused for many solver-step simulations. This is why only the timing of the solver step is used for the benchmark.
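The -np value passed to mpirun in the steps above has to match the decomposition in the active Par_file. The sketch below derives it from the file instead of hard-coding it. It assumes the usual SPECFEM3D_GLOBE rule that the task count is NCHUNKS x NPROC_XI x NPROC_ETA, and that the benchmark Par_files use a single chunk, which is what the 1x1, 2x1, 2x2 and 6x6 layouts imply. The function name mpi_tasks is hypothetical.

```shell
#!/bin/sh
# Print NCHUNKS * NPROC_XI * NPROC_ETA for a "NAME = value" style Par_file.
# ASSUMPTION: one parameter per line, '#' starts a comment.
# A parameter that is absent counts as 0, so the result is 0 if any is missing.
mpi_tasks() {
    # $1 = path to the Par_file
    awk -F= '
        { sub(/#.*/, "") }                                  # drop trailing comments
        $1 ~ /^[ \t]*NCHUNKS[ \t]*$/   { chunks = $2 + 0 }
        $1 ~ /^[ \t]*NPROC_XI[ \t]*$/  { xi     = $2 + 0 }
        $1 ~ /^[ \t]*NPROC_ETA[ \t]*$/ { eta    = $2 + 0 }
        END { print chunks * xi * eta }
    ' "$1"
}

# Example:
#   np=$(mpi_tasks DATA/Par_file)
#   mpirun -np "$np" bin/xmeshfem3D
#   mpirun -np "$np" bin/xspecfem3D
```

Deriving the count this way keeps the mesher and solver launches consistent when you switch between the 1, 2 and 4 GPU Par_files.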

Expected Performance Results

This section provides expected performance benchmarks across single- and multi-node systems.

SPECFEM3D Globe Multi-GPU Scaling Performance
SPECFEM3D Globe Delivers 25x Speedup with GPUs

Recommended System Configurations

Hardware Configuration

  • CPU Architecture
  • System Memory
  • CPUs: 2 (10+ cores, 2 GHz)
  • GPU Model

  • CPU Architecture
  • System Memory
  • CPUs: 2 (10+ cores, 2+ GHz)
  • Total # of Nodes
  • GPU Model: NVIDIA® Tesla® P100, Tesla® K80

Software Configuration

Software stack

  • Operating system: CentOS 7.3
  • GPU Driver: 375.20 or newer
  • CUDA Toolkit: 8.0 or newer
  • PGI Compilers

