GPU-Accelerated SPECFEM3D Cartesian

Get started today with this GPU Ready Apps Guide.

SPECFEM3D Cartesian

SPECFEM3D Cartesian simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra (structured or not). It can, for instance, model seismic waves propagating in sedimentary basins or any other regional geological model following earthquakes. It can also be used for non-destructive testing or for ocean acoustics.

SPECFEM3D Cartesian delivers a speedup of up to 38x on an NVIDIA® Tesla® P100 node compared to a CPU-only system, enabling users to run simulations in hours instead of weeks.

SPECFEM3D Cartesian Delivers 38X Speedup with GPUs

Installation

System Requirements

SPECFEM3D_Cartesian should be portable to any parallel platform with a modern C and Fortran compiler and a recent MPI implementation.

Optionally, git can be used to obtain the source code.

Download and Installation Instructions

1. Download the SPECFEM3D_Cartesian source code. The most recent version can be obtained using

git clone --recursive --branch devel

Instructions for obtaining earlier or release versions, the user manual, and other user resources are available at: https://geodynamics.org/cig/software/specfem3d/

2. Make sure that nvcc, gcc and gfortran are available on your system and in your path. If they are not available, please contact your system administrator. In addition, to build with MPI, mpicc and mpif90 must be in your path.

Note also that the performance of CPU-only runs and of the database generation step is sensitive to the compiler chosen. For example, the PGI compiler is much faster than gfortran for the database generation step and a little faster for CPU-only xspecfem3D simulations. The compiler choice has no significant effect on the performance of xspecfem3D simulations on GPUs.

3. Configure the package

./configure --with-cuda=cuda8

Note this will build an executable for devices of compute capability 6.0 or higher.

4. Build the program

make
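The prerequisite check in step 2 can be scripted before running configure. The following is a minimal sketch; the function name check_tools is hypothetical and not part of SPECFEM3D. It only tests whether each named command can be found in the current path.

```shell
# check_tools TOOL...: print "missing: <name>" for every command that is
# not found in PATH, and return non-zero if any are missing.
# (Hypothetical helper, not shipped with SPECFEM3D.)
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return $status
}
```

For an MPI build with CUDA, it could be called as: check_tools nvcc gcc gfortran mpicc mpif90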

Running Jobs

Running jobs requires modifying the parameter files that define the simulation. There is a specific example described in the benchmarking section based on the “simple model” example found in “specfem3d/EXAMPLES/meshfem3D_examples/simple_model”.

The parameter files are found in the “DATA” folder. Make sure that DATA/Par_file and DATA/meshfem3D_files/Mesh_Par_file are set correctly. Note particularly that the number of MPI tasks (NPROC) must be set in DATA/Par_file and the number of tasks in each direction (NPROC_XI and NPROC_ETA) must be set in DATA/meshfem3D_files/Mesh_Par_file. It is also important that

GPU_MODE = .true.

is set in DATA/Par_file, or the program will not run on GPUs.
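Because these settings live in two different files, a quick consistency check before launching can save a failed run. The sketch below assumes the "NAME = value  # comment" layout used by the parameter files and assumes that NPROC must equal NPROC_XI times NPROC_ETA, as the 2x2 layout for 4 tasks suggests. The function names get_param and check_par_files are hypothetical.

```shell
# get_param FILE NAME: print the value assigned to NAME in a parameter
# file, ignoring anything after '#'.
get_param() {
    sed -e 's/#.*$//' "$1" | awk -F= -v name="$2" '
        { key = $1; gsub(/[ \t]/, "", key) }
        key == name { val = $2; gsub(/[ \t\r]/, "", val); print val; exit }'
}

# check_par_files PAR_FILE MESH_PAR_FILE: verify that NPROC matches
# NPROC_XI * NPROC_ETA (assumed rule) and report GPU_MODE.
check_par_files() {
    nproc=$(get_param "$1" NPROC)
    nxi=$(get_param "$2" NPROC_XI)
    neta=$(get_param "$2" NPROC_ETA)
    gpu=$(get_param "$1" GPU_MODE)
    if [ -z "$nproc" ] || [ -z "$nxi" ] || [ -z "$neta" ]; then
        echo "could not read NPROC, NPROC_XI or NPROC_ETA"
        return 1
    fi
    if [ "$nproc" -ne $((nxi * neta)) ]; then
        echo "NPROC=$nproc does not equal NPROC_XI*NPROC_ETA=$((nxi * neta))"
        return 1
    fi
    echo "NPROC=$nproc (${nxi}x${neta}), GPU_MODE=$gpu"
}
```

From the root specfem3d folder it could be called as: check_par_files DATA/Par_file DATA/meshfem3D_files/Mesh_Par_file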

Running jobs is then a three-stage process. Here we take the example of running on 4 tasks. Note that all three stages should be run with the same number of MPI tasks. For GPU runs, use 1 MPI task per GPU. For CPU-only runs, you will typically use 1 MPI task per core. The Par_file and Mesh_Par_file need to be modified for each different case.

1. Run the in-house mesher

mpirun -np 4 ./bin/xmeshfem3D

2. Generate the databases

mpirun -np 4 ./bin/xgenerate_databases

3. Run the simulation

mpirun -np 4 ./bin/xspecfem3D
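Since all three stages must use the same task count, it can help to wrap them in one function that takes the count once and stops at the first failure. This is a minimal sketch; run_stages and the MPIRUN override variable are hypothetical names introduced here, not part of SPECFEM3D.

```shell
# run_stages NP: run the mesher, the database generation and the solver
# with NP MPI tasks each, stopping at the first stage that fails.
# MPIRUN defaults to "mpirun"; set MPIRUN="echo mpirun" for a dry run
# that only prints the commands.
run_stages() {
    np=$1
    for exe in xmeshfem3D xgenerate_databases xspecfem3D; do
        ${MPIRUN:-mpirun} -np "$np" "./bin/$exe" || {
            echo "stage $exe failed" >&2
            return 1
        }
    done
}
```

Calling run_stages 4 from the root specfem3d folder runs the same three commands as above.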

Benchmarks

This section walks through the process of testing the performance on your system and shows performance on a typical configuration. The example described here is based on the “simple model” example found in “specfem3d/EXAMPLES/meshfem3D_examples/simple_model”.

1. Download and install the program as described in the Installation section

2. Download SPECFEM3D_Cartesian_GPU_READY_FILES_v2.tgz and extract it. The archive includes example scripts that build specfem3d and run this example via “build_run_specfem3d_cartesian.sh”. The scripts need small modifications (paths, environment modules, etc.) to run on a different system. The remaining steps describe, step by step, how to run the 4 GPU test case specifically. The included scripts run cases for 1, 2 and 4 GPUs and for 32 or 36 CPU cores, and can be reviewed for additional details.

tar -xzvf SPECFEM3D_Cartesian_GPU_READY_FILES_v2.tgz

3. Copy input_cartesian_v4.tar to your root specfem3d installation folder

cp input_cartesian_v4.tar specfem3d

4. Change to the root SPECFEM directory

cd specfem3d

5. Extract the various Par_files from the tar file.

tar -xvf input_cartesian_v4.tar

This will extract variations of the data files into the DATA and DATA/meshfem3D_files directories in your SPECFEM3D Cartesian installation, and update the STATIONS, FORCESOLUTION and CMTSOLUTION files. We include files for 1 GPU, 2 GPUs and 4 GPUs and for 36 CPU cores. The problem size has been increased slightly to increase the run time.

Note also that there are restrictions on the number of MPI tasks that can be used; more information can be found in DATA/Par_file. The NPROC variables change depending on whether the run uses 1 GPU (1x1), 2 GPUs (2x1), 4 GPUs (2x2) or 36 CPU cores (9x4). The example described below uses the 288x256 mesh size that corresponds to the 36-core CPU case. The input_cartesian_v4.tar file also includes Mesh_Par_file examples for 256x256, which can be used for a 32-core CPU case. Other variations could be created for different core counts. The rules for valid sizes are described in the Mesh_Par_file.

Par_file:

NPROC              = 4     # changes depending on number of GPUs or CPU cores used
NSTEP              = 10000
DT                 = 0.01
GPU_MODE           = .true.     # change to .false. for a CPU-only run

DATA/meshfem3D_files/Mesh_Par_file:
NEX_XI             = 288
NEX_ETA            = 256
# number of MPI processors along xi and eta (can be different)
NPROC_XI           = 2     # 2x2 for 4 GPU example
NPROC_ETA          = 2
NREGIONS           = 4
# define the different regions of the model as :
#NEX_XI_BEGIN #NEX_XI_END #NEX_ETA_BEGIN #NEX_ETA_END #NZ_BEGIN #NZ_END #material_id
1      288      1     256      1      4      1
1      288      1     256      5      5      2
1      288      1     256      6      15     3
14     25       7     19       7      10     4
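To see how a mesh size maps onto a task layout, the per-task element count in each direction can be worked out with simple integer arithmetic. The sketch below assumes the mesh is split evenly, so that NEX_XI must be an exact multiple of NPROC_XI (and likewise for ETA). That divisibility rule is an assumption made here for illustration; the authoritative rules are the ones described in the Mesh_Par_file. The function name elements_per_task is hypothetical.

```shell
# elements_per_task NEX NPROC: print NEX / NPROC, or fail if NEX is not
# an exact multiple of NPROC (assumed even split).
elements_per_task() {
    if [ $(( $1 % $2 )) -ne 0 ]; then
        echo "$1 is not a multiple of $2" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}
```

For the 4 GPU layout (2x2) on the 288x256 mesh, elements_per_task 288 2 gives 144 and elements_per_task 256 2 gives 128. For the 36 core layout (9x4), the same mesh gives 32 and 64.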

6. Change to the DATA directory

cd DATA

7. Copy the file corresponding to the number of GPUs you want to run on to Par_file, e.g. for the four GPU version

cp Par_file_4_proc Par_file

8. Change to the meshfem3D_files directory

cd meshfem3D_files

9. Copy the file corresponding to the number of GPUs you want to run on to Mesh_Par_file, e.g. for the four GPU version

cp Mesh_Par_file_4_proc_288x256 Mesh_Par_file

10. Change back to the base directory

cd ../..

11. Remove the old OUTPUT_FILES and DATABASES_MPI folders. Subsequent runs overwrite files in the OUTPUT_FILES folder, so you may want to save earlier results to a new location first.

rm -r OUTPUT_FILES
mkdir OUTPUT_FILES
rm -r DATABASES_MPI
mkdir DATABASES_MPI

12. Run the in-house mesher

mpirun -np 4 ./bin/xmeshfem3D

13. Generate the databases

mpirun -np 4 ./bin/xgenerate_databases

14. Run the simulation

mpirun -np 4 ./bin/xspecfem3D

15. Review the output_solver.txt file found in OUTPUT_FILES folder. The performance metric is the “Total elapsed time” found in this file.

grep "Total elapsed time" OUTPUT_FILES/output_solver.txt

Note that the mesher and database generation steps are not GPU accelerated and in practice are done once, after which the output may be used for many (solver step) simulations. This is why only the timing of the solver step is used for the benchmark.
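Once a CPU-only run and a GPU run have both completed, the speedup is the ratio of the two solver times. The following sketch extracts the value from two saved copies of output_solver.txt and divides them. It assumes the line has the form "Total elapsed time in seconds = <number>", which may differ between versions, and the function names elapsed_seconds and speedup are hypothetical.

```shell
# elapsed_seconds FILE: print the value after '=' on the first
# "Total elapsed time" line (the line layout is an assumption).
elapsed_seconds() {
    awk -F= '/Total elapsed time/ { gsub(/[ \t\r]/, "", $2); print $2; exit }' "$1"
}

# speedup CPU_FILE GPU_FILE: print CPU time divided by GPU time with
# one decimal place; fail if the GPU time is missing or not positive.
speedup() {
    awk -v cpu="$(elapsed_seconds "$1")" -v gpu="$(elapsed_seconds "$2")" \
        'BEGIN { if (gpu <= 0) exit 1; printf "%.1f\n", cpu / gpu }'
}
```

After saving each run's output_solver.txt under a separate name, for example cpu_solver.txt and gpu_solver.txt (hypothetical file names), the ratio is printed by: speedup cpu_solver.txt gpu_solver.txt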

Expected Performance Results

This section provides expected performance benchmarks across single and multi-node systems.

[Figure: SPECFEM3D Cartesian Multi-GPU Scaling Performance]
[Figure: SPECFEM3D Cartesian Delivers 38X Speedup with GPUs]
[Figure: SPECFEM3D Cartesian 10X-40X Faster]

Recommended System Configurations

Hardware Configuration

PC

Parameter          Specs
CPU Architecture   x86
System Memory      16-32GB
CPUs               2 (10+ cores, 2 GHz)
GPU Model          GeForce® GTX TITAN X
GPUs               1-4

Servers

Parameter          Specs
CPU Architecture   x86
System Memory      64GB
CPUs/Node          2 (10+ cores, 2+ GHz)
Total # of Nodes   1
GPU Model          NVIDIA® Tesla® P100, Tesla® K80
GPUs/Node          1-4

Software Configuration

Software stack

Parameter          Version
OS                 CentOS 7.3
GPU Driver         375.20 or newer
CUDA Toolkit       8.0 or newer
PGI Compilers      16.9
