Developing a scalable hybrid MPI/OpenMP unstructured finite element model

Xiaohu Guo, Michael Lange, Gerard Gorman, Lawrence Mitchell, Michele Weiland

Research output: Contribution to journal › Article › peer-review


The trend of all modern computer architectures, and the path to exascale, is towards increasing numbers of lower-power cores with a decreasing memory-to-core ratio. This imposes a strong evolutionary pressure on algorithms and software to efficiently utilise all levels of parallelism available on a given platform while minimising data movement. Unstructured finite element codes have long been parallelised effectively using domain decomposition methods, implemented with libraries such as the Message Passing Interface (MPI). However, there are many optimisation opportunities when threading is used for intra-node parallelisation on the latest multi-core/many-core platforms. The benefits include increased algorithmic freedom, reduced memory requirements, cache sharing, a reduced number of partitions, and less MPI communication and I/O overhead.

In this paper, we report progress in implementing a hybrid OpenMP–MPI version of the unstructured finite element code Fluidity. For matrix assembly kernels, the OpenMP parallel algorithm uses graph colouring to identify independent sets of elements that can be assembled concurrently with no race conditions. In this phase there are no MPI overheads, as each MPI process only assembles its own local part of the global matrix. We use an OpenMP-threaded fork of PETSc to solve the resulting sparse linear systems of equations. We experiment with a range of preconditioners, including HYPRE, which provides the algebraic multigrid preconditioner BoomerAMG, where the smoother is also threaded. Since unstructured finite element codes are well known to be memory-latency bound, particular attention is paid to ccNUMA architectures, where data locality is particularly important to achieve good intra-node scaling characteristics. We also demonstrate that utilising non-blocking algorithms and libraries is critical for the mixed-mode application to achieve better parallel performance than the pure MPI version.
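The colouring idea from the abstract can be illustrated with a minimal sketch: two elements conflict if they share a mesh node (their contributions would write to the same global matrix rows), so elements of the same colour form an independent set that threads may assemble concurrently. This is a generic greedy colouring in Python, not the specific algorithm or data structures used in Fluidity; the element connectivity shown is a hypothetical example mesh.

```python
from collections import defaultdict

def colour_elements(elements):
    """Greedy colouring: elements sharing a mesh node receive different
    colours, so all elements of one colour can be assembled concurrently
    with no write conflicts on shared global matrix rows."""
    # Map each mesh node to the elements touching it (the conflict graph).
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colours = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colours already taken by elements sharing a node with element e.
        used = {colours[nb] for n in nodes for nb in node_to_elems[n]
                if colours[nb] >= 0}
        c = 0
        while c in used:
            c += 1
        colours[e] = c
    return colours

# Hypothetical strip of four triangles; consecutive triangles share an edge.
elements = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
colours = colour_elements(elements)

# Group elements by colour: each group is an independent set that an
# OpenMP parallel loop could assemble without atomics or locks.
independent_sets = defaultdict(list)
for e, c in enumerate(colours):
    independent_sets[c].append(e)
```

In the threaded assembly phase, one would iterate over the colours sequentially and parallelise the loop over elements within each colour, since no two elements in a set touch the same matrix rows.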
Original language: English
Pages (from-to): 227–234
Number of pages: 8
Journal: Computers and Fluids
Early online date: 16 Sep 2014
Publication status: Published - 30 Mar 2015


  • Fluidity-ICOM
  • OpenMP
  • MPI
  • FEM
  • Matrix assembly
  • Sparse linear solver
  • PETSc
  • SpMV


