Abstract
The trend of all modern computer architectures, and the path to exascale, is towards increasing numbers of lower-power cores with a decreasing memory-to-core ratio. This imposes a strong evolutionary pressure on algorithms and software to efficiently utilise all levels of parallelism available on a given platform while minimising data movement. Unstructured finite element codes have long been effectively parallelised using domain decomposition methods, implemented using libraries such as the Message Passing Interface (MPI). However, there are many optimisation opportunities when threading is used for intra-node parallelisation on the latest multi-core/many-core platforms. The benefits include increased algorithmic freedom, reduced memory requirements, cache sharing, a reduced number of partitions, and lower MPI communication and I/O overhead.
In this paper, we report progress in implementing a hybrid OpenMP–MPI version of the unstructured finite element code Fluidity. For matrix assembly kernels, the OpenMP parallel algorithm uses graph colouring to identify independent sets of elements that can be assembled concurrently with no race conditions. In this phase there are no MPI overheads, as each MPI process only assembles its own local part of the global matrix. We use an OpenMP-threaded fork of PETSc to solve the resulting sparse linear systems of equations. We experiment with a range of preconditioners, including HYPRE, which provides the algebraic multigrid preconditioner BoomerAMG, where the smoother is also threaded. Since unstructured finite element codes are well known to be memory latency bound, particular attention is paid to ccNUMA architectures, where data locality is particularly important to achieve good intra-node scaling characteristics. We also demonstrate that utilising non-blocking algorithms and libraries is critical for the mixed-mode application to achieve better parallel performance than the pure MPI version.
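As a rough illustration of the colour-wise assembly described above, the sketch below shows how independent sets of elements can be assembled with OpenMP without synchronisation inside a colour. All identifiers (`assemble_element`, `colour_ptr`, `colour_elems`) are illustrative placeholders, not Fluidity's actual kernel names.

```c
#include <omp.h>

/* Hypothetical per-element kernel: computes the element matrix for element e
 * and scatters it into the process-local matrix values A (stubbed for brevity). */
static void assemble_element(int e, double *A) { A[e] += 1.0; }

/* Colour-wise assembly sketch: elements within one colour share no degrees of
 * freedom, so each colour forms an independent set that can be assembled
 * concurrently with no race conditions, atomics or locks. Colours are
 * processed in order; the implicit barrier of the parallel loop separates them. */
static void assemble_matrix(int ncolours, const int *colour_ptr,
                            const int *colour_elems, double *A)
{
    for (int c = 0; c < ncolours; ++c) {
        #pragma omp parallel for schedule(static)
        for (int i = colour_ptr[c]; i < colour_ptr[c + 1]; ++i)
            assemble_element(colour_elems[i], A);
    }
}
```

Colouring trades a small amount of extra synchronisation (one barrier per colour) for lock-free writes into the shared matrix, which is what removes the race conditions during assembly.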
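The sparse solves in the paper use an OpenMP-threaded fork of PETSc, which is not reproduced here. As a hedged sketch of how a conjugate gradient solve preconditioned with HYPRE's BoomerAMG is selected through the standard (stock) PETSc C API, one might write the following; `A`, `b` and `x` are assumed to be an already assembled matrix and vectors.

```c
#include <petscksp.h>

/* Sketch only: selects CG with HYPRE BoomerAMG preconditioning via the stock
 * PETSc API (recent error-handling macros). The threaded PETSc fork used in
 * the paper is configured separately and is not shown here. */
PetscErrorCode solve_system(Mat A, Vec b, Vec x)
{
    KSP ksp;
    PC  pc;

    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetType(ksp, KSPCG));           /* conjugate gradients */
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCHYPRE));           /* delegate preconditioning to HYPRE */
    PetscCall(PCHYPRESetType(pc, "boomeramg"));  /* algebraic multigrid */
    PetscCall(KSPSetFromOptions(ksp));           /* allow run-time overrides */
    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(KSPDestroy(&ksp));
    return PETSC_SUCCESS;
}
```

The same selection can also be made at run time with PETSc's standard options, e.g. `-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg`, which is convenient when experimenting with a range of preconditioners as the paper does.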
| Original language | English |
| --- | --- |
| Pages (from-to) | 227–234 |
| Number of pages | 8 |
| Journal | Computers & Fluids |
| Volume | 110 |
| Early online date | 16 Sept 2014 |
| DOIs | |
| Publication status | Published - 30 Mar 2015 |
Keywords
- Fluidity-ICOM
- OpenMP
- MPI
- FEM
- Matrix assembly
- Sparse linear solver
- HYPRE
- PETSc
- SpMV
Related research outputs
- Developing the multi-level parallelisms for Fluidity-ICOM – Paving the way to exascale for the next generation geophysical fluid modelling technology.
  Guo, X., Gorman, G., Lange, M., Mitchell, L. & Weiland, M., 2015, (Accepted/In press). Research output: Contribution to conference › Paper › peer-review
- Exploring the thread-level parallelism for the next generation geophysical fluid modelling framework Fluidity-ICOM.
  Guo, X., Gorman, G., Lange, M., Mitchell, L. & Weiland, M., 2013, In: Procedia Engineering. 61, p. 251-257. Research output: Contribution to journal › Article › peer-review (Open Access)
- Developing hybrid MPI/OpenMP for PETSc.
  Gorman, G., Kramer, S., Weiland, M. & Mitchell, L., Apr 2012, Open Petascale Libraries. Research output: Book/Report › Other report