Developing the multi-level parallelisms for Fluidity-ICOM -- Paving the way to exascale for the next generation geophysical fluid modelling technology

Xiaohu Guo, Gerard Gorman, Michael Lange, Lawrence Mitchell, Michele Weiland

Research output: Contribution to conference › Paper › peer-review

Abstract

The major challenges caused by the increasing scale and complexity of the
current petascale and the future exascale systems are cross-cutting concerns of
the whole software ecosystem. The trend for compute nodes is towards greater
numbers of lower-power cores, with a decreasing memory-to-core ratio. This is
imposing a strong evolutionary pressure on numerical algorithms and software to
efficiently utilise the available memory and network bandwidth.

Unstructured finite element codes have long been parallelised effectively using
domain decomposition methods, implemented with libraries such as the Message
Passing Interface (MPI). However, there are many algorithmic and
implementation optimisation opportunities when threading is used for intra-node
parallelisation on the latest multi-core/many-core platforms. The benefits include
reduced memory requirements, cache sharing, reduced number of partitions and
less MPI communication. While OpenMP is promoted as being easy to use and
allows incremental parallelisation of codes, naive implementations frequently
yield poor performance. In practice, as with MPI, the same care and attention
should be exercised over algorithm and hardware details when programming with
OpenMP.

In this paper, we report progress in implementing a hybrid OpenMP-MPI version of the
unstructured finite element application Fluidity. In the matrix
assembly kernels, the OpenMP parallel algorithm uses graph colouring to identify
independent sets of elements that can be assembled simultaneously with no race
conditions. The sparse linear systems defined by various equations are solved using
threaded PETSc, with HYPRE utilised as a threaded preconditioner through
the PETSc interface. Since unstructured finite element codes are well known to be memory bound,
particular attention is paid to ccNUMA architectures, where data locality is
essential to achieving good intra-node scaling characteristics. We also
demonstrate that utilising non-blocking algorithms and libraries is critical for a
mixed-mode application to achieve better parallel performance than
the pure MPI version.
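
The graph-colouring step described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from Fluidity; the greedy strategy, the function names `colour_elements` and `independent_sets`, and the four-triangle mesh are all assumptions made for illustration. Two elements are treated as conflicting when they share a node, since assembling both at once would update the same matrix rows.

```python
from collections import defaultdict


def colour_elements(elements):
    """Greedy colouring: elements that share a node receive different colours.

    `elements` is a list of node-index tuples (hypothetical mesh layout).
    """
    # Map each node to the elements that touch it.
    node_to_elements = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elements[n].append(e)

    colours = [-1] * len(elements)  # -1 marks "not yet coloured"
    for e, nodes in enumerate(elements):
        # Colours already taken by neighbouring elements.
        used = {
            colours[nb]
            for n in nodes
            for nb in node_to_elements[n]
            if colours[nb] >= 0
        }
        # Pick the smallest colour not used by any neighbour.
        c = 0
        while c in used:
            c += 1
        colours[e] = c
    return colours


def independent_sets(colours):
    """Group element indices by colour; each group is free of shared nodes."""
    sets = defaultdict(list)
    for e, c in enumerate(colours):
        sets[c].append(e)
    return [sets[c] for c in sorted(sets)]


# Hypothetical mesh: three triangles sharing node 0, plus one isolated triangle.
elements = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (5, 6, 7)]
colours = colour_elements(elements)
groups = independent_sets(colours)
```

In a threaded assembly loop, the groups would be processed one after another, with the elements inside each group distributed across threads. Because no two elements in a group share a node, their contributions to the global matrix do not overlap and no locking is needed within a group.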

Original language: English
Publication status: Accepted/In press - 2015
Event: Exascale Applications and Software Conference - Edinburgh, United Kingdom
Duration: 9 Apr 2013 to 11 Apr 2013

Conference

Conference: Exascale Applications and Software Conference
Country: United Kingdom
City: Edinburgh
Period: 9/04/13 to 11/04/13
