Edinburgh Research Explorer

Leveraging MPI RMA to optimise halo-swapping communications in MONC on Cray machines

Research output: Contribution to journal › Article

Open Access permissions: Open

Documents

    Rights statement: This is the peer reviewed version of the following article: Brown N, Bareford M, Weiland M. Leveraging MPI RMA to optimize halo‐swapping communications in MONC on Cray machines. Concurrency Computat Pract Exper. 2018;e5008, which has been published in final form at https://doi.org/10.1002/cpe.5008. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

    Accepted author manuscript, 361 KB, PDF document

https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.5008
Original language: English
Article number: e5008
Number of pages: 14
Journal: Concurrency and Computation: Practice and Experience
Volume: 31
Issue number: 16
Early online date: 25 Sep 2018
DOIs: https://doi.org/10.1002/cpe.5008
Publication status: E-pub ahead of print - 25 Sep 2018

Abstract

Remote Memory Access (RMA), also known as single‐sided communications, provides a way for reading and writing directly into the memory of other processes without having to issue explicit message passing style communication calls. Previous studies have concluded that MPI RMA can provide increased communication performance over traditional MPI Point to Point (P2P), but these are based on synthetic benchmarks rather than real‐world codes. In this work, we replace the existing non‐blocking P2P communication calls in the Met Office NERC Cloud model, a mature code for modeling the atmosphere, with MPI RMA. We describe our approach in detail and discuss the options taken for correctness and performance. Experiments are performed on ARCHER, a Cray XC30, and Cirrus, an SGI ICE machine. We demonstrate on ARCHER that, by using RMA, we can obtain between a 5% and 10% reduction in communication time at each timestep on up to 32768 cores, which over the entirety of a run (with many timesteps) results in a significant improvement in performance compared to P2P on the Cray. However, RMA is not a silver bullet, and there are challenges when integrating RMA calls into existing codes: important optimizations are necessary to achieve good performance and library support is not universally mature, as is the case on Cirrus. In this paper, we discuss, in the context of a real‐world code, the lessons learned converting P2P to RMA, explore performance and scaling challenges, and contrast alternative RMA synchronization approaches in detail.
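
To make the one-sided idea in the abstract concrete, the following is a minimal single-process sketch of a put-based halo swap. It is not MPI code and not the MONC implementation: the `Window` class, its `put` method, and `halo_swap` are hypothetical names invented for illustration, and the 1D periodic decomposition with one halo cell per side is an assumption. It only shows the access pattern, in which each rank writes its boundary cells straight into a neighbour's halo and the neighbour posts no matching receive.

```python
from typing import List


class Window:
    """Toy stand-in for an RMA window: memory that other ranks may write into.

    Hypothetical class for illustration only; it is not part of any MPI library.
    """

    def __init__(self, interior: List[float]) -> None:
        # Layout: [left halo, interior cells..., right halo]
        self.data = [0.0] + list(interior) + [0.0]

    def put(self, index: int, value: float) -> None:
        # One-sided write: the owner of this window issues no matching receive.
        self.data[index] = value


def halo_swap(windows: List[Window]) -> None:
    """Perform one halo-swap epoch on a periodic 1D decomposition (assumed)."""
    n = len(windows)
    # In real RMA code a synchronisation call would open the epoch here.
    for rank, win in enumerate(windows):
        left = windows[(rank - 1) % n]
        right = windows[(rank + 1) % n]
        # My first interior cell becomes the left neighbour's right halo.
        left.put(len(left.data) - 1, win.data[1])
        # My last interior cell becomes the right neighbour's left halo.
        right.put(0, win.data[-2])
    # A closing synchronisation would go here; halos are only safe to read after it.


windows = [Window([1.0, 2.0]), Window([3.0, 4.0]), Window([5.0, 6.0])]
halo_swap(windows)
print([w.data for w in windows])
```

In this sketch the puts touch only halo cells and read only interior cells, so the order in which ranks are visited does not matter. In a real RMA code it is the synchronisation around the epoch that provides the equivalent guarantee.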

    Research areas

  • MPI RMA, One sided communications, MONC, Cray XC30, SGI ICE, ARCHER, Cirrus

ID: 61617699