Planning for Performance: Enhancing Achievable Performance for MPI through Persistent Collective Operations

Daniel J. Holmes, Bradley Morgan, Anthony Skjellum, Purushotham V. Bangalore, Srinivas Sridharan

Research output: Contribution to journal › Article › peer-review

Abstract

Advantages of nonblocking collective communication in MPI have been established over the past quarter century, even predating MPI-1. For regular computations with fixed communication patterns, significant additional optimizations can be revealed through the use of persistence (planned transfers), which is not currently available in the MPI-3 API except in a limited point-to-point form (a.k.a. half-channels) standardized since MPI-1. This paper covers the design of persistent nonblocking collective operations, a prototype implementation (LibPNBC, based on LibNBC), and their MPI-4 standardization status. We provide early performance results, using a modified version of NBCBench and an example application based on 3D conjugate gradient, that illustrate the potential performance enhancements of such operations. Persistent operations enable MPI implementations to make intelligent choices about algorithm and resource utilization once, and to amortize this decision cost across many uses in a long-running program. Our results provide evidence that this approach is of value. As with non-persistent, nonblocking collective operations, strong progress and blocking completion notification are jointly needed to maximize the benefit of such operations (e.g., to support overlap of communication with computation and/or other communication). Future work comprises further enhancement of the current reference implementation and exploration of additional opportunities to enhance performance through these new APIs.
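To make the plan-once, start-many pattern concrete, the sketch below assumes the persistent collective interface as later standardized in MPI-4 (MPI_Allreduce_init, MPI_Start, MPI_Wait, MPI_Request_free); the paper's LibPNBC prototype predates this standardized interface, so this is an illustrative sketch rather than the paper's implementation. The key point is the separation of planning (the _init call) from triggering (MPI_Start), which lets the algorithm-selection and resource-allocation cost be paid once rather than on every iteration.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double local = 1.0, global = 0.0;
    MPI_Request req;

    /* Plan the collective once: the implementation may select an
       algorithm and reserve resources here, amortizing that cost
       over every subsequent use. */
    MPI_Allreduce_init(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, MPI_INFO_NULL, &req);

    for (int iter = 0; iter < 100; ++iter) {
        MPI_Start(&req);                   /* begin one use of the planned transfer */
        /* ... independent computation can overlap here ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE); /* complete this use; req remains reusable */
    }

    MPI_Request_free(&req);                /* release the plan */
    MPI_Finalize();
    return 0;
}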
Original language: English
Journal: Parallel Computing: Systems & Applications
Publication status: Published - 1 Sept 2018

