A data streaming model in MPI

Ivy Bo Peng, Stefano Markidis, Erwin Laure, Daniel Holmes, Mark Bull

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The data streaming model is an effective way to tackle the challenges of data-intensive applications. As traditional HPC applications generate large volumes of data and more data-intensive applications move to HPC infrastructures, it is necessary to investigate the feasibility of combining message-passing and streaming programming models. MPI, the de facto standard for programming HPC systems, cannot intuitively express the communication patterns and functional operations required in streaming models. In this work, we designed and implemented MPIStream, a data streaming library atop MPI, to allocate data producers and consumers, to stream data continuously or irregularly, and to process data at run time. In the same spirit as the STREAM benchmark, we developed a parallel stream benchmark to measure the data processing rate. The performance of the library largely depends on the size of the stream element, the number of data producers and consumers, and the computational intensity of processing one stream element. With 2,048 data producers and 2,048 data consumers in the parallel benchmark, MPIStream achieved a processing rate of 200 GB/s on a Blue Gene/Q supercomputer. We illustrate that a streaming library for HPC applications can effectively enable irregular parallel I/O, application monitoring, and threshold collective operations.
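
To put the headline figure in perspective, the reported rate can be divided by the number of consumers. This is only a rough sketch: it assumes the 200 GB/s is an aggregate rate summed over all 2,048 consumers, that the load is evenly balanced, and that GB means 10^9 bytes. The abstract states none of these.

```latex
\[
  \frac{200\ \text{GB/s}}{2048\ \text{consumers}}
  = 0.09765625\ \text{GB/s per consumer}
  \approx 97.7\ \text{MB/s per consumer}
\]
```

Under those assumptions, each consumer would process a little under 100 MB/s.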

Original language: English
Title of host publication: ExaMPI '15 Proceedings of the 3rd Workshop on Exascale MPI
Pages: 2:1-2:10
Number of pages: 10
Publication status: Published - 15 Nov 2015
