Data streaming model is an effective way to tackle the challenge of data-intensive applications. As traditional HPC applications generate large volume of data and more data-intensive applications move to HPC infrastructures, it is necessary to investigate the feasibility of combining message-passing and streaming programming models. MPI, the de facto standard for programming on HPC, cannot intuitively express the communication pattern and the functional operations required in streaming models. In this work, we designed and implemented a data streaming library MPIStream atop MPI to allocate data producers and consumers, to stream data continuously or irregularly and to process data at run-time. In the same spirit as the STREAM benchmark, we developed a parallel stream benchmark to measure data processing rate. The performance of the library largely depends on the size of the stream element, the number of data producers and consumers and the computational intensity of processing one stream element. With 2,048 data producers and 2,048 data consumers in the parallel benchmark, MPIStream achieved 200 GB/s processing rate on a Blue Gene/Q supercomputer. We illustrate that a streaming library for HPC applications can effectively enable irregular parallel I/O, application monitoring and threshold collective operations.
|Title of host publication||ExaMPI '15 Proceedings of the 3rd Workshop on Exascale MPI|
|Number of pages||10|
|Publication status||Published - 15 Nov 2015|