NORNS: Extending Slurm to Support Data-Driven Workflows through Asynchronous Data Staging

Alberto Miranda, William Jackson, Tommaso Tocci, Iakovos Panourgias, Ramon Nou

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

As HPC systems move into the Exascale era, parallel file systems are struggling to keep up with the I/O requirements of data-intensive problems. While the inclusion of burst buffers has helped to alleviate this by improving I/O performance, it has also increased the complexity of the I/O hierarchy by adding storage layers, each with its own semantics. This forces users to explicitly manage data movement between the different storage layers, which, coupled with the lack of interfaces to communicate data dependencies between jobs in a data-driven workflow, prevents resource schedulers from optimizing these transfers to benefit the cluster’s overall performance. This paper proposes several extensions to job schedulers, prototyped using the Slurm scheduling system, to enable users to appropriately express the data dependencies between the different phases in their processing workflows. It also introduces a new service for asynchronous data staging called NORNS that coordinates with the job scheduler to orchestrate data transfers to achieve better resource utilization. Our evaluation shows that a workflow-aware Slurm exploits node-local storage more effectively, reducing the filesystem I/O contention and improving job running times.
Original language: English
Title of host publication: 2019 IEEE International Conference on Cluster Computing (CLUSTER)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1-12
Number of pages: 12
ISBN (Electronic): 978-1-7281-4734-5
ISBN (Print): 978-1-7281-4735-2
Publication status: Published - 7 Nov 2019
Event: IEEE Cluster 2019 - Albuquerque, United States
Duration: 23 Sep 2019 - 26 Sep 2019
https://clustercomp.org/2019/

Publication series

ISSN (Print): 1552-5244
ISSN (Electronic): 2168-9253

Conference

Conference: IEEE Cluster 2019
Country/Territory: United States
City: Albuquerque
Period: 23/09/19 - 26/09/19

Keywords

  • Scientific Workflows
  • Burst Buffers
  • High Performance Computing
  • Data Staging
  • In Situ Processing
