Abstract / Description of output
As HPC systems move into the Exascale era, parallel file systems are struggling to keep up with the I/O requirements from data-intensive problems. While the inclusion of burst buffers has helped to alleviate this by improving I/O performance, it has also increased the complexity of the I/O hierarchy by adding storage layers, each with its own semantics. This forces users to explicitly manage data movement between the different storage layers, which, coupled with the lack of interfaces to communicate data dependencies between jobs in a data-driven workflow, prevents resource schedulers from optimizing these transfers to benefit the cluster’s overall performance. This paper proposes several extensions to job schedulers, prototyped using the Slurm scheduling system, to enable users to appropriately express the data dependencies between the different phases in their processing workflows. It also introduces a new service for asynchronous data staging called NORNS that coordinates with the job scheduler to orchestrate data transfers to achieve better resource utilization. Our evaluation shows that a workflow-aware Slurm exploits node-local storage more effectively, reducing the filesystem I/O contention and improving job running times.
Original language | English |
---|---|
Title of host publication | 2019 IEEE International Conference on Cluster Computing (CLUSTER) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1-12 |
Number of pages | 12 |
ISBN (Electronic) | 978-1-7281-4734-5 |
ISBN (Print) | 978-1-7281-4735-2 |
DOIs | |
Publication status | Published - 7 Nov 2019 |
Event | IEEE Cluster 2019 - Albuquerque, United States; Duration: 23 Sept 2019 → 26 Sept 2019; https://clustercomp.org/2019/ |
Publication series
Name | |
---|---|
ISSN (Print) | 1552-5244 |
ISSN (Electronic) | 2168-9253 |
Conference
Conference | IEEE Cluster 2019 |
---|---|
Country/Territory | United States |
City | Albuquerque |
Period | 23/09/19 → 26/09/19 |
Internet address | https://clustercomp.org/2019/ |
Keywords / Materials (for Non-textual outputs)
- Scientific Workflows
- Burst Buffers
- High Performance Computing
- Data Staging
- In Situ Processing
Fingerprint
Dive into the research topics of 'NORNS: Extending Slurm to Support Data-Driven Workflows through Asynchronous Data Staging'. Together they form a unique fingerprint.
Projects
- 1 Finished
Research output
- 1 Article
- EPCC's Exascale journey: a retrospective of the past 10 years and a vision of the future
  Weiland, M. & Parsons, M., 11 Oct 2021, (E-pub ahead of print) In: Computing in Science and Engineering. 24, 1, p. 8-13, 6 p. Research output: Contribution to journal › Article › peer-review. Open Access.