Edinburgh Research Explorer

Active provenance for Data-Intensive workflows: engaging users and developers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Open Access permissions: Open

Documents

https://ieeexplore.ieee.org/document/9041815
Original language: English
Title of host publication: 2019 15th International Conference on eScience (eScience)
Place of Publication: San Diego, CA, USA
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 560-569
Number of pages: 10
ISBN (Electronic): 978-1-7281-2451-3
ISBN (Print): 978-1-7281-2452-0
Publication status: Published - 19 Mar 2020
Event: Bridging from Concepts to Data and Computation for eScience (BC2DC’19) Workshop: A workshop co-located with the eScience 2019 International Conference - San Diego, California, United States
Duration: 24 Sep 2019 → 24 Sep 2019
https://bc2dc.github.io/

Workshop

Workshop: Bridging from Concepts to Data and Computation for eScience (BC2DC’19) Workshop
Abbreviated title: BC2DC 2019
Country: United States
City: San Diego, California
Period: 24/09/19 → 24/09/19

Abstract

We present a practical approach to provenance capture in Data-Intensive workflow systems. It provides contextualisation by recording injected domain metadata within the provenance stream, and it offers control over lineage precision by combining automation with explicitly specified adaptations. We address provenance tasks such as the extraction of domain metadata, the injection of custom annotations, and the accuracy and integration of records from multiple independent workflows running in distributed contexts. To allow such flexibility, we introduce the concepts of programmable Provenance Types and Provenance Configuration. Provenance Types handle domain contextualisation and allow developers to model lineage patterns by re-defining API methods, composing easy-to-use extensions. Provenance Configuration, in turn, enables users of a Data-Intensive workflow to prepare its execution for provenance capture, by assigning Provenance Types to components and by grouping components into semantic clusters. This grouping enables better searches over the lineage records. Provenance Types and Provenance Configuration are demonstrated in a system used by computational seismologists, which is based on an extended provenance model, S-PROV.
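As an illustration only, the Python sketch below models the two concepts described in the abstract under simplifying assumptions. A Provenance Type is a class whose hook methods are re-defined: one hook extracts domain metadata and the other selects which inputs an output derives from. Types are composed as mixins. A Provenance Configuration maps components to types and to semantic clusters, and an in-memory recorder applies it. Every name here (`ProvenanceType`, `extract_metadata`, `select_inputs`, `Recorder`, `CONFIG`, the `station`/`trace` fields and the `STA1`/`STA2` values) is hypothetical. None of it is the API of the system or of S-PROV described in the paper.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Type

# NOTE: all names below are hypothetical, for illustration only.

@dataclass
class Record:
    """One lineage record emitted for a component's output (assumed shape)."""
    id: int
    component: str
    cluster: Optional[str]
    metadata: Dict[str, Any]
    derived_from: List[int]

class ProvenanceType:
    """Hypothetical base type; developers re-define these two hooks."""

    def extract_metadata(self, data: Any) -> Dict[str, Any]:
        # Default: no domain metadata is attached.
        return {}

    def select_inputs(self, input_ids: List[int]) -> List[int]:
        # Default lineage pattern: the output derives from every input.
        return list(input_ids)

class TraceMetadata(ProvenanceType):
    """Contextualisation: pull domain fields out of an assumed trace dict."""

    def extract_metadata(self, data: Any) -> Dict[str, Any]:
        return {"station": data["station"], "samples": len(data["trace"])}

class LastInputOnly(ProvenanceType):
    """Lineage precision: the output depends only on the most recent input."""

    def select_inputs(self, input_ids: List[int]) -> List[int]:
        return input_ids[-1:]

class WindowedTrace(TraceMetadata, LastInputOnly):
    """Composition via mixins: each parent overrides a different hook."""

# Hypothetical Provenance Configuration: types and semantic clusters per component.
CONFIG: Dict[str, Dict[str, Any]] = {
    "types": {"read": TraceMetadata, "window": WindowedTrace},
    "clusters": {"read": "preprocess", "window": "preprocess", "plot": "output"},
}

class Recorder:
    """Minimal in-memory recorder that applies a configuration to each output."""

    def __init__(self, config: Dict[str, Dict[str, Any]]) -> None:
        self.config = config
        self.records: List[Record] = []

    def capture(self, component: str, data: Any, input_ids: List[int]) -> int:
        # Components without a configured type fall back to the base type.
        ptype: Type[ProvenanceType] = self.config["types"].get(component, ProvenanceType)
        hooks = ptype()
        record = Record(
            id=len(self.records) + 1,
            component=component,
            cluster=self.config["clusters"].get(component),
            metadata=hooks.extract_metadata(data),
            derived_from=hooks.select_inputs(input_ids),
        )
        self.records.append(record)
        return record.id

    def search(self, cluster: Optional[str] = None, **metadata: Any) -> List[int]:
        # Filter by semantic cluster and by exact-match metadata terms.
        return [
            r.id for r in self.records
            if (cluster is None or r.cluster == cluster)
            and all(r.metadata.get(k) == v for k, v in metadata.items())
        ]

# Usage with made-up data.
rec = Recorder(CONFIG)
a = rec.capture("read", {"station": "STA1", "trace": [0.1, 0.2, 0.3]}, [])
b = rec.capture("read", {"station": "STA2", "trace": [0.4, 0.5]}, [])
c = rec.capture("window", {"station": "STA2", "trace": [0.5]}, [a, b])
d = rec.capture("plot", "figure.png", [c])
print(rec.search(cluster="preprocess", station="STA2"))  # -> [2, 3]
```

In this sketch, changing lineage precision or adding domain context only requires overriding one hook in a new type. Users adjust the configuration dictionary without touching component code. The cluster labels then act as a coarse search key over the records.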

    Research areas

  • Reproducibility of results, Workflow management software, Metadata, Data flow computing, Collaborative work, Provenance, data-lineage, eScience, Data-intensive computing, workflow systems


ID: 105966253