Edinburgh Research Explorer

Active provenance for Data-Intensive workflows: engaging users and developers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution




Original language: English
Title of host publication: Proceedings of the IEEE eScience 2019 proceedings
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 10
Publication status: Accepted/In press - 24 Jul 2019
Event: Bridging from Concepts to Data and Computation for eScience (BC2DC’19) Workshop: A workshop co-located with the eScience 2019 International Conference - San Diego, California, United States
Duration: 24 Sep 2019 to 24 Sep 2019


Workshop: Bridging from Concepts to Data and Computation for eScience (BC2DC’19) Workshop
Abbreviated title: BC2DC 2019
Country: United States
City: San Diego, California


We present a practical approach for provenance capturing in Data-Intensive workflow systems that allows the contextualisation of the recorded properties to the domain of application, and the tuning of lineage precision, balancing between automation and ad-hoc adaptations. We address provenance tasks such as extraction of domain metadata, injection of custom annotations, accuracy, and integration of records from multiple independent workflows running in distributed contexts. To allow such flexibility, we introduce the concepts of programmable Provenance Types and Provenance Configuration. Provenance Types handle domain contextualisation and allow developers to model lineage patterns by implementing the defining API methods, which include easy-to-use extensions. Provenance Configuration, instead, lets the users of a Data-Intensive application prepare the workflow execution for provenance capturing, by configuring the attribution of Provenance Types to components and their grouping into semantic clusters, enabling better searches over the lineage. Provenance Types and Provenance Configuration are discussed in relation to a concrete system and a provenance model, S-PROV, and demonstrated through their effective adoption in a real application for seismic rapid assessment.
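
The split between the two concepts can be illustrated with a minimal Python sketch. It is not the API of the system described in the paper: every name here (ProvenanceType, SeismicTraceType, ProvenanceConfiguration, run_with_provenance, the station code STA1) is hypothetical and chosen only for illustration. The developer-side type decides which domain metadata is extracted, and the user-side configuration attributes types to components and groups them into clusters.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class ProvenanceType:
    """Hypothetical base type: decides which metadata is recorded per output."""

    def extract_metadata(self, data: Any) -> Dict[str, Any]:
        # Default behaviour: record no domain-specific metadata.
        return {}


class SeismicTraceType(ProvenanceType):
    """Hypothetical domain type written by a developer for seismic traces."""

    def extract_metadata(self, data: Any) -> Dict[str, Any]:
        return {"station": data["station"], "samples": len(data["values"])}


@dataclass
class ProvenanceConfiguration:
    """Hypothetical user-side configuration, keyed by component name."""

    types: Dict[str, ProvenanceType] = field(default_factory=dict)
    clusters: Dict[str, str] = field(default_factory=dict)


def run_with_provenance(
    name: str,
    func: Callable[[Any], Any],
    data: Any,
    config: ProvenanceConfiguration,
    trace: List[Dict[str, Any]],
) -> Any:
    """Run one component and append a lineage record for its output."""
    result = func(data)
    ptype = config.types.get(name, ProvenanceType())
    trace.append(
        {
            "component": name,
            "cluster": config.clusters.get(name, "unclustered"),
            "metadata": ptype.extract_metadata(result),
        }
    )
    return result


def detrend(item: Dict[str, Any]) -> Dict[str, Any]:
    """Toy workflow component: subtract the mean from the trace values."""
    mean = sum(item["values"]) / len(item["values"])
    return {"station": item["station"], "values": [v - mean for v in item["values"]]}


# The user attributes a type to the component and places it in a cluster.
config = ProvenanceConfiguration(
    types={"detrend": SeismicTraceType()},
    clusters={"detrend": "preprocessing"},
)
trace: List[Dict[str, Any]] = []
out = run_with_provenance(
    "detrend", detrend, {"station": "STA1", "values": [1.0, 2.0, 3.0]}, config, trace
)
```

In this sketch the component function itself contains no provenance logic, so changing what is recorded or how components are clustered only requires editing the configuration.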

    Research areas

  • Reproducibility of results, Workflow management software, Metadata, Data flow computing, Collaborative work, Provenance, data-lineage, eScience, Data-intensive computing, workflow systems


ID: 105966253