Edinburgh Research Explorer

A Graph Model of Data and Workflow Provenance

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Original language: English
Title of host publication: Proceedings of the 2nd Conference on Theory and Practice of Provenance
Place of Publication: Berkeley, CA, USA
Publisher: USENIX Association
Pages: 8-8
Number of pages: 1
Publication status: Published - 2010

Publication series

Name: TAPP'10
Publisher: USENIX Association

Abstract

Provenance has been studied extensively in both database and workflow management systems, so far with little convergence of definitions or models. Provenance in databases has generally been defined for relational or complex object data, by propagating fine-grained annotations or algebraic expressions from the input to the output. This kind of provenance has been found useful in other areas of computer science: annotation databases, probabilistic databases, schema and data integration, etc. In contrast, workflow provenance aims to capture a complete description of the evaluation (or enactment) of a workflow, and this is crucial to verification in scientific computation. Workflows and their provenance are often presented using graphical notation, which makes them easy to visualize but complicates the formal semantics that relates their run-time behavior to their provenance records. We bridge this gap by extending a previously developed dataflow language, which supports both database-style querying and workflow-style batch processing steps, so that it produces a workflow-style provenance graph that can be explicitly queried. We define and describe the model through examples, present queries that extract other forms of provenance, and give an executable definition of the graph semantics of dataflow expressions.
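The abstract describes evaluation that emits a queryable provenance graph. As a hedged illustration only, and not the paper's dataflow language or graph model, the Python sketch below assumes a toy evaluator in which each black-box step becomes a process node linked to its input and output value nodes. `ProvGraph`, `add_value`, `apply`, `map`, and `depends_on` are hypothetical names chosen for this sketch. The `depends_on` query shows one way to extract another form of provenance (which raw inputs an output depends on) from the recorded graph.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ProvGraph:
    """Hypothetical provenance graph: value nodes and process nodes joined by edges."""
    values: dict = field(default_factory=dict)     # value-node id -> value
    processes: dict = field(default_factory=dict)  # process-node id -> step name
    edges: list = field(default_factory=list)      # (source id, target id, label)
    _ids: count = field(default_factory=count)     # shared counter for fresh node ids

    def add_value(self, value):
        """Record a value node and return its id."""
        node = f"v{next(self._ids)}"
        self.values[node] = value
        return node

    def apply(self, name, fn, *inputs):
        """Run a black-box step fn on input value nodes, recording one process node."""
        proc = f"p{next(self._ids)}"
        self.processes[proc] = name
        for i, src in enumerate(inputs):
            self.edges.append((src, proc, f"in{i}"))
        out = self.add_value(fn(*(self.values[n] for n in inputs)))
        self.edges.append((proc, out, "out"))
        return out

    def map(self, name, fn, coll):
        """Apply fn to each value node in coll, recording a separate process node per element."""
        return [self.apply(name, fn, elem) for elem in coll]

    def depends_on(self, node):
        """Query: value nodes with no incoming edges that are reachable backwards from node."""
        preds = {}
        for src, dst, _ in self.edges:
            preds.setdefault(dst, []).append(src)
        seen, stack, sources = set(), [node], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in self.values and n not in preds:
                sources.add(n)
            stack.extend(preds.get(n, []))
        return sources


if __name__ == "__main__":
    g = ProvGraph()
    a, b = g.add_value(3), g.add_value(4)
    total = g.apply("add", lambda x, y: x + y, a, b)
    scale = g.add_value(10)
    result = g.apply("mul", lambda x, y: x * y, total, scale)
    print(g.values[result])              # 70
    print(sorted(g.depends_on(result)))  # ['v0', 'v1', 'v4']
```

Because `map` records one process node per element, calling `depends_on` on a single mapped output returns only the element it came from. This is a rough analogue of the fine-grained provenance the abstract attributes to databases, obtained here by querying a workflow-style graph.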

ID: 16501754