Why and Where: A Characterization of Data Provenance

Peter Buneman, Sanjeev Khanna, Wang-Chiew Tan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


With the proliferation of database views and curated databases, the issue of data provenance - where a piece of data came from and the process by which it arrived in the database - is becoming increasingly important, especially in scientific databases where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query. We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML. A novel aspect of our work is a distinction between "why" provenance (refers to the source data that had some influence on the existence of the data) and "where" provenance (refers to the location(s) in the source databases from which the data was extracted).
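
The distinction between "why" and "where" provenance can be illustrated with a small sketch. It is not the paper's formal definition or algorithm: the table `EMP`, the `Loc` type, and the function `names_in_dept` are hypothetical names introduced here, and the query is assumed to be a simple selection followed by a projection with duplicate elimination. Under those assumptions, the "why" annotation records which source rows were responsible for an output value existing, and the "where" annotation records the cells the value was copied from.

```python
from typing import NamedTuple


class Loc(NamedTuple):
    """A cell in a source table (hypothetical location scheme)."""
    table: str
    row: int
    column: str


# Hypothetical source relation, keyed by row id.
EMP = {
    1: {"name": "Ada", "dept": "db"},
    2: {"name": "Bob", "dept": "os"},
    3: {"name": "Ada", "dept": "db"},
}


def names_in_dept(emp, dept):
    """Evaluate SELECT DISTINCT name FROM emp WHERE dept = :dept.

    Returns two dicts keyed by output value:
      why[name]   -> set of witnesses, each a frozenset of source row ids
                     that by itself suffices to produce the output value
      where[name] -> set of source cells the output value was copied from
    """
    why = {}
    where = {}
    for row_id, row in emp.items():
        if row["dept"] == dept:
            name = row["name"]
            # The whole row (including its dept field) influenced existence.
            why.setdefault(name, set()).add(frozenset({row_id}))
            # Only the name cell is the location the value came from.
            where.setdefault(name, set()).add(Loc("emp", row_id, "name"))
    return why, where


why, where = names_in_dept(EMP, "db")
```

In this sketch the `dept` field of rows 1 and 3 contributes to the "why" annotation of `"Ada"`, because the selection depends on it, but it never appears in the "where" annotation, which lists only the two `name` cells.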
Original language: English
Title of host publication: Database Theory — ICDT 2001
Subtitle of host publication: 8th International Conference, London, UK, January 4–6, 2001, Proceedings
Editors: Jan Van den Bussche, Victor Vianu
Publisher: Springer-Verlag GmbH
Number of pages: 15
ISBN (Electronic): 978-3-540-44503-6
ISBN (Print): 978-3-540-41456-8
Publication status: Published - 2001

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Berlin / Heidelberg
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

