Edinburgh Research Explorer

Characterizing Data Provenance (Abstract)

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: Advances in Databases
Subtitle of host publication: 17th British National Conference on Databases, BNCOD 17 Exeter, UK, July 3–5, 2000 Proceedings
Editors: Brian Lings, Keith Jeffery
Publisher: Springer Berlin Heidelberg
Pages: 171-171
Number of pages: 1
Volume: 1832
ISBN (Electronic): 978-3-540-45033-7
ISBN (Print): 978-3-540-67743-7
Publication status: Published - 2000

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Berlin Heidelberg
Volume: 1832

Abstract

When you see some data on the Web, do you ever wonder how it got there? The chances are that it is in no sense original, but was copied from some other source, which in turn was copied from some other source, and so on. If you are a scientist using a scientific database or some other kind of scholar using a digital library, you will probably be keenly interested in this information because it is crucial to your assessment of the accuracy and timeliness of the data. Data provenance is the understanding of the history of a piece of data: its origins and the process by which it travelled from database to database. Existing database tools give us little or no help in recording provenance; indeed, database schemas make it difficult to record this kind of information. I shall report on some recent work that characterizes data provenance. It is based on a model for data, both structured and semistructured, which accounts for both the structure and location of data. Using this model, we can draw a distinction between “why provenance” and “where provenance”. The former expresses all the data in the source databases that contributed to the existence of the data of interest; the latter specifies the locations from which it was drawn. In particular, we can take a query in a generic semistructured query language and use it to provide a formal derivation of both forms of provenance and to derive a number of useful properties of these forms. The work generalizes existing work on relational databases that is limited to why provenance. This is a report of joint work with Sanjeev Khanna and Wang-Chiew Tan.
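
The distinction between the two forms of provenance can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the formal model or query language the abstract refers to: the relations `EMPLOYEES` and `DEPTS`, their contents, the function `names_with_city`, and the annotation layout are all hypothetical, and plain dictionaries stand in for the structured and semistructured data model. For each row produced by a join, it records the source rows whose presence made the output exist (why provenance) and the source location each output field was copied from (where provenance).

```python
# Hypothetical source relations; each row carries an id so that
# locations inside the source databases can be named.
EMPLOYEES = {
    "e1": {"name": "Ann", "dept": "db"},
    "e2": {"name": "Bob", "dept": "db"},
    "e3": {"name": "Cy", "dept": "ai"},
}
DEPTS = {
    "d1": {"dept": "db", "city": "Exeter"},
    "d2": {"dept": "ai", "city": "Leeds"},
}


def names_with_city():
    """Join EMPLOYEES and DEPTS on dept and annotate every output row."""
    out = []
    for eid, emp in EMPLOYEES.items():
        for did, dep in DEPTS.items():
            if emp["dept"] != dep["dept"]:
                continue
            out.append({
                "value": (emp["name"], dep["city"]),
                # Why: the source rows that together caused this row to exist.
                "why": {("EMPLOYEES", eid), ("DEPTS", did)},
                # Where: the source location each output field was copied from.
                "where": {
                    "name": ("EMPLOYEES", eid, "name"),
                    "city": ("DEPTS", did, "city"),
                },
            })
    return out


if __name__ == "__main__":
    for row in names_with_city():
        print(row["value"], sorted(row["why"]), row["where"])
```

In this sketch the two annotations differ in kind: the why annotation for the row about Ann includes the `DEPTS` row even for the `name` field, because without that row the join would produce nothing, whereas the where annotation for `name` points only at the single `EMPLOYEES` location the value was copied from.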

ID: 16508677