Data Quality: Theory and Practice

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Real-life data are often dirty: inconsistent, inaccurate, incomplete, stale and duplicated. Dirty data have been a longstanding issue, and the prevalent use of the Internet has been increasing, on an unprecedented scale, the risks of creating and propagating dirty data. Dirty data are reported to cost US industry billions of dollars each year, and there is no reason to believe that the scale of the problem is any different in other societies that depend on information technology. With this comes the need for improving data quality, a topic as important as traditional data management tasks that cope with the quantity of the data.

We aim to provide an overview of recent advances in the area of data quality, from theory to practical techniques. We promote a conditional dependency theory for capturing data inconsistencies, a new form of dynamic constraints for data deduplication, a theory of relative information completeness for characterizing incomplete data, and a data currency model for answering queries with current values drawn from possibly stale data in the absence of reliable timestamps. We also discuss techniques for automatically discovering data quality rules, detecting errors in real-life data, and correcting errors with performance guarantees.
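To make the first of these formalisms concrete: a conditional functional dependency (CFD) extends a classical functional dependency with a pattern that restricts it to tuples matching certain constant values. The sketch below, in Python, checks a textbook-style CFD stating that for UK records, zip code determines city; the CFD, relation schema, and sample data are illustrative and are not taken from the paper.

```python
# Minimal sketch of violation detection for one conditional functional
# dependency (CFD):  ([country, zip] -> [city], with pattern country = 'UK').
# A hypothetical example for illustration, not the paper's algorithm.

from collections import defaultdict

def cfd_violations(records, pattern_country="UK"):
    """Return zip codes that map to more than one city among records
    matching the CFD's pattern (country == pattern_country)."""
    cities_by_zip = defaultdict(set)
    for rec in records:
        if rec["country"] == pattern_country:
            cities_by_zip[rec["zip"]].add(rec["city"])
    # A zip associated with two or more cities witnesses an inconsistency.
    return {z: cities for z, cities in cities_by_zip.items() if len(cities) > 1}

records = [
    {"country": "UK", "zip": "EH8 9AB", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH8 9AB", "city": "London"},    # violates the CFD
    {"country": "US", "zip": "EH8 9AB", "city": "Portland"},  # pattern not matched
]

print(cfd_violations(records))
```

Note that a plain functional dependency zip → city would also flag the US record above; the pattern in the CFD scopes the constraint to the subset of the data where it is meant to hold.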
Original language: English
Title of host publication: Web-Age Information Management
Subtitle of host publication: 13th International Conference, WAIM 2012, Harbin, China, August 18-20, 2012. Proceedings
Publisher: Springer
Pages: 1-16
Number of pages: 16
Volume: 7418
ISBN (Electronic): 978-3-642-32281-5
ISBN (Print): 978-3-642-32280-8
DOIs
Publication status: Published - 2012

