Conditional Dependencies: A Principled Approach to Improving Data Quality

Wenfei Fan, Floris Geerts, Xibei Jia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Real-life data is often dirty, and dirty data costs businesses worldwide billions of pounds each year. This paper presents a promising approach to improving data quality. It detects and fixes inconsistencies in real-life data using conditional dependencies, which extend database dependencies by enforcing bindings of semantically related data values. It identifies records from unreliable data sources using relative candidate keys, which extend keys for relations by supporting similarity and matching operators across relations. In contrast to traditional dependencies, which were developed to improve the quality of schemas, the revised constraints are proposed to improve the quality of data. These constraints yield practical techniques for data repairing and record matching in a uniform framework.
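To make the idea of a conditional dependency concrete, the following is a minimal Python sketch of violation detection. It is not the paper's algorithm: the function names, the `_` wildcard convention, the attribute names, and the sample rows are all assumptions chosen for illustration. A dependency is represented as a left-hand pattern and a right-hand pattern, where a constant binds an attribute to a specific value and `_` means "any value".

```python
from collections import defaultdict

WILDCARD = "_"  # hypothetical marker for "any value" in a pattern


def matches(row, pattern):
    """True if the row agrees with every constant in the pattern."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def cfd_violations(rows, lhs_pattern, rhs_pattern):
    """Return sorted indices of rows that violate the conditional dependency.

    The dependency lhs -> rhs is only enforced on rows matching lhs_pattern.
    """
    violations = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not matches(row, lhs_pattern):
            continue  # the condition does not hold, so the rule does not apply
        if not matches(row, rhs_pattern):
            violations.add(i)  # single-row violation of a constant binding
        groups[tuple(row[a] for a in lhs_pattern)].append(i)

    # Pairwise check: rows that agree on the left-hand side must also agree
    # on every right-hand attribute that the pattern leaves as a wildcard.
    var_attrs = [a for a, v in rhs_pattern.items() if v == WILDCARD]
    for idxs in groups.values():
        rhs_values = {tuple(rows[i][a] for a in var_attrs) for i in idxs}
        if len(rhs_values) > 1:
            violations.update(idxs)
    return sorted(violations)


# Illustrative data only; none of these records come from the paper.
rows = [
    {"country": "UK", "zip": "EH8 9AB", "street": "Crichton St",
     "area_code": "131", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH8 9AB", "street": "Mayfield Rd",
     "area_code": "131", "city": "Edinburgh"},
    {"country": "UK", "zip": "G1 1AA", "street": "High St",
     "area_code": "131", "city": "Glasgow"},
    {"country": "US", "zip": "07974", "street": "Mountain Ave",
     "area_code": "908", "city": "Murray Hill"},
    {"country": "US", "zip": "07974", "street": "Main St",
     "area_code": "908", "city": "Murray Hill"},
]

# Assumed rule 1: for UK rows only, zip determines street.
print(cfd_violations(rows, {"country": "UK", "zip": "_"}, {"street": "_"}))
# → [0, 1]

# Assumed rule 2: UK rows with area code 131 must have city Edinburgh.
print(cfd_violations(rows, {"country": "UK", "area_code": "131"},
                     {"city": "Edinburgh"}))
# → [2]
```

The two US rows share a zip code but differ on street, yet neither is flagged, because the first rule is conditioned on `country = UK`. An unconditional functional dependency would have to hold on every row, which is the difference the abstract describes as enforcing bindings of semantically related data values.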
Original language: English
Title of host publication: Dataspace: The Final Frontier
Subtitle of host publication: 26th British National Conference on Databases, BNCOD 26, Birmingham, UK, July 7-9, 2009. Proceedings
Publisher: Springer Berlin Heidelberg
Number of pages: 13
ISBN (Electronic): 978-3-642-02843-4
ISBN (Print): 978-3-642-02842-7
Publication status: Published - 2009

