Edinburgh Research Explorer

Quaid - a platform for improving data quality

Project: Research

Effective start/end date: 1/10/09 to 30/09/10
Total award: £127,730.00
Funding organisation: EPSRC
Funder project reference: EP/H008063/1

Key findings

This is a follow-on to project EP/E029213/1
on data quality. It aims to develop a practical system for
improving data quality. The key result is SemanDaq, a working
data cleaning system based on conditional functional dependencies (CFDs)
and matching dependencies (MDs). It supports the following:
(1) Data quality rule discovery: automatically discovering
conditional dependencies as data quality rules from (possibly
dirty) data.
(2) Rule validation: automatically validating the rules discovered.
(3) Error detection: detecting errors and inconsistencies in the data.
(4) Data repairing: fixing the errors detected, with performance
guarantees on the quality of repairs.
(5) Entity resolution: identifying tuples from unreliable data sources
that refer to the same real-world entity, based on the semantics of
the data.
The system was demonstrated at VLDB 2008 and was well received.
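To illustrate what error detection with CFDs involves, here is a minimal in-memory sketch. It is not taken from SemanDaq. It assumes a CFD with a single pattern row, written (X -> Y, pattern), where "_" is a wildcard, as in the usual CFD notation. A tuple whose X-values match the pattern but whose Y-values fail a constant in the pattern is a single-tuple violation. Two tuples that agree on X, match the pattern, and disagree on Y are a multi-tuple violation. The relation, its attributes (CC, ZIP, STR, AC, CT), and the two rules are hypothetical examples.

```python
from collections import defaultdict
from dataclasses import dataclass

WILDCARD = "_"  # matches any value in a pattern


@dataclass(frozen=True)
class CFD:
    """A CFD (lhs -> rhs) with one pattern row (a simplification of a full tableau)."""
    lhs: tuple
    rhs: tuple
    lhs_pattern: tuple
    rhs_pattern: tuple


def matches(values, pattern):
    """True if every value equals its pattern constant or the pattern is a wildcard."""
    return all(p == WILDCARD or v == p for v, p in zip(values, pattern))


def detect_violations(rows, cfd):
    """Return (single, multi):
    single: indexes of rows whose LHS matches but whose RHS breaks a constant;
    multi: groups of row indexes that agree on the LHS but differ on the RHS."""
    single = []
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        x = tuple(row[a] for a in cfd.lhs)
        if not matches(x, cfd.lhs_pattern):
            continue  # the rule does not apply to this row
        y = tuple(row[a] for a in cfd.rhs)
        if not matches(y, cfd.rhs_pattern):
            single.append(i)
        groups[x].append(i)
    multi = []
    for idxs in groups.values():
        distinct_y = {tuple(rows[i][a] for a in cfd.rhs) for i in idxs}
        if len(distinct_y) > 1:
            multi.append(idxs)
    return single, multi


# Hypothetical customer records.
rows = [
    {"CC": "44", "ZIP": "EH1 1AA", "STR": "High St", "AC": "131", "CT": "EDI"},
    {"CC": "44", "ZIP": "EH1 1AA", "STR": "Low St", "AC": "131", "CT": "EDI"},
    {"CC": "01", "ZIP": "07974", "STR": "Mtn Ave", "AC": "908", "CT": "NYC"},
    {"CC": "01", "ZIP": "07974", "STR": "Tree Ave", "AC": "908", "CT": "MH"},
]

# In country 44, ZIP determines street.
phi1 = CFD(("CC", "ZIP"), ("STR",), ("44", WILDCARD), (WILDCARD,))
# Country code 01 with area code 908 implies city MH.
phi2 = CFD(("CC", "AC"), ("CT",), ("01", "908"), ("MH",))

print(detect_violations(rows, phi1))  # ([], [[0, 1]])
print(detect_violations(rows, phi2))  # ([2], [[2, 3]])
```

Grouping rows by their LHS values means each rule is checked in a single pass over the data, and the two violation types together cover every tuple or pair that breaks the rule. A full system would typically run such checks as queries inside the database and would then hand the flagged tuples to the repairing step.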