Project Details
Key findings
This is a follow-on to project EP/E029213/1
on data quality. It aims to develop a practical system for
improving data quality. The main result is SemanDaq, a working
data cleaning system based on conditional functional dependencies
and matching dependencies. It supports the following:
(1) Data quality rule discovery: automatically discovering
conditional dependencies as data quality rules from (possibly
dirty) data.
(2) Rule validation: automatically validating the rules discovered.
(3) Error detection: detecting errors and inconsistencies in the data.
(4) Data repairing: fixing the errors detected, with performance
guarantees on the quality of repairs.
(5) Entity resolution: identifying tuples from unreliable data sources
that refer to the same real-world entity, based on the semantics of
the data.
The system was demonstrated at VLDB 2008 and was well received.
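To make the error-detection step more concrete, here is a minimal Python sketch of how a conditional functional dependency can be checked against a table. It assumes a dependency is given as an embedded functional dependency X → A plus a single pattern tuple whose entries are either constants or the wildcard `_`. The names `CFD`, `detect_violations`, `WILDCARD` and the sample customer rows are hypothetical and for illustration only; this is not SemanDaq's actual code or API.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

WILDCARD = "_"  # hypothetical marker for an unnamed variable in a pattern tuple


@dataclass(frozen=True)
class CFD:
    """Hypothetical CFD: embedded FD lhs -> rhs plus one pattern tuple."""
    lhs: tuple[str, ...]
    rhs: str
    pattern: dict[str, str]  # attribute -> constant or WILDCARD


def matches(value: str, pattern_value: str) -> bool:
    # A value matches a pattern entry if the entry is the wildcard or equal to it.
    return pattern_value == WILDCARD or value == pattern_value


def detect_violations(rows: list[dict[str, str]], cfd: CFD) -> set[int]:
    """Return the indices of rows involved in a violation of `cfd`."""
    # Group the rows that match the pattern on the LHS by their LHS values.
    groups: dict[tuple[str, ...], list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        if all(matches(row[a], cfd.pattern[a]) for a in cfd.lhs):
            groups[tuple(row[a] for a in cfd.lhs)].append(i)

    violating: set[int] = set()
    rhs_pattern = cfd.pattern[cfd.rhs]
    for members in groups.values():
        if rhs_pattern != WILDCARD:
            # Constant RHS: every matching row must carry that constant.
            violating.update(i for i in members if rows[i][cfd.rhs] != rhs_pattern)
        elif len({rows[i][cfd.rhs] for i in members}) > 1:
            # Wildcard RHS: rows agreeing on the LHS must agree on the RHS.
            violating.update(members)
    return violating


# Illustrative (made-up) customer records.
rows = [
    {"CC": "44", "AC": "131", "zip": "EH4 1DT", "city": "EDI", "street": "Mayfield"},
    {"CC": "44", "AC": "131", "zip": "EH4 1DT", "city": "EDI", "street": "Crichton"},
    {"CC": "01", "AC": "908", "zip": "07974", "city": "NYC", "street": "Mtn Ave"},
    {"CC": "01", "AC": "908", "zip": "07974", "city": "MH", "street": "Mtn Ave"},
]

# In the UK (CC = 44), zip code determines street.
phi1 = CFD(("CC", "zip"), "street", {"CC": "44", "zip": WILDCARD, "street": WILDCARD})
# Country code 01 with area code 908 implies city MH.
phi2 = CFD(("CC", "AC"), "city", {"CC": "01", "AC": "908", "city": "MH"})

print(sorted(detect_violations(rows, phi1)))  # [0, 1]
print(sorted(detect_violations(rows, phi2)))  # [2]
```

Grouping by the left-hand-side values lets both kinds of violation be found in one pass: a row that contradicts a constant on the right-hand side is flagged on its own, while a wildcard on the right-hand side flags every group of rows that agree on X but disagree on A.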
Status | Finished |
---|---|
Effective start/end date | 1 October 2009 to 30 September 2010 |
Funding
- EPSRC: £127,730.00