Abstract / Description of output
Data integration is a classical problem in databases, typically decomposed into schema matching, entity matching and record merging. To solve the last of these, it is mostly assumed that ground truth can be determined, either as master data or from user feedback. However, in many cases this assumption does not hold: the merging processes cannot be accurate enough, and the data gathering processes in the different sources are simply imperfect and cannot provide high quality data. Instead of enforcing consistency, we propose to evaluate how concordant or discordant sources are as a measure of trustworthiness (the more discordant the sources are, the less we can trust their data). Thus, we define the discord measurement problem: given a set of uncertain raw observations or aggregate results (such as case/hospitalization/death data relevant to COVID-19) and information on the alignment of different data (for example, cases and deaths), we wish to assess whether the different sources are concordant or, if not, measure how discordant they are.
Original language | English |
---|---|
Title of host publication | 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2022 |
Editors | Kostas Stefanidis, Lukasz Golab |
Publisher | CEUR Workshop Proceedings |
Pages | 96-100 |
Number of pages | 5 |
Volume | 3130 |
Publication status | Published - 25 Apr 2022 |
Event | 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, Edinburgh, United Kingdom. Duration: 29 Mar 2022 → 29 Mar 2022. Conference number: 24. https://sites.google.com/view/dolap2022/home |
Publication series
Name | CEUR Workshop Proceedings |
---|---|
Publisher | CEUR-WS |
ISSN (Print) | 1613-0073 |
Conference
Conference | 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data |
---|---|
Abbreviated title | DOLAP 2022 |
Country/Territory | United Kingdom |
City | Edinburgh |
Period | 29/03/22 → 29/03/22 |
Internet address | https://sites.google.com/view/dolap2022/home |
Fingerprint
Dive into the research topics of 'Measuring discord among multidimensional data sources'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Skye: A programming language bridging theory and practice for scientific data curation
  1/09/16 → 28/02/23
  Project: Research