Queries with Arithmetic on Incomplete Databases

Marco Console, Matthias Hofer, Leonid Libkin

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns.

We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach.
Original languageEnglish
Title of host publicationProceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
Place of PublicationNew York, NY, USA
PublisherACM Association for Computing Machinery
Pages179–189
Number of pages11
ISBN (Print)9781450371087
DOIs
Publication statusPublished - 14 Jun 2020
Event2020 ACM SIGMOD/PODS International Conference on Management of Data - Portland, United States
Duration: 14 Jun 202019 Jun 2020
https://sigmod2020.org/

Conference

Conference2020 ACM SIGMOD/PODS International Conference on Management of Data
Abbreviated titleSIGMOD/PODS 2020
Country/TerritoryUnited States
CityPortland
Period14/06/2019/06/20
Internet address

Keywords

  • approximate query answers
  • asymptotic behavior
  • query answering
  • measure of certainty
  • missing data
  • first-order queries
  • numerical data
  • incomplete information

Fingerprint

Dive into the research topics of 'Queries with Arithmetic on Incomplete Databases'. Together they form a unique fingerprint.

Cite this