TY - JOUR
T1 - Relative information completeness
AU - Fan, Wenfei
AU - Geerts, Floris
PY - 2010/11
Y1 - 2010/11
N2 - This article investigates the question of whether a partially closed database has complete information to answer a query. In practice an enterprise often maintains master data Dm, a closed-world database. We say that a database D is partially closed if it satisfies a set V of containment constraints of the form q(D) ⊆ p(Dm), where q is a query in a language LC and p is a projection query. The part of D not constrained by (Dm, V) is open, from which some tuples may be missing. The database D is said to be complete for a query Q relative to (Dm, V) if for all partially closed extensions D' of D, Q(D') = Q(D), i.e., adding tuples to D either violates some constraints in V or does not change the answer to Q.
We first show that the proposed model can also capture the consistency of data, in addition to its relative completeness. Indeed, integrity constraints studied for data consistency can be expressed as containment constraints. We then study two problems. One is to decide, given Dm, V, a query Q in a language LQ, and a partially closed database D, whether D is complete for Q relative to (Dm, V). The other is to determine, given Dm, V and Q, whether there exists a partially closed database that is complete for Q relative to (Dm, V). We establish matching lower and upper bounds on these problems for a variety of languages LQ and LC. We also provide characterizations for a database to be relatively complete, and for a query to allow a relatively complete database, when LQ and LC are conjunctive queries.
KW - Incomplete information
KW - complexity
KW - master data management
KW - partially closed databases
KW - relative completeness
UR - http://www.scopus.com/inward/record.url?scp=78650642617&partnerID=8YFLogxK
U2 - 10.1145/1862919.1862924
DO - 10.1145/1862919.1862924
M3 - Article
VL - 35
SP - 1
EP - 44
JO - ACM Transactions on Database Systems
JF - ACM Transactions on Database Systems
SN - 0362-5915
IS - 4
M1 - 27
ER -