TY - GEN

T1 - Query Optimization for Semistructured Data using Path Constraints in a Deterministic Data Model

AU - Buneman, Peter

AU - Fan, Wenfei

AU - Weinstein, Scott

PY - 1999

Y1 - 1999

N2 - Path constraints have been studied for semistructured data modeled as a rooted edge-labeled directed graph [4], [11]-[13]. In this model, the implication problems associated with many natural path constraints are undecidable [11], [13]. A variant of the graph model, called the deter- ministic data model, was recently proposed in [10]. In this model, data is represented as a graph with deterministic edge relations, i.e., the edges emanating from any node in the graph have distinct labels. This model is more appropriate for representing, e.g., ACeDB [27] databases and Web sites. This paper investigates path constraints for the deterministic data model. It demonstrates the application of path constraints to, among others, query optimization. Three classes of path constraints are considered: the language Pc introduced in [11], an extension of Pc , denoted by Pcw , by including wildcards in path expressions, and a generalization of Pcw , denoted by Pc* , by representing paths as regular expressions. The implication problems for these constraint languages are studied in the context of the deterministic data model. It is shown that in contrast to the undecidability result of [11], the implication and finite implication problems for Pc are decidable in cubic-time and are finitely axiomatizable. Moreover, the implication problems are decidable for Pcw . However, the implication problems for Pc* are undecidable.

AB - Path constraints have been studied for semistructured data modeled as a rooted edge-labeled directed graph [4], [11]-[13]. In this model, the implication problems associated with many natural path constraints are undecidable [11], [13]. A variant of the graph model, called the deter- ministic data model, was recently proposed in [10]. In this model, data is represented as a graph with deterministic edge relations, i.e., the edges emanating from any node in the graph have distinct labels. This model is more appropriate for representing, e.g., ACeDB [27] databases and Web sites. This paper investigates path constraints for the deterministic data model. It demonstrates the application of path constraints to, among others, query optimization. Three classes of path constraints are considered: the language Pc introduced in [11], an extension of Pc , denoted by Pcw , by including wildcards in path expressions, and a generalization of Pcw , denoted by Pc* , by representing paths as regular expressions. The implication problems for these constraint languages are studied in the context of the deterministic data model. It is shown that in contrast to the undecidability result of [11], the implication and finite implication problems for Pc are decidable in cubic-time and are finitely axiomatizable. Moreover, the implication problems are decidable for Pcw . However, the implication problems for Pc* are undecidable.

U2 - 10.1007/3-540-44543-9_13

DO - 10.1007/3-540-44543-9_13

M3 - Conference contribution

SN - 978-3-540-41481-0

T3 - Lecture Notes in Computer Science

SP - 208

EP - 223

BT - Research Issues in Structured and Semistructured Database Programming

PB - Springer-Verlag GmbH

ER -