XML design for relational storage

Solmaz Kolahi, Leonid Libkin

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Design principles for XML schemas that eliminate redundancies and avoid update anomalies have been studied recently. Several normal forms, generalizing those for relational databases, have been proposed. All of them, however, are based on the assumption of a native XML storage, while in practice most XML data is stored in relational databases.

In this paper we study XML design and normalization for relational storage of XML documents. To be able to relate and compare XML and relational designs, we use an information-theoretic framework that measures information content in relations and documents, with higher values corresponding to lower levels of redundancy. We show that most common relational storage schemes preserve the notion of being well-designed (i.e., anomaly- and redundancy-free). Thus, existing XML normal forms guarantee well-designed relational storage as well. We further show that if this perfect option is not achievable, then a slight restriction on XML constraints guarantees a "second-best" relational design, according to the possible values of the information-theoretic measure. We finally consider an edge-based relational representation of XML documents, and show that while it has information-theoretic properties similar to those of other relational representations, it can behave significantly worse in terms of enforcing integrity constraints.
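
The abstract does not spell out the edge-based scheme, so the following is only a minimal sketch of a generic edge table of the kind commonly used for shredding XML, not the authors' exact representation. Every element and attribute becomes one row (source id, ordinal, label, target id, value). All names here (`shred_edges`, `sno_name_pairs`, `fd_holds`) are hypothetical. The example document repeats a student's name under two courses, and checking the constraint "`sno` determines `name`" on the edge table requires joining edges through their node ids, a small illustration of why constraint enforcement can be more involved in this representation.

```python
import xml.etree.ElementTree as ET
from itertools import count


def shred_edges(xml_text):
    """Shred an XML document into hypothetical edge rows:
    (source_id, ordinal, label, target_id, value).
    Attributes are leaf edges labelled '@name' with target_id None;
    element text (if non-blank) is stored in the value column."""
    root = ET.fromstring(xml_text.strip())
    ids = count(1)
    rows = []

    def visit(elem, elem_id):
        ordinal = 0
        for name, value in elem.attrib.items():
            ordinal += 1
            rows.append((elem_id, ordinal, "@" + name, None, value))
        for child in elem:
            ordinal += 1
            child_id = next(ids)
            text = (child.text or "").strip() or None
            rows.append((elem_id, ordinal, child.tag, child_id, text))
            visit(child, child_id)

    root_id = next(ids)
    root_text = (root.text or "").strip() or None
    rows.append((0, 1, root.tag, root_id, root_text))  # 0 = virtual parent of the root
    visit(root, root_id)
    return rows


def sno_name_pairs(rows):
    """Join 'student' edges with their '@sno' and 'name' child edges."""
    children = {}
    for src, _, label, tgt, val in rows:
        children.setdefault(src, []).append((label, val))
    pairs = []
    for _, _, label, tgt, _ in rows:
        if label != "student":
            continue
        kids = children.get(tgt, [])
        snos = [v for lbl, v in kids if lbl == "@sno"]
        names = [v for lbl, v in kids if lbl == "name"]
        pairs.extend((s, n) for s in snos for n in names)
    return pairs


def fd_holds(pairs):
    """Check that the left value functionally determines the right value."""
    seen = {}
    for lhs, rhs in pairs:
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


DOC = """
<courses>
  <course cno="csc200"><student sno="st1"><name>Deere</name></student></course>
  <course cno="mat100"><student sno="st1"><name>Deere</name></student></course>
</courses>
"""

rows = shred_edges(DOC)
pairs = sno_name_pairs(rows)
print(len(rows), pairs, fd_holds(pairs))
```

Here the name "Deere" is stored once per course, which is the kind of redundancy the information-theoretic measure is meant to detect; updating only one copy would violate the constraint, and `fd_holds` would then return `False`.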
Original language: English
Title of host publication: Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007
Publisher: ACM
Pages: 1083-1092
Number of pages: 10
ISBN (Print): 78-1-59593-654-7
Publication status: Published - 8 May 2007
