An Information-theoretic Approach to Normal Forms for Relational and XML Data

Marcelo Arenas, Leonid Libkin

Research output: Contribution to journalArticlepeer-review


Normalization as a way of producing good relational database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While, in the relational world, the criteria for being well designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models.Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.
Original languageEnglish
Pages (from-to)246-283
Number of pages38
JournalJournal of the ACM
Issue number2
Publication statusPublished - Mar 2005


Dive into the research topics of 'An Information-theoretic Approach to Normal Forms for Relational and XML Data'. Together they form a unique fingerprint.

Cite this