An information-theoretic analysis of worst-case redundancy in database design

Solmaz Kolahi, Leonid Libkin

Research output: Contribution to journalArticlepeer-review

Abstract

Normal forms that guide the process of database schema design have several key goals such as elimination of redundancies and preservation of integrity constraints, such as functional dependencies. It has long been known that complete elimination of redundancies and complete preservation of constraints cannot be achieved simultaneously. In this article, we use a recently introduced information-theoretic framework, and provide a quantitative analysis of the redundancy/integrity preservation trade-off, and give techniques for comparing different schema designs in terms of the amount of redundancy they carry.

The main notion of the information-theoretic framework is that of an information content of each datum in an instance (which is a number in [0, 1]): the closer to 1, the less redundancy it carries. We start by providing a combinatorial criterion that lets us calculate, for a relational schema with functional dependencies, the lowest information content in its instances. This indicates how good the schema design is in terms of allowing redundant information. We then study the normal form 3NF, which tolerates some redundancy to guarantee preservation of functional dependencies. The main result provides a formal justification for normal form 3NF by showing that this normal form pays the smallest possible price, in terms of redundancy, for achieving dependency preservation. We also give techniques for quantitative comparison of different normal forms based on the redundancy they tolerate.
Original languageEnglish
Article number5
Pages (from-to)1-32
Number of pages31
JournalACM Transactions on Database Systems
Volume35
Issue number1
DOIs
Publication statusPublished - 1 Feb 2010

Keywords

  • Database design
  • Third Normal Form (3NF)
  • functional dependency
  • redundancy

Fingerprint Dive into the research topics of 'An information-theoretic analysis of worst-case redundancy in database design'. Together they form a unique fingerprint.

Cite this