Information Preserving XML Schema Embedding

Philip Bohannon, Wenfei Fan, Michael Flaster, P. P. S. Narayan

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


A fundamental concern of information integration in an XML context is the ability to embed one or more source documents in a target document so that (a) the target document conforms to a target schema and (b) the information in the source document(s) is preserved. In this paper, information preservation for XML is formally studied, and the results of this study guide the definition of a novel notion of schema embedding between two XML DTD schemas represented as graphs. Schema embedding generalizes the conventional notion of graph similarity by allowing an edge in a source DTD schema to be mapped to a path in the target DTD. Instance-level embeddings can be defined from the schema embedding in a straightforward manner, such that conformance to a target schema and information preservation are guaranteed. We show that it is NP-complete to find an embedding between two DTD schemas. We also provide efficient heuristic algorithms to find candidate embeddings, along with experimental results to evaluate and compare the algorithms. These yield the first systematic and effective approach to finding information-preserving XML mappings.
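The central idea above, mapping each edge of a source DTD graph to a path in the target DTD graph, can be illustrated with a small Python sketch. It is a simplified validity check under stated assumptions, not the paper's definition or algorithm: DTDs are modelled as plain adjacency lists (the `Graph` type and all function and element names are hypothetical), DTD production types such as sequence, choice and Kleene star are ignored, and the prefix-free condition on sibling paths is our assumption of what lets the source structure be recovered from the target.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a DTD schema as a directed graph, element type -> child types.
Graph = Dict[str, List[str]]
Edge = Tuple[str, str]


def is_path(graph: Graph, path: List[str]) -> bool:
    """True if `path` has at least one edge and each consecutive pair is a graph edge."""
    if len(path) < 2:
        return False
    return all(b in graph.get(a, []) for a, b in zip(path, path[1:]))


def is_valid_embedding(source: Graph, target: Graph,
                       source_root: str, target_root: str,
                       lam: Dict[str, str],
                       path_of: Dict[Edge, List[str]]) -> bool:
    """Simplified embedding check (assumption; not the paper's full definition)."""
    # 1. The source root must map to the target root.
    if lam.get(source_root) != target_root:
        return False
    # 2. Every source element type must map to an element type of the target.
    if any(node not in lam or lam[node] not in target for node in source):
        return False
    for parent, children in source.items():
        paths = []
        for child in children:
            p = path_of.get((parent, child))
            # 3. Each source edge becomes a target path from lam(parent) to lam(child).
            if p is None or not is_path(target, p):
                return False
            if p[0] != lam[parent] or p[-1] != lam.get(child):
                return False
            paths.append(p)
        # 4. Assumed prefix-free condition: no sibling path is a prefix of another,
        #    so each child can be located unambiguously in the target.
        for i, p in enumerate(paths):
            for j, q in enumerate(paths):
                if i != j and q[:len(p)] == p:
                    return False
    return True


# Hypothetical example schemas.
source = {"book": ["title", "author"], "title": [], "author": []}
target = {"library": ["entry"], "entry": ["name", "info"],
          "name": [], "info": ["writer"], "writer": []}

lam = {"book": "library", "title": "name", "author": "writer"}
path_of = {("book", "title"): ["library", "entry", "name"],
           ("book", "author"): ["library", "entry", "info", "writer"]}

# Invalid candidate: the author path is a prefix of the title path.
bad_lam = {"book": "library", "title": "name", "author": "entry"}
bad_paths = {("book", "title"): ["library", "entry", "name"],
             ("book", "author"): ["library", "entry"]}

print(is_valid_embedding(source, target, "book", "library", lam, path_of))        # True
print(is_valid_embedding(source, target, "book", "library", bad_lam, bad_paths))  # False
```

Checking a given candidate like this is cheap; the NP-completeness result concerns finding a suitable mapping and set of paths in the first place, which is why the paper turns to heuristic search.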
Original language: English
Title of host publication: Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005
Number of pages: 12
Publication status: Published - 2005

