Information preserving XML schema embedding

Wenfei Fan, Philip Bohannon

Research output: Contribution to journal › Article › peer-review


A fundamental concern of data integration in an XML context is the ability to embed one or more source documents in a target document so that (a) the target document conforms to a target schema and (b) the information in the source documents is preserved. In this paper, information preservation for XML is formally studied, and the results of this study guide the definition of a novel notion of schema embedding between two XML DTD schemas represented as graphs. Schema embedding generalizes the conventional notion of graph similarity by allowing an edge in a source DTD schema to be mapped to a path in the target DTD. Instance-level embeddings can be derived from the schema embedding in a straightforward manner, such that conformance to a target schema and information preservation are guaranteed. We show that it is NP-complete to find an embedding between two DTD schemas. We also outline efficient heuristic algorithms to find candidate embeddings, which our experimental study shows to be effective. These yield the first systematic and effective approach to finding information preserving XML mappings.
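The idea of mapping a source edge to a target path can be illustrated with a small sketch. It is a deliberately simplified check and not the paper's formal definition, which may impose further conditions. The graph encoding (a dict from element type to child types), the function names, and the toy schemas are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]   # element type -> list of child element types
Edge = Tuple[str, str]         # (parent type, child type)


def is_path(graph: Graph, path: List[str]) -> bool:
    """Return True if every consecutive pair in `path` is an edge of `graph`."""
    return all(b in graph.get(a, []) for a, b in zip(path, path[1:]))


def is_candidate_embedding(source: Graph, target: Graph,
                           node_map: Dict[str, str],
                           edge_paths: Dict[Edge, List[str]]) -> bool:
    """Simplified check (hypothetical): each source edge (A, B) must map to a
    target path that starts at node_map[A] and ends at node_map[B]."""
    for parent, children in source.items():
        for child in children:
            path = edge_paths.get((parent, child))
            if path is None or len(path) < 2:
                return False
            if path[0] != node_map.get(parent) or path[-1] != node_map.get(child):
                return False
            if not is_path(target, path):
                return False
    return True


# Toy schemas: the single source edge book -> title is mapped to the
# two-step target path item -> meta -> name.
source = {"book": ["title"], "title": []}
target = {"library": ["item"], "item": ["meta"], "meta": ["name"], "name": []}
node_map = {"book": "item", "title": "name"}
edge_paths = {("book", "title"): ["item", "meta", "name"]}

ok = is_candidate_embedding(source, target, node_map, edge_paths)
```

An ordinary graph-similarity check would require `item` to have `name` as a direct child; allowing a path is what lets the nested target structure accommodate the flatter source.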
Original language: English
Number of pages: 44
Journal: ACM Transactions on Database Systems
Issue number: 1
Publication status: Published - Mar 2008

