Edinburgh Research Explorer

Information preserving XML schema embedding

Research output: Contribution to journal › Article


Original language: English
Number of pages: 44
Journal: ACM Transactions on Database Systems
Issue number: 1
Publication status: Published - Mar 2008


A fundamental concern of data integration in an XML context is the ability to embed one or more source documents in a target document so that (a) the target document conforms to a target schema and (b) the information in the source documents is preserved. In this paper, information preservation for XML is formally studied, and the results of this study guide the definition of a novel notion of schema embedding between two XML DTD schemas represented as graphs. Schema embedding generalizes the conventional notion of graph similarity by allowing an edge in a source DTD schema to be mapped to a path in the target DTD. Instance-level embeddings can be derived from the schema embedding in a straightforward manner, such that conformance to a target schema and information preservation are guaranteed. We show that it is NP-complete to find an embedding between two DTD schemas. We also outline efficient heuristic algorithms to find candidate embeddings, which proved effective in our experimental study. These yield the first systematic and effective approach to finding information-preserving XML mappings.
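The idea of mapping a source edge to a target path can be illustrated with a small sketch. The following Python is not the paper's definition or algorithm: it assumes each DTD is reduced to a plain directed graph of element types, and it only checks the edge-to-path condition mentioned in the abstract. The names `is_path`, `is_edge_to_path_mapping`, `node_map`, `path_map`, and the toy schemas are all hypothetical, and any further conditions the paper imposes on embeddings are omitted.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a DTD schema is simplified to a directed graph that maps each
# element type to the set of its child element types.
Graph = Dict[str, Set[str]]
Edge = Tuple[str, str]


def is_path(graph: Graph, path: List[str]) -> bool:
    """True if `path` has at least one edge and every step is an edge of `graph`."""
    if len(path) < 2:
        return False
    return all(b in graph.get(a, set()) for a, b in zip(path, path[1:]))


def is_edge_to_path_mapping(
    source: Graph,
    target: Graph,
    node_map: Dict[str, str],
    path_map: Dict[Edge, List[str]],
) -> bool:
    """Check that every source edge (a, b) is mapped to a target path
    that runs from node_map[a] to node_map[b]."""
    for a, children in source.items():
        for b in children:
            path = path_map.get((a, b))
            if path is None or a not in node_map or b not in node_map:
                return False
            if not is_path(target, path):
                return False
            if path[0] != node_map[a] or path[-1] != node_map[b]:
                return False
    return True


# Hypothetical toy schemas, invented for illustration only.
source = {"db": {"class"}, "class": {"title"}, "title": set()}
target = {
    "school": {"courses"},
    "courses": {"course"},
    "course": {"basic"},
    "basic": {"name"},
    "name": set(),
}
node_map = {"db": "school", "class": "course", "title": "name"}
good_paths = {
    ("db", "class"): ["school", "courses", "course"],
    ("class", "title"): ["course", "basic", "name"],
}
# "course" has no direct child "name" in the target, so this path is invalid.
bad_paths = dict(good_paths)
bad_paths[("class", "title")] = ["course", "name"]

ok = is_edge_to_path_mapping(source, target, node_map, good_paths)
not_ok = is_edge_to_path_mapping(source, target, node_map, bad_paths)
```

Here a single source edge such as `class -> title` is stretched over the two-step target path `course -> basic -> name`, which is the sense in which an embedding of this kind is more permissive than an edge-to-edge graph mapping. Verifying a given candidate is cheap in this simplified setting; the hardness result in the abstract concerns finding an embedding, not checking one.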


ID: 17663425