TY - GEN
T1 - Linking life sciences data using graph-based mapping
AU - Taubert, Jan
AU - Hindle, Matthew
AU - Lysenko, Artem
AU - Weile, Jochen
AU - Köhler, Jacob
AU - Rawlings, Christopher J.
PY - 2009/11/2
Y1 - 2009/11/2
N2 - There are over 1100 different databases available containing primary and derived data of interest to research biologists. It is inevitable that many of these databases contain overlapping, related or conflicting information. Data integration methods are being developed to address these issues by providing a consolidated view over multiple databases. However, a key challenge for data integration is the identification of links between closely related entries in different life sciences databases when there is no direct information that provides a reliable cross-reference. Here we describe and evaluate three data integration methods to address this challenge in the context of a graph-based data integration framework (the ONDEX system). A key result presented in this paper is a quantitative evaluation of their performance in two different situations: the integration and analysis of different metabolic pathways resources and the mapping of equivalent elements between the Gene Ontology and a nomenclature describing enzyme function.
UR - http://www.scopus.com/inward/record.url?scp=70350356618&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-02879-3_3
DO - 10.1007/978-3-642-02879-3_3
M3 - Conference contribution
AN - SCOPUS:70350356618
SN - 3642028780
SN - 9783642028786
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 16
EP - 30
BT - Data Integration in the Life Sciences - 6th International Workshop, DILS 2009, Proceedings
T2 - 6th International Workshop on Data Integration in the Life Sciences, DILS 2009
Y2 - 20 July 2009 through 22 July 2009
ER -