BioKleisli: Integrating Biomedical Data and Analysis Packages

Susan Davidson, Peter Buneman, Jonathan Crabtree, Val Tannen, L. Wong

Research output: Chapter in Book/Report/Conference proceeding › Chapter (peer-reviewed) › peer-review


Data of interest to biomedical researchers associated with the Human Genome Project (HGP) are stored all over the world in a variety of electronic data formats and are accessible through a variety of interfaces and retrieval languages. These data sources include conventional relational databases with SQL interfaces, formatted text files on top of which indexing is provided for efficient retrieval (ASN.1), and binary files that can be interpreted textually or graphically via structures. Researchers within the HGP want to combine data from these different sources, add value through sophisticated data analysis techniques (such as the biosequence comparison software BLAST and FASTA), and view the combined data using special-purpose scientific visualization tools.
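
The integration task described above can be sketched, under loudly stated assumptions, as wrapping each source in a function that yields records in one common shape (here, Python dicts) and then combining those records on a shared key. The table `locus`, the toy flat-file format, and the helpers `sql_source`, `text_source` and `join_on` are hypothetical illustrations, not part of BioKleisli; only Python's standard `sqlite3` module is a real library here.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]

def sql_source(conn: sqlite3.Connection) -> Iterator[Record]:
    """Hypothetical relational source: yield each row of a `locus` table as a dict."""
    for accession, chromosome in conn.execute("SELECT accession, chromosome FROM locus"):
        yield {"accession": accession, "chromosome": chromosome}

def text_source(text: str) -> Iterator[Record]:
    """Hypothetical formatted-text source: blank-line-separated entries of 'KEY value' lines."""
    for block in text.strip().split("\n\n"):
        entry: Record = {}
        for line in block.splitlines():
            key, _, value = line.partition(" ")
            entry[key.lower()] = value.strip()
        yield entry

def join_on(left: Iterable[Record], right: Iterable[Record], key: str) -> List[Record]:
    """Merge records from two sources that share the same value for `key`."""
    index = {rec[key]: rec for rec in right}
    return [{**rec, **index[rec[key]]} for rec in left if rec[key] in index]

# Toy data standing in for two heterogeneous sources (assumed, not real HGP data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locus (accession TEXT, chromosome TEXT)")
conn.executemany("INSERT INTO locus VALUES (?, ?)", [("X123", "22"), ("Y456", "7")])
FLAT = "ACCESSION X123\nDESCRIPTION toy entry one\n\nACCESSION Z789\nDESCRIPTION toy entry two"

merged = join_on(sql_source(conn), text_source(FLAT), "accession")
print(merged)
# [{'accession': 'X123', 'chromosome': '22', 'description': 'toy entry one'}]
```

Only X123 appears in both sources, so only that record survives the join. Hiding each source behind a function with the same output shape is what lets the join ignore whether a record came from SQL or from a text file.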

However, there are currently no commercial tools for building such an integrated digital library, and a fundamental barrier to developing them appears to be one of language design and optimization. For example, while tools exist for interoperating between heterogeneous relational databases, the data formats and software packages found throughout the HGP contain a number of data types not readily supported by conventional databases, such as lists, variants, and arrays; furthermore, these types may be deeply nested. In this paper we present a language for querying and transforming data from heterogeneous sources, discuss its implementation in a system called BioKleisli, and illustrate its use in accessing data sources critical to the HGP.
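
The nested data types mentioned above can be illustrated with a small, hypothetical model loosely inspired by sequence-entry feature tables. Each entry holds a list of features, and each feature's location is a variant that is either a single interval or a join of intervals. The sketch below is written in Python purely for illustration and is not BioKleisli's own query language. It shows a comprehension that traverses three levels of nesting and performs case analysis on the variant. All class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass(frozen=True)
class Interval:
    start: int
    end: int

@dataclass(frozen=True)
class Join:
    parts: Tuple[Interval, ...]  # ordered collection of intervals

# Variant type: a location is either a single Interval or a Join of intervals.
Location = Union[Interval, Join]

@dataclass(frozen=True)
class Feature:
    kind: str
    location: Location

@dataclass(frozen=True)
class Entry:
    accession: str
    features: Tuple[Feature, ...]  # nested collection inside each entry

def intervals(loc: Location) -> List[Interval]:
    """Case analysis on the variant: normalise any location to a list of intervals."""
    if isinstance(loc, Interval):
        return [loc]
    return list(loc.parts)

def cds_spans(entries: List[Entry]) -> List[Tuple[str, int, int]]:
    """Comprehension over entries -> features -> intervals, keeping only CDS features."""
    return [
        (e.accession, iv.start, iv.end)
        for e in entries
        for f in e.features
        if f.kind == "CDS"
        for iv in intervals(f.location)
    ]

# Hypothetical toy entries.
entries = [
    Entry("X123", (Feature("CDS", Join((Interval(10, 50), Interval(90, 120)))),
                   Feature("gene", Interval(1, 200)))),
    Entry("Y456", (Feature("CDS", Interval(5, 40)),)),
]
print(cds_spans(entries))
# [('X123', 10, 50), ('X123', 90, 120), ('Y456', 5, 40)]
```

Storing the same structure relationally would need separate tables for entries, features and intervals, plus a tag column to encode the variant. The comprehension queries the nested value directly instead.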
Original language: English
Title of host publication: Bioinformatics
Subtitle of host publication: Databases and Systems
Editors: S. Letovsky
Publisher: Kluwer Academic Publishers
Number of pages: 11
ISBN (Print): 0-7923-8573-X
Publication status: Published - 1998

