A Database Index to Large Biological Sequences

Ela Hunt, Malcolm P. Atkinson, Robert W. Irving

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract / Description of output

We present an approach to searching genetic DNA sequences using an adaptation of the suffix tree data structure deployed on the general-purpose persistent Java platform, PJama. Our implementation technique is novel in that it allows us to build suffix trees on disk for arbitrarily large sequences, for instance for the longest human chromosome, which consists of 263 million letters. We propose to use such indexes as an alternative to the current practice of serial scanning. We describe our tree creation algorithm, analyse the performance of our index, and discuss the interplay of the data structure with object store architectures. Early measurements are presented.
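To make the contrast between a suffix-based index and serial scanning concrete, here is a minimal in-memory sketch. It is a hypothetical illustration, not the authors' method: it builds an uncompressed suffix trie naively (O(n^2) time and space) over the alphabet {A, C, G, T}, whereas the paper builds compacted suffix trees on disk with PJama persistence, which this sketch does not model. The class name `SuffixTrieIndex` and the methods `find` and `serialScan` are invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical illustration: an in-memory, uncompressed suffix trie over {A, C, G, T}.
 * Built naively in O(n^2); NOT the disk-based suffix tree construction of the paper.
 */
final class SuffixTrieIndex {

    /** Map a nucleotide letter to a child slot; rejects any other symbol. */
    private static int slot(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not a nucleotide: " + c);
        }
    }

    private static final class Node {
        final Node[] children = new Node[4];
        // Start positions of the suffixes whose path ends exactly at this node.
        final List<Integer> suffixStarts = new ArrayList<>();
    }

    private final Node root = new Node();

    /** Insert every non-empty suffix sequence[i..] into the trie. */
    SuffixTrieIndex(String sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            Node node = root;
            for (int j = i; j < sequence.length(); j++) {
                int s = slot(sequence.charAt(j));
                if (node.children[s] == null) {
                    node.children[s] = new Node();
                }
                node = node.children[s];
            }
            node.suffixStarts.add(i);
        }
    }

    /** Return the start positions of every occurrence of pattern, ascending. */
    List<Integer> find(String pattern) {
        Node node = root;
        for (int k = 0; k < pattern.length(); k++) {
            node = node.children[slot(pattern.charAt(k))];
            if (node == null) {
                return Collections.emptyList(); // pattern is not a substring
            }
        }
        // Every suffix in the subtree below this node begins with the pattern.
        List<Integer> hits = new ArrayList<>();
        collect(node, hits);
        Collections.sort(hits);
        return hits;
    }

    private static void collect(Node node, List<Integer> out) {
        out.addAll(node.suffixStarts);
        for (Node child : node.children) {
            if (child != null) {
                collect(child, out);
            }
        }
    }

    /** Serial-scan baseline: test the pattern at every offset of the sequence. */
    static List<Integer> serialScan(String sequence, String pattern) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + pattern.length() <= sequence.length(); i++) {
            if (sequence.startsWith(pattern, i)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String dna = "GATTACAGATTACA";
        SuffixTrieIndex index = new SuffixTrieIndex(dna);
        System.out.println(index.find("ATTA"));       // [1, 8]
        System.out.println(serialScan(dna, "ATTA"));  // [1, 8]
    }
}
```

The design point is that, once the index exists, a query walks only as many nodes as the pattern has letters and then reads off the matching suffixes, while a serial scan touches every position of the sequence on every query. A practical index for chromosome-scale data would compact unary paths into labelled edges, giving a suffix tree with O(n) nodes instead of this trie's O(n^2); fitting such a structure on disk is the problem the paper addresses.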
Original language: English
Title of host publication: Proceedings of the 27th International Conference on Very Large Data Bases
Place of publication: San Francisco, CA, USA
Publisher: Morgan Kaufmann Publishers Inc.
Pages: 139-148
Number of pages: 10
ISBN (Print): 1-55860-804-4
Publication status: Published, 2001
