annot8r: GO, EC and KEGG annotation of EST datasets

R. Schmid, M. L. Blaxter

Research output: Contribution to journal › Article › peer-review

Abstract

BACKGROUND: EST sequencing is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets pose a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined categories that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways.
RESULTS: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated GO, EC and KEGG annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations are retrieved from the reference database. The annotations are saved as flat files, and also stored in a relational PostgreSQL database to allow for more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools.
CONCLUSIONS: annot8r allows the rapid assignment of GO, EC and KEGG annotations for data sets resulting from EST sequencing projects. The benefits of a relational database, flexibility and the ease of use of the program make it an ideally suited annotation tool for non-model species EST-sequencing projects.
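The parse-and-look-up step described under RESULTS can be illustrated with a minimal Python sketch. This is not annot8r's own code. It assumes BLAST hits in the standard 12-column tabular layout (query ID, subject ID, ..., e-value, bit score), an illustrative e-value cutoff of 1e-8, and a small in-memory dictionary standing in for the reference database. The function names, the example hits and the accession-to-GO mapping are all hypothetical.

```python
import csv
import io


def best_hits(blast_tabular, max_evalue=1e-8):
    """Keep the highest-bit-score hit per query from 12-column tabular BLAST output.

    The cutoff is an assumed value for illustration, not annot8r's default.
    Returns {query_id: (subject_id, evalue, bitscore)}.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit is too weak to transfer an annotation
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def annotate(hits, reference):
    """Look up the annotations of each query's best hit in the reference mapping."""
    return {query: reference.get(subject, [])
            for query, (subject, _evalue, _bitscore) in hits.items()}


# Hypothetical example data: three hits for two ESTs and a toy reference mapping.
EXAMPLE_BLAST = (
    "EST001\tP12345\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "EST001\tQ99999\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-12\t70\n"
    "EST002\tQ99999\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t25\n"
)
EXAMPLE_REFERENCE = {"P12345": ["GO:0004672"], "Q99999": ["GO:0005524"]}

if __name__ == "__main__":
    hits = best_hits(EXAMPLE_BLAST)
    print(annotate(hits, EXAMPLE_REFERENCE))
```

In this sketch EST002 receives no annotation because its only hit fails the assumed e-value cutoff. A real run would read the reference mapping from the database rather than from a dictionary, and would repeat the look-up for each of the GO, EC and KEGG subsets.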
Original language: English
Article number: 180
Number of pages: 6
Journal: BMC Bioinformatics
Volume: 9
Issue number: 180
Publication status: Published - 2008

Keywords

  • Base Sequence
  • Chromosome Mapping
  • Database Management Systems
  • Databases, Genetic
  • Documentation
  • Expressed Sequence Tags
  • Information Storage and Retrieval
  • Molecular Sequence Data
  • Sequence Analysis, DNA
