Issues in predicting protein function from sequence

Research output: Contribution to journal › Article › peer-review


Identifying homologues, defined as genes that arose from a common evolutionary ancestor, is often a relatively straightforward task, thanks to recent advances in estimating the statistical significance of sequence similarities found from database searches. The extent to which homologues possess similarities in function, however, is less amenable to statistical analysis. Consequently, predicting function by homology is a qualitative, rather than quantitative, process and requires particular care. This review focuses on the various approaches that have been developed to predict function from the scale of the atom to that of the organism. Similarities in homologues' functions differ considerably at each of these scales and also vary between domain families. It is argued that due attention should be paid to all available clues to function, including orthologue identification, conservation of particular residue types, and the co-occurrence of domains in proteins. Pitfalls in database searching methods arising from amino acid compositional bias and database size effects are also discussed.

Original language: English
Pages (from-to): 19-29
Number of pages: 11
Journal: Briefings in Bioinformatics
Issue number: 1
Publication status: Published - Mar 2001


  • Amino Acids
  • Animals
  • Computational Biology
  • Conserved Sequence
  • Databases, Factual
  • Evolution, Molecular
  • Gene Transfer, Horizontal
  • Humans
  • Protein Structure, Tertiary
  • Proteins
  • Sequence Alignment
  • Sequence Analysis, Protein

