A Graph-Based Approach for Detecting Sequence Homology in Highly Diverged Repeat Protein Families

Jonathan N. Wells*, Joseph A. Marsh

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Reconstructing evolutionary relationships in repeat proteins is notoriously difficult because duplicated repeats typically diverge substantially in sequence. This is further complicated by the fact that proteins with many similar repeats are more likely to produce significant local sequence alignments than proteins with fewer copies of the repeat motif. Furthermore, a biologically correct sequence alignment is sometimes impossible to achieve when insertion or translocation events have disrupted the order of repeats in one of the sequences being aligned. Together, these properties make traditional phylogenetic methods for studying protein families unreliable for repeat proteins, because such methods depend on accurate sequence alignment. Here we present a practical solution to this problem that combines graph clustering with the open-source software package HH-suite, which enables highly sensitive detection of sequence relationships. Multiple rounds of homology searches, performed by aligning profile hidden Markov models, generate large sets of related proteins. Representing the relationships between proteins in these sets as graphs, and then clustering those graphs with the Markov cluster algorithm, enables robust detection of repeat protein subfamilies.
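
The abstract does not spell out how the graph is built or clustered, so the following is only a minimal pure-Python sketch of those two steps. It assumes that HH-suite hits have already been reduced to hypothetical (query, target, probability) tuples, with probability on a 0 to 100 scale. The function names build_graph and mcl, the 50% probability cutoff, and the use of probability/100 as an edge weight are illustrative assumptions, not the chapter's protocol. The clustering loop is the textbook Markov cluster procedure (add self-loops, normalize columns, then alternate expansion and inflation until the matrix stops changing), with clusters read off as connected components of the converged matrix.

```python
"""Hypothetical sketch: cluster a protein homology graph with the Markov cluster (MCL) algorithm."""


def build_graph(hits, min_prob=50.0):
    """Build a symmetric weighted adjacency matrix from (query, target, probability) hits.

    The tuple format and min_prob cutoff are illustrative assumptions, not HH-suite output.
    """
    nodes = sorted({name for q, t, _ in hits for name in (q, t)})
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for q, t, prob in hits:
        if q == t or prob < min_prob:
            continue  # ignore self-hits and weak hits
        i, j = index[q], index[t]
        weight = max(adj[i][j], prob / 100.0)  # keep the strongest hit per pair
        adj[i][j] = adj[j][i] = weight          # treat homology as undirected
    return nodes, adj


def normalize_columns(m):
    """Scale each column to sum to 1 so the matrix is column-stochastic."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    """Dense square matrix product (fine for toy graphs only)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Return clusters as sorted lists of node indices."""
    n = len(adj)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two steps of the random walk
        inflated = [[v ** inflation for v in row] for row in expanded]  # inflation
        normalize_columns(inflated)
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Clusters are the connected components of the graph formed by the
    # entries that survive (above eps) in the converged matrix.
    neighbours = [{j for j in range(n) if m[i][j] > eps or m[j][i] > eps} for i in range(n)]
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(component))
    return clusters


if __name__ == "__main__":
    # Toy hits: A, B, C are mutually similar; D and E form a pair; the C-D hit is weak.
    hits = [("A", "B", 95.0), ("B", "C", 90.0), ("A", "C", 85.0),
            ("D", "E", 99.0), ("C", "D", 20.0)]
    nodes, adj = build_graph(hits)
    print([[nodes[i] for i in c] for c in mcl(adj)])  # → [['A', 'B', 'C'], ['D', 'E']]
```

In MCL, raising the inflation parameter generally produces smaller, finer-grained clusters, so it is the main knob for how finely subfamilies are split. The dense O(n³) matrix product here is only practical for toy graphs; for realistic sets of thousands of proteins, a sparse implementation such as van Dongen's mcl program would be the usual choice.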

Original language: English
Title of host publication: Methods in Molecular Biology
Number of pages: 11
Publication status: Published - 1 Jan 2019

Publication series

Name: Methods in Molecular Biology
ISSN (Print): 1064-3745

Keywords

  • Evolution
  • Graph clustering
  • Profile-HMM alignment
  • Protein families
  • Repeat proteins
  • Sequence homology
