Stochastic EM-based TFBS motif discovery with MITSU

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Motivation: The Expectation-Maximization (EM) algorithm has been successfully applied to the problem of transcription factor binding site (TFBS) motif discovery and underlies the most widely used motif discovery algorithms. In the wider field of probabilistic modelling, the stochastic EM (sEM) algorithm has been used to overcome some of the limitations of the EM algorithm; however, the application of sEM to motif discovery has not been fully explored.
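The EM/sEM distinction can be made concrete with a minimal Python sketch of a generic stochastic EM loop for a single motif. This is an illustration only, not MITSU. Standard EM computes expected site counts over every possible motif position. Here, the stochastic E-step instead draws one concrete start position per sequence from its posterior. The M-step then re-estimates the position weight matrix (PWM) from those sampled sites. For brevity the sketch makes three simplifying assumptions: exactly one motif occurrence per sequence, a uniform nucleotide background, and uppercase ACGT input. MITSU's likelihood approximation is described as unconstrained with regard to the distribution of motif occurrences, so the first of these assumptions departs from the method in the abstract. All function names and the `pseudocount` parameter are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: uppercase DNA input only


def site_log_odds(seq, start, pwm, background):
    """Log-odds score of the window at `start` under the PWM versus the background."""
    score = 0.0
    for j, column in enumerate(pwm):
        k = ALPHABET.index(seq[start + j])
        score += math.log(column[k] / background[k])
    return score


def sample_start(seq, pwm, background, rng):
    """Stochastic E-step: draw one motif start from its posterior
    (uniform prior over positions, exactly one site per sequence)."""
    width = len(pwm)
    scores = [site_log_odds(seq, i, pwm, background)
              for i in range(len(seq) - width + 1)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by max for numerical stability
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def update_pwm(seqs, starts, width, pseudocount=0.5):
    """M-step: re-estimate each PWM column from the sampled sites plus pseudocounts."""
    pwm = []
    for j in range(width):
        counts = [pseudocount] * len(ALPHABET)  # pseudocounts keep every probability > 0
        for seq, s in zip(seqs, starts):
            counts[ALPHABET.index(seq[s + j])] += 1
        total = sum(counts)
        pwm.append([c / total for c in counts])
    return pwm


def stochastic_em(seqs, width, iterations=50, seed=0):
    """Alternate stochastic E-steps and M-steps from a random initial alignment."""
    rng = random.Random(seed)
    background = [0.25] * len(ALPHABET)  # uniform background (assumption)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    pwm = update_pwm(seqs, starts, width)
    for _ in range(iterations):
        starts = [sample_start(s, pwm, background, rng) for s in seqs]  # stochastic E-step
        pwm = update_pwm(seqs, starts, width)  # M-step on the sampled "complete" data
    return pwm, starts
```

A call would look like `pwm, starts = stochastic_em(sequences, width=8)`. Each iteration samples site positions rather than averaging over them, so successive PWMs fluctuate instead of converging monotonically. This randomness is the property commonly credited with helping sEM escape local optima that trap deterministic EM. The trade-off is that some rule is needed for choosing or averaging the final estimate. With pseudocounts, the M-step is strictly a regularised (MAP-style) update rather than a pure maximum-likelihood one.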

Results: We present MITSU (Motif discovery by ITerative Sampling and Updating), a novel motif discovery algorithm that combines sEM with an improved approximation to the likelihood function; this approximation is unconstrained with regard to the distribution of motif occurrences within the input dataset. The algorithm is evaluated quantitatively on realistic synthetic data and on several collections of characterized prokaryotic TFBS motifs, and is shown to outperform EM and an alternative sEM-based algorithm, particularly in terms of site-level positive predictive value.
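Site-level positive predictive value is the fraction of predicted binding sites that correspond to known sites: PPV = TP / (TP + FP). The Python sketch below illustrates the calculation under one assumed matching rule. A prediction counts as a true positive when it overlaps a known site in the same sequence by at least `min_overlap` positions. The criterion actually used in the MITSU evaluation is not stated here, and published benchmarks differ on it. Sites are represented as half-open `(start, end)` intervals, and the function names are illustrative only.

```python
def overlaps(a, b, min_overlap=1):
    """True if half-open intervals a=(start, end) and b share >= min_overlap positions."""
    return min(a[1], b[1]) - max(a[0], b[0]) >= min_overlap


def site_level_ppv(predicted, known, min_overlap=1):
    """predicted/known: dict mapping sequence id -> list of (start, end) sites.
    Returns TP / (TP + FP), or 0.0 when nothing was predicted."""
    tp = fp = 0
    for seq_id, sites in predicted.items():
        truth = known.get(seq_id, [])
        for site in sites:
            if any(overlaps(site, k, min_overlap) for k in truth):
                tp += 1  # prediction matches a characterized site
            else:
                fp += 1  # prediction has no matching characterized site
    return tp / (tp + fp) if tp + fp else 0.0


predicted = {"s1": [(10, 18), (40, 48)], "s2": [(5, 13)]}
known = {"s1": [(12, 20)], "s2": [(100, 108)]}
print(round(site_level_ppv(predicted, known), 3))  # → 0.333
```

Returning 0.0 when there are no predictions is a convention chosen here, since PPV is undefined in that case. Raising `min_overlap` makes the metric stricter. In this example, requiring seven overlapping positions turns the single true positive into a false positive.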

Original language: English
Pages (from-to): 310-318
Number of pages: 9
Issue number: 12
Publication status: Published, 15 Jun 2014

Keywords / Materials (for Non-textual outputs)

  • factor-binding sites
  • escherichia-coli K-12
  • algorithm
  • sequences

