Abstract

Motivation: The Expectation-Maximization (EM) algorithm has been successfully applied to the problem of transcription factor binding site (TFBS) motif discovery and underlies the most widely used motif discovery algorithms. In the wider field of probabilistic modelling, the stochastic EM (sEM) algorithm has been used to overcome some of the limitations of the EM algorithm; however, the application of sEM to motif discovery has not been fully explored.

Results: We present MITSU (Motif discovery by ITerative Sampling and Updating), a novel motif discovery algorithm that combines sEM with an improved approximation to the likelihood function that is unconstrained with regard to the distribution of motif occurrences within the input dataset. The algorithm is evaluated quantitatively on realistic synthetic data and on several collections of characterized prokaryotic TFBS motifs, and is shown to outperform EM and an alternative sEM-based algorithm, particularly in terms of site-level positive predictive value.

Original language: English
Pages (from-to): 310-318
Number of pages: 9
Journal: Bioinformatics
Volume: 30
Issue number: 12
Publication status: Published - 15 Jun 2014

Keywords

  • factor-binding sites
  • Escherichia coli K-12
  • algorithm
  • sequences
