Predicting Host Association for Shiga Toxin-Producing E. coli Serogroups by Machine Learning

Nadejda Lupolova, Antonia Chalka, David L Gally

Research output: Chapter in Book/Report/Conference proceeding › Chapter (peer-reviewed) › peer-review

Abstract / Description of output

Escherichia coli is a species of bacteria that can be present in a wide variety of mammalian hosts and potentially soil environments. E. coli has an open genome and can show considerable diversity in gene content between isolates. It is a reasonable assumption that gene content reflects evolution of strains in particular host environments and therefore can be used to predict the host most likely to be the source of an isolate. An extrapolation of this argument is that strains may also have gene content that favors success in multiple hosts and so the possibility of successful transmission from one host to another, for example, from cattle to human, can also be predicted based on gene content. In this methods chapter, we consider the issue of Shiga toxin (Stx)-producing E. coli (STEC) strains that are present in ruminants as the main host reservoir and for which we know that a subset causes life-threatening infections in humans. We show how the genome sequences of E. coli isolated from both cattle and humans can be used to build a classifier to predict human and cattle host association and how this can be applied to score key STEC serotypes known to be associated with human infection. With the example dataset used, serogroups O157, O26, and O111 show the highest, and O103 and O145 the lowest, predictions for human association. The long-term ambition is to combine such machine learning predictions with phylogeny to predict the zoonotic threat of an isolate based on its whole genome sequence (WGS).
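The abstract does not name the learning algorithm or the feature encoding, so the following is only a minimal sketch of the general idea of scoring host association from gene content. It assumes binary gene presence/absence profiles and uses a small Bernoulli naive Bayes classifier as a stand-in chosen for brevity; it is not the chapter's method. All gene columns, isolate profiles, and group labels ("groupA", "groupB") are invented toy values, not data from the study.

```python
import math

def train_bernoulli_nb(profiles, labels, alpha=1.0):
    """Fit per-host gene-presence probabilities with Laplace smoothing."""
    n_genes = len(profiles[0])
    model = {}
    for host in sorted(set(labels)):
        rows = [p for p, y in zip(profiles, labels) if y == host]
        prior = len(rows) / len(profiles)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_genes)
        ]
        model[host] = (prior, probs)
    return model

def predict_proba(model, profile):
    """Return {host: probability} for one presence/absence profile."""
    log_scores = {}
    for host, (prior, probs) in model.items():
        score = math.log(prior)
        for present, p in zip(profile, probs):
            score += math.log(p if present else 1.0 - p)
        log_scores[host] = score
    # Normalise in log space to avoid underflow with many genes.
    top = max(log_scores.values())
    weights = {h: math.exp(s - top) for h, s in log_scores.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

def mean_human_score(model, isolates):
    """Average P(human) per group; isolates = [(group, profile), ...]."""
    per_group = {}
    for group, profile in isolates:
        per_group.setdefault(group, []).append(
            predict_proba(model, profile)["human"]
        )
    return {g: sum(v) / len(v) for g, v in per_group.items()}

# Toy training set: rows are isolates, columns are four hypothetical genes
# (1 = gene present, 0 = gene absent). Values are made up for illustration.
TRAIN_PROFILES = [
    [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],   # isolated from humans
    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],   # isolated from cattle
]
TRAIN_LABELS = ["human"] * 3 + ["cattle"] * 3
MODEL = train_bernoulli_nb(TRAIN_PROFILES, TRAIN_LABELS)

# Hypothetical groups of unseen isolates, scored for human association.
TOY_ISOLATES = [
    ("groupA", [1, 1, 0, 0]), ("groupA", [1, 0, 0, 0]),
    ("groupB", [0, 0, 1, 1]), ("groupB", [0, 1, 1, 1]),
]
GROUP_SCORES = mean_human_score(MODEL, TOY_ISOLATES)
print(GROUP_SCORES)
```

Averaging the per-isolate human probability within each group mirrors, in miniature, the idea of ranking serogroups by predicted human association. A real analysis would involve thousands of accessory genes, held-out validation, and a classifier chosen and tuned for that setting.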

Original language: English
Title of host publication: Shiga Toxin-Producing E. coli
Editors: John M. Walker, Stephanie Schuller, Martina Bielaszewska
Number of pages: 19
ISBN (Electronic): 978-1-0716-1339-9
Publication status: E-pub ahead of print - 12 Mar 2021

Publication series

Name: Methods in molecular biology (Clifton, N.J.)
Publisher: Humana Press
ISSN (Print): 1064-3745

Keywords / Materials (for Non-textual outputs)

  • Machine learning
  • Host attribution
  • Zoonotic threat
  • STEC
  • Whole genome sequence (WGS)

