Predicting Host Association for Shiga Toxin-Producing E. coli Serogroups by Machine Learning.

Shiga Toxin-Producing E. coli: Methods and Protocols (2021)

Abstract
Escherichia coli is a species of bacteria that can be present in a wide variety of mammalian hosts and potentially soil environments. E. coli has an open genome and can show considerable diversity in gene content between isolates. It is a reasonable assumption that gene content reflects evolution of strains in particular host environments and therefore can be used to predict the host most likely to be the source of an isolate. An extrapolation of this argument is that strains may also have gene content that favors success in multiple hosts and so the possibility of successful transmission from one host to another, for example, from cattle to human, can also be predicted based on gene content. In this methods chapter, we consider the issue of Shiga toxin (Stx)-producing E. coli (STEC) strains that are present in ruminants as the main host reservoir and for which we know that a subset causes life-threatening infections in humans. We show how the genome sequences of E. coli isolated from both cattle and humans can be used to build a classifier to predict human and cattle host association and how this can be applied to score key STEC serotypes known to be associated with human infection. With the example dataset used, serogroups O157, O26, and O111 show the highest, and O103 and O145 the lowest, predictions for human association. The long-term ambition is to combine such machine learning predictions with phylogeny to predict the zoonotic threat of an isolate based on its whole genome sequence (WGS).
Keywords
Host attribution,Machine learning,STEC,Whole genome sequence (WGS),Zoonotic threat