Machine Learning-Driven Phenotype Predictions based on Genome Annotations

Janaka N. Edirisinghe, Sunali Goyal, Alexander Brace, Ric Colasanti, Tianhao Gu, Boris Sadhkin, Qizhi Zhang, Roy T. Kamimura, Christopher S. Henry

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Abstract
Over the past two decades, the availability of genome sequences has expanded exponentially, from hundreds of thousands of isolate genomes to millions of metagenome-assembled genomes. The rapid and accurate interpretation of this data, along with the profiling of diverse phenotypes such as respiration type, antimicrobial resistance, or carbon utilization, is essential for a wide range of medical and research applications. Here, we use sequence-based functional annotations obtained from the RAST annotation algorithm as predictors and employ six machine learning algorithms (K-Nearest Neighbors, Gaussian Naive Bayes, Support Vector Machines, Neural Networks, Logistic Regression, and Decision Trees) to generate classifiers that can accurately predict phenotypes of unclassified bacterial organisms. We apply this approach in two case studies focused on respiration types (aerobic, anaerobic, and facultative anaerobic) and Gram-stain types (Gram-negative and Gram-positive). We demonstrate that all six classifiers accurately classify the phenotypes of Gram stain and respiration type, and discuss the biological significance of the predicted outcomes. We also present four new applications that have been deployed in the Department of Energy Systems Biology Knowledgebase (KBase) that enable users to: (i) upload high-quality data to train classifiers; (ii) annotate genomes in the training set with the RAST annotation algorithm; (iii) build six different genome classifiers; and (iv) predict the phenotype of unclassified genomes (https://narrative.kbase.us/#catalog/modules/kb_genomeclassification).
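To make the idea of using functional annotations as predictors concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each genome is encoded as a binary presence/absence vector over functional roles (the abstract does not state the encoding) and applies a plain K-Nearest Neighbors vote, one of the six algorithm families named above. The genome IDs, role names, labels, and function names are all hypothetical and invented for illustration only.

```python
from collections import Counter

# Hypothetical toy training set: genome ID -> set of annotated functional roles.
# These role assignments are invented for illustration, not real RAST output.
TRAINING_ANNOTATIONS = {
    "genome_A": {"Cytochrome c oxidase", "Catalase", "Superoxide dismutase"},
    "genome_B": {"Cytochrome c oxidase", "Catalase"},
    "genome_C": {"Pyruvate formate-lyase", "Fumarate reductase"},
    "genome_D": {"Pyruvate formate-lyase", "Hydrogenase"},
}

# Hypothetical phenotype labels for the toy genomes above.
TRAINING_LABELS = {
    "genome_A": "aerobic",
    "genome_B": "aerobic",
    "genome_C": "anaerobic",
    "genome_D": "anaerobic",
}


def build_vocabulary(annotations):
    """Return the sorted list of all functional roles seen in training."""
    return sorted(set().union(*annotations.values()))


def encode(roles, vocabulary):
    """Encode a genome as a 0/1 vector: 1 if the role is present, else 0.

    Roles that are not in the training vocabulary are ignored.
    """
    return [1 if role in roles else 0 for role in vocabulary]


def hamming(a, b):
    """Count the positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query_roles, annotations, labels, k=3):
    """Predict a phenotype by majority vote among the k nearest genomes."""
    vocabulary = build_vocabulary(annotations)
    query = encode(query_roles, vocabulary)
    # Sort training genomes by distance to the query (ties broken by genome ID).
    distances = sorted(
        (hamming(query, encode(roles, vocabulary)), genome_id)
        for genome_id, roles in annotations.items()
    )
    votes = Counter(labels[genome_id] for _, genome_id in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    unclassified = {"Cytochrome c oxidase", "Catalase"}
    print(knn_predict(unclassified, TRAINING_ANNOTATIONS, TRAINING_LABELS))
```

In practice a library such as scikit-learn would likely replace the hand-written distance and voting logic, and the same presence/absence matrix could be passed to the other five classifier types for comparison.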
Keywords
phenotype predictions, genome annotations, learning-driven