A new method for rapid genome classification, clustering, visualization, and novel taxa discovery from metagenome

bioRxiv (2019)

Abstract
Classifying taxa, including those that have not previously been identified, is a key task in characterizing the microbial communities of under-described habitats, such as the permanently ice-covered lakes in the dry valleys of Antarctica. Current supervised, phylogeny-based methods fall short in recognizing species from genomes assembled from metagenomic datasets of such habitats, because these genomes are often incomplete or lack close known relatives. Here, we report an efficient software suite, “Genome Constellation”, that can rapidly characterize a large number of metagenome-assembled genomes. Genome Constellation estimates similarities between genomes based on their k-mer matches and then uses these similarities for classification, clustering, and visualization. The clusters of reference genomes formed by Genome Constellation closely resemble known phylogenetic relationships while also revealing unexpected connections. In a dataset of 1,693 draft genomes assembled from the Antarctic lake communities, of which only 40% could be placed in a phylogenetic tree, Genome Constellation improves taxa assignment to 61%. The clustering-based analysis revealed several novel taxa groups, including six clusters that may represent new bacterial phyla. Remarkably, we discovered 63 new giant viruses, 3 of which could not be found using the traditional marker-based approach. In summary, we demonstrate that Genome Constellation provides an unbiased option to rapidly analyze a large number of microbial genomes and visually explore their relatedness. The software is available under a BSD license at: .
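The abstract says only that similarities come from "k-mer matches" and then drive classification and clustering. The sketch below makes that idea concrete under stated assumptions and is not the Genome Constellation implementation. It assumes similarity is the Jaccard index of canonical k-mer sets, and it assumes clustering is single linkage: any two genomes whose similarity reaches a threshold join the same cluster. The helper names (`kmer_set`, `jaccard`, `similarity_matrix`, `cluster`), the k-mer size, the 0.5 threshold, and the toy sequences are all hypothetical. Visualization is not shown.

```python
from itertools import combinations

# Translation table used to build reverse complements (assumed DNA alphabet ACGT).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_set(seq: str, k: int) -> set[str]:
    """Collect canonical k-mers, skipping any k-mer with a non-ACGT character."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            kmers.add(canonical(kmer))
    return kmers


def jaccard(a: set[str], b: set[str]) -> float:
    """Assumed similarity: shared k-mers divided by all distinct k-mers."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(genomes: dict[str, str], k: int) -> dict[tuple[str, str], float]:
    """Compute pairwise similarities for every unordered pair of genome names."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    return {(x, y): jaccard(sets[x], sets[y]) for x, y in combinations(sorted(sets), 2)}


def cluster(names: list[str], sims: dict[tuple[str, str], float], threshold: float) -> list[list[str]]:
    """Single-linkage clustering via union-find over pairs at or above the threshold."""
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for (x, y), s in sims.items():
        if s >= threshold:
            parent[find(x)] = find(y)

    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy "genomes": gA and gB differ by one base, gC is unrelated.
genomes = {
    "gA": "ACGTACGTTAGCCGATAGGCTA",
    "gB": "ACGTACGTTAGCCGATAGGCTT",
    "gC": "TTTTGGGGCCCCAAAATTTTGG",
}
sims = similarity_matrix(genomes, k=4)
print(cluster(sorted(genomes), sims, threshold=0.5))  # → [['gA', 'gB'], ['gC']]
```

Exact k-mer sets and all-pairs comparison are used here only for clarity. For thousands of real draft genomes, a scalable tool would more plausibly rely on sketching or distributed counting, and the choice of k and threshold would need tuning against known taxonomy.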
Keywords
Taxonomy classification, metagenome-assembled genomes, metagenome visualization, Genome Constellation