Chrome Extension
WeChat Mini Program
Use on ChatGLM

GenomeFace: a deep learning-based metagenome binner trained on 43,000 microbial genomes

biorxiv(2024)

Cited 0|Views22
No score
Abstract
Metagenomic binning, the process of grouping DNA sequences into taxonomic units, is critical for understanding the functions, interactions, and evolutionary dynamics of microbial communities. We propose a deep learning approach to binning using two neural networks, one based on composition and another on environmental abundance, dynamically weighting the contribution of each based on characteristics of the input data. Trained on over 43,000 prokaryotic genomes, our network for composition-based binning is inspired by metric learning techniques used for facial recognition. Using a task-specific, multi-GPU accelerated algorithm to cluster the embeddings produced by our network, our binner leverages marker genes observed to be universally present in nearly all taxa to grade and select optimal clusters of sequences from a hierarchy of candidates. We evaluate our approach on four simulated datasets with known ground truth. Our linear time integration of marker genes recovers more near complete genomes than state of the art but computationally infeasible solutions using them, while being over an order of magnitude faster. Finally, we demonstrate the scalability and acuity of our approach by testing it on three of the largest metagenome assemblies ever performed. Compared to other binners, we produced 47%-183% more near complete genomes. From these datasets, we find over the genomes of over 3000 new candidate species which have never been previously cataloged, representing a potential 4% expansion of the known bacterial tree of life. ### Competing Interest Statement The authors have declared no competing interest.
More
Translated text
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined