Binning Metagenomic Reads With Probabilistic Sequence Signatures Based On Spaced Seeds

2017 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB)(2017)

Abstract
The growing number of sequencing projects in medicine and environmental sciences calls for efficient approaches to analyzing very large sets of metagenomic reads. Among the challenging tasks in metagenomics, the ability to agglomerate, or "bin" together, reads of the same species without reference genomes plays a crucial role in building a comprehensive description of the relative abundances and diversity of the species in a sample. We recently proposed an algorithm for metagenomic read binning, called MetaProb, that reaches a precision currently unmatched by other methods. MetaProb's competitive advantage stems from its use of probabilistic sequence signatures based on contiguous k-mers. In this work we explore the use of spaced seeds, rather than contiguous k-mers, to build such signatures. The experimental results show that allowing mismatches at carefully chosen, predefined positions brings further benefits, both in improved accuracy and in reduced memory requirements. Availability: https://bitbucket.org/samu661/metaprob
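As a rough illustration of the difference between contiguous k-mers and spaced seeds, the sketch below extracts the "words" a binary seed mask picks out of a read. A `1` marks a position that must be read, and a `0` marks a don't-care position where mismatches are tolerated. It then turns the counts into a per-read frequency vector. The helper names `spaced_kmers` and `signature`, and the example masks `1101` and `111`, are hypothetical. The signature here uses plain relative frequencies, a deliberate simplification and not MetaProb's actual probabilistic normalization.

```python
from collections import Counter

VALID = set("ACGT")


def spaced_kmers(seq: str, seed: str) -> list[str]:
    """Return the spaced k-mers of `seq` under a binary seed mask.

    `seed` uses '1' for cared positions and '0' for don't-care positions;
    a contiguous k-mer is simply the all-ones seed of length k.
    Windows whose cared positions contain a non-ACGT symbol are skipped.
    """
    seq = seq.upper()
    care = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)  # window length covered by the seed
    kmers = []
    for start in range(len(seq) - span + 1):
        # Keep only the symbols at cared positions of this window.
        kmer = "".join(seq[start + i] for i in care)
        if set(kmer) <= VALID:
            kmers.append(kmer)
    return kmers


def signature(seq: str, seed: str) -> dict[str, float]:
    """Relative frequency of each spaced k-mer in `seq` (simplified signature)."""
    counts = Counter(spaced_kmers(seq, seed))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


if __name__ == "__main__":
    read = "ACGTACGTAC"
    print(spaced_kmers(read, "111"))   # contiguous 3-mers
    print(spaced_kmers(read, "1101"))  # spaced seed: span 4, weight 3
    print(signature(read, "1101"))
```

Both masks above have weight 3, so each yields at most 4^3 = 64 distinct keys. The spaced mask, however, samples a longer window (span 4) while ignoring one position, which is what lets a single key match reads that differ at that position.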
Keywords
probabilistic sequence signatures, spaced seeds, growing number, medicine, reference genomes, environmental sciences, MetaProb algorithm, metagenomic reads binning