Entropy Based Clustering of Viral Sequences

Bioinformatics Research and Applications (2023)

Abstract
Clustering viral sequences allows us to characterize the composition and structure of intrahost and interhost viral populations, which play a crucial role in disease progression and epidemic spread. In this paper, we propose and validate a new entropy-based method for clustering aligned viral sequences treated as categorical data. The method seeks a homogeneous clustering by minimizing information entropy, rather than the distance between sequences, within each cluster. We apply the method to SARS-CoV-2 sequencing data and report the information content that entropy-based clustering extracts from the sequences. The method converges to similar minimum-entropy clusterings across different runs and limited permutations of the data. We also show that a parallelized version of our tool scales to very large SARS-CoV-2 datasets.
Keywords
Categorical data, Clustering, Entropy, Monte Carlo algorithm, Viral genomic sequences
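The objective described in the abstract, preferring clusterings whose members are homogeneous in the entropy sense, can be illustrated with a small sketch. The abstract does not give the exact objective function, so the size-weighted sum of per-column Shannon entropies used here is an assumption, and the names `column_entropy`, `cluster_entropy`, and `clustering_entropy` are hypothetical. The Monte Carlo search named in the keywords is not shown; the sketch only scores a given clustering.

```python
import math
from collections import Counter

def column_entropy(symbols):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def cluster_entropy(seqs):
    """Sum of per-column entropies over equal-length aligned sequences."""
    if not seqs:
        return 0.0
    # zip(*seqs) iterates over alignment columns.
    return sum(column_entropy(col) for col in zip(*seqs))

def clustering_entropy(seqs, labels):
    """Size-weighted total entropy of a clustering.

    Assumption: this weighting is illustrative, not the paper's stated objective.
    """
    clusters = {}
    for seq, label in zip(seqs, labels):
        clusters.setdefault(label, []).append(seq)
    n = len(seqs)
    return sum(len(members) / n * cluster_entropy(members)
               for members in clusters.values())

# Toy aligned sequences (made up for illustration, not real viral data).
seqs = ["ACGT", "ACGT", "ACGA", "TTGA", "TTGA", "TTGT"]
homogeneous = [0, 0, 0, 1, 1, 1]  # groups similar sequences together
mixed = [0, 1, 0, 1, 0, 1]        # interleaves the two groups

print(clustering_entropy(seqs, homogeneous))
print(clustering_entropy(seqs, mixed))
```

On this toy input the homogeneous labeling scores lower than the mixed one, which is the behavior an entropy-minimizing search would exploit. Because the score depends only on symbol counts per column, it treats nucleotides as categorical values and needs no pairwise distance between sequences.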