SPM4GAC: SPM based approach for genome analysis and classification of macromolecules

International Journal of Biological Macromolecules (2024)

Abstract
Genome sequence analysis and classification play critical roles in understanding an organism's main characteristics, functions, and evolving nature. However, the rapid expansion of genomic data makes these tasks challenging because of high computational requirements and the difficulty of properly managing and interpreting the data. Recently proposed models have yielded promising results for genome sequence classification. Nevertheless, these models often ignore the sequential nature of nucleotides, which is crucial for revealing the underlying structure and function of the sequences. To address this limitation, we present SPM4GAC, a sequential pattern mining (SPM)-based framework to analyze and classify the macromolecule genome sequences of viruses. First, a large dataset containing the genome sequences of various RNA viruses is developed and transformed into a suitable format. SPM algorithms are then applied to the transformed dataset to identify frequent sequential patterns of nucleotide bases. The obtained frequent sequential patterns are used as features to classify different viruses. Ten classifiers are employed, and their performance is assessed using several evaluation measures. Finally, a performance comparison with state-of-the-art methods for genome sequence classification/detection reveals that SPM4GAC performs better than those methods.
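The pipeline described in the abstract (mine frequent patterns of bases, then use them as classification features) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes contiguous patterns only, whereas general SPM algorithms also allow gaps between items, and it uses a simple Apriori-style level-wise search. The function names `frequent_patterns` and `to_features`, the toy sequences, and the thresholds are all hypothetical.

```python
def frequent_patterns(sequences, min_support, max_len=3):
    """Return {pattern: support} for contiguous base patterns.

    Support is the fraction of sequences that contain the pattern at
    least once. Simplifying assumption: patterns are contiguous
    substrings; real SPM algorithms can also mine gapped subsequences.
    """
    n = len(sequences)
    alphabet = sorted({base for seq in sequences for base in seq})
    candidates = list(alphabet)
    result = {}
    for _ in range(max_len):
        # Count the support of every candidate of the current length.
        support = {p: sum(p in seq for seq in sequences) / n for p in candidates}
        frequent = [p for p in candidates if support[p] >= min_support]
        if not frequent:
            break
        result.update((p, support[p]) for p in frequent)
        # Apriori-style growth: only frequent patterns are extended by one base.
        candidates = [p + b for p in frequent for b in alphabet]
    return result


def to_features(sequences, patterns):
    """Encode each sequence as a binary vector: 1 if the pattern occurs."""
    return [[int(p in seq) for p in patterns] for seq in sequences]


# Hypothetical toy RNA fragments, for illustration only.
seqs = ["AUGGCU", "AUGCCA", "GGCAUG", "CCAUGA"]
patterns = frequent_patterns(seqs, min_support=0.75)
pattern_list = sorted(patterns)
features = to_features(seqs, pattern_list)
```

Each row of `features` could then be passed, together with a virus label, to any standard classifier. The abstract does not say which SPM algorithms or which ten classifiers were used, so none are assumed here.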
Keywords
Genomes, RNA virus, Classification, Sequential pattern mining, Nucleotides