J(2)*: A New Method for Alignment-free Sequence Similarity Measurement

BIBM(2020)

Abstract
Alignment-free sequence comparison methods can compute the similarity of a huge number of sequences much faster than traditional sequence alignment methods. Here, a new non-parametric alignment-free sequence comparison algorithm, J(2)*, is proposed to measure sequence similarity based on the suffix tree data structure. Compared against other state-of-the-art alignment-free methods, namely D2z, D2, D2sh, D2*, WFV, DV, Shi, CPF, DMk, K2 and K2*, J(2)* has three main advantages. (1) It has the fastest running time in theory and in practice: J(2)* reduces the time for k-word search from O(N^2) to O(N), and our experimental results confirm that it is faster than all 11 of these popular approaches. (2) J(2)* is easy to use: unlike the other alignment-free methods, which often require choosing a suitable parameter k, J(2)* involves no parameter selection. (3) J(2)* places no particular requirement on the data distribution: unlike the parametric methods (such as the D2 family), which require the data to follow a certain distribution, J(2)* makes no demand for a specific input distribution. The improved running time of J(2)* will be very useful in this era of big data, especially given the increasing volume of genome sequence data.
Keywords
sequence similarity measurement, alignment-free sequence comparison, suffix tree, biological sequences
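
For readers unfamiliar with the k-word methods the abstract compares against, the following is a minimal Python sketch of the classic D2 statistic, which is the number of k-word matches between two sequences, or equivalently the inner product of their k-word count vectors. It is not the J(2)* algorithm and does not use a suffix tree; the function names `kmer_counts` and `d2` are hypothetical and chosen only for illustration. It shows why such methods need a word length k to be picked by the user.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-word (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """Classic D2 statistic: inner product of the two k-word count vectors.

    Illustrative only; this is not the J(2)* method from the paper.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for missing words, so unshared words contribute nothing.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACGTTGCA"
    # The score changes with the choice of k, which is the parameter
    # selection step that a parameter-free method avoids.
    for k in (1, 2, 3):
        print(k, d2(a, b, k))
```

The score depends on the chosen k, so comparing sequences with this kind of statistic means fixing k beforehand.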