J(2)*: A New Method for Alignment-free Sequence Similarity Measurement

Yue Jiang, Donald A. Adjeroh, Bing-Hua Jiang, Jie Lin

BIBM (2020)

Abstract

Alignment-free sequence comparison methods can compute the similarity of a huge number of sequences much faster than traditional sequence alignment methods. Here, a new non-parametric alignment-free sequence comparison algorithm, J(2)*, is proposed to measure sequence similarity based on the suffix tree data structure. Compared against other state-of-the-art alignment-free methods, namely D-2(z), D-2, D-2(sh), D-2*, WFV, DV, Shi, CPF, DMk, K-2 and K-2*, J(2)* has three main advantages: (1) It has the fastest running time in theory and in practice. J(2)* reduces the time for k-word search from O(N^2) to O(N), and our experimental results confirm that it is faster than the 11 popular approaches listed above. (2) J(2)* is easy to use: unlike the other alignment-free methods, which often need a suitable value of the parameter k to be chosen, J(2)* requires no parameter selection. (3) J(2)* has no particular requirement on the data distribution. Unlike the parametric methods (such as the D-2 family), which require the data to follow a certain distribution, J(2)* places no demand on the input distribution. The improved running time of J(2)* will be very useful in this era of big data, especially with the increasing volume of genome sequence data.
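
For context on the parameter k mentioned in point (2), the classical D2 statistic is commonly defined as the inner product of the k-word count vectors of two sequences. The sketch below illustrates that baseline only; it is not the J(2)* algorithm and does not use a suffix tree. The function names `kword_counts` and `d2` are hypothetical, chosen for this illustration.

```python
from collections import Counter


def kword_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-word (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """Classical D2 score: sum over k-words of count_a(w) * count_b(w).

    Illustrative baseline only (an assumption of this sketch),
    not the J(2)* measure described in the abstract.
    """
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    # Counter returns 0 for k-words absent from seq_b.
    return sum(n * counts_b[w] for w, n in counts_a.items())


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACGT"
    # The score depends on the chosen word length k.
    for k in (1, 2, 3):
        print(k, d2(a, b, k))
```

The score changes with k, which is why methods of this kind need a parameter selection step that, according to the abstract, J(2)* avoids.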

Keywords

sequence similarity measurement, alignment-free sequence comparison, suffix tree, biological sequences
