Sequence similarity measures based on bounded Hamming distance.

Alberto Apostolico, Concettina Guerra, Gad M. Landau, Cinzia Pizzi

Theor. Comput. Sci. (2016)

Abstract

A growing number of measures of sequence similarity are being based on some underlying notion of relative compressibility. Within this paradigm, similar sequences are expected to share a large number of common substrings, or subsequences, or more complex patterns or motifs, and so on. In this paper, measures of sequence similarity are introduced and studied in which patterns in a pair are considered similar if they coincide up to a preset number of mismatches, that is, within a bounded Hamming distance. It is shown here that for some such measures bounds are achievable that are slightly better than $O(n^2)$. Preliminary experiments demonstrate the potential applicability to phylogeny and classification of these similarity measures.
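
To make the notion of "coinciding up to a preset number of mismatches" concrete, the following is a minimal sketch of one measure in this family: the longest common substring of two sequences with at most k mismatches, plus an averaged per-position variant. It is a naive brute-force illustration, not the algorithm of the paper, and it does not achieve the sub-quadratic bounds mentioned above. The function names (`extend_with_mismatches`, `longest_common_substring_k`, `average_match_length_k`) and the averaging scheme are assumptions made for this example.

```python
def extend_with_mismatches(x: str, y: str, i: int, j: int, k: int) -> int:
    """Length of the longest L such that x[i:i+L] and y[j:j+L]
    are within Hamming distance k (hypothetical helper)."""
    mismatches = 0
    length = 0
    while i + length < len(x) and j + length < len(y):
        if x[i + length] != y[j + length]:
            if mismatches == k:
                break  # one more mismatch would exceed the bound
            mismatches += 1
        length += 1
    return length


def longest_common_substring_k(x: str, y: str, k: int) -> int:
    """Longest common substring of x and y allowing up to k mismatches.
    Brute force over all pairs of start positions."""
    best = 0
    for i in range(len(x)):
        for j in range(len(y)):
            best = max(best, extend_with_mismatches(x, y, i, j, k))
    return best


def average_match_length_k(x: str, y: str, k: int) -> float:
    """Assumed illustrative score: for every position i of x, take the longest
    k-mismatch match with any position of y, then average over i."""
    if not x:
        return 0.0
    total = 0
    for i in range(len(x)):
        total += max(
            (extend_with_mismatches(x, y, i, j, k) for j in range(len(y))),
            default=0,
        )
    return total / len(x)


if __name__ == "__main__":
    x, y = "abcde", "abXde"
    print(longest_common_substring_k(x, y, 0))  # exact matching only
    print(longest_common_substring_k(x, y, 1))  # one mismatch tolerated
    print(average_match_length_k(x, y, 1))
```

Raising k lets a single substitution stop breaking an otherwise long shared region, which is the intuition behind using such measures on sequences that differ by point mutations. A higher average indicates greater similarity under this sketch; turning it into a distance would require a normalization that is not specified here.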

Keywords

Pattern matching, String comparison, Alignment free distances, Binary string, Longest common substring, Mismatches
