A Filtering Approach for Alignment-Free Biosequences Comparison with Mismatches
Algorithms in Bioinformatics (WABI 2015), 2015
Abstract
Alignment-free approaches to sequence similarity based on substring composition are attracting increasing interest from the scientific community. In several contexts, alignment-free techniques are faster than alignment-based approaches, but less accurate. Recently, several studies (e.g. [4,8,9]) have attempted to bridge this accuracy gap by introducing approximate matches into the definition of composition-based distance measures. In this work we present MissMax, an exact algorithm that computes, for each suffix of a sequence x, the longest common substring with mismatches between that suffix and a sequence y. This collection of statistics is useful for computing two similarity measures that have recently been extended to incorporate approximate matching: the longest and the average common substring with k mismatches. Our approach is exact and relies on a filtering technique that, in a set of preliminary experiments, substantially reduced the number of potential sites of a longest match.
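To make the per-suffix statistic concrete, here is a minimal sketch in Python. It assumes mismatches means Hamming substitutions only (no insertions or deletions) and computes, for each position i of x, the length of the longest prefix of x[i:] that occurs in y with at most k mismatches. This is a plain quadratic-time sweep over alignment diagonals, not the MissMax filtering algorithm. The function names `longest_match_per_suffix`, `lcs_k`, and `acs_k_score` are hypothetical, and `acs_k_score` returns only the raw average: distance measures built on it typically add length normalization and symmetrization, which are not shown.

```python
def longest_match_per_suffix(x: str, y: str, k: int) -> list[int]:
    """best[i] = length of the longest prefix of x[i:] that matches some
    substring of y with at most k mismatches (substitutions only)."""
    n, m = len(x), len(y)
    best = [0] * n
    # Every gap-free alignment of x against y lies on a diagonal j - i = d.
    for d in range(-(n - 1), m):
        i0 = max(0, -d)                      # first x position on this diagonal
        j0 = i0 + d                          # matching first y position
        length = min(n - i0, m - j0)         # number of aligned pairs
        mismatches = 0                       # mismatches inside window [start, end)
        end = 0
        for start in range(length):
            # Extend the window as far as the mismatch budget k allows.
            while end < length:
                cost = int(x[i0 + end] != y[j0 + end])
                if mismatches + cost > k:
                    break
                mismatches += cost
                end += 1
            # [start, end) is the longest k-mismatch match starting at start.
            i = i0 + start
            best[i] = max(best[i], end - start)
            # Slide the left edge of the window forward.
            if end > start:
                mismatches -= int(x[i0 + start] != y[j0 + start])
            else:
                end += 1                     # empty window: skip this position
    return best


def lcs_k(x: str, y: str, k: int) -> int:
    """Longest common substring with at most k mismatches."""
    return max(longest_match_per_suffix(x, y, k), default=0)


def acs_k_score(x: str, y: str, k: int) -> float:
    """Raw average of the per-suffix lengths (no normalization)."""
    lengths = longest_match_per_suffix(x, y, k)
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    print(longest_match_per_suffix("ACGT", "ACGA", 1))  # [4, 3, 2, 1]
    print(lcs_k("ACGT", "ACGA", 1), acs_k_score("ACGT", "ACGA", 1))  # 4 2.5
```

Each diagonal is scanned with a two-pointer window whose right edge never moves backwards, so the sweep costs O(|x|·|y|) overall. A filtering approach such as the one described above aims to avoid examining every such site.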
Keywords
Suffix Tree, Suffix Array, Divergent Species, Naive Algorithm, Input Length