Suffix Array of Alignment: A Practical Index for Similar Data

SPIRE(2013)

引用 17|浏览9
暂无评分
摘要
The suffix tree of alignment is an index data structure for similar strings. Given an alignment of similar strings, it stores all suffixes of the alignment, called alignment-suffixes. An alignment-suffix represents one suffix of a string or suffixes of multiple strings starting at the same position in the alignment. The suffix tree of alignment makes good use of similarity in strings theoretically. However, suffix trees are not widely used in biological applications because of their huge space requirements, and instead suffix arrays are used in practice. In this paper we propose a space-economical version of the suffix tree of alignment, named the suffix array of alignment (SAA). Given an alignment ﾿ of similar strings, the SAA for ﾿ is a lexicographically sorted list of all the alignment-suffixes of ﾿. The SAA supports pattern search as efficiently as the generalized suffix array. Our experiments show that our index uses only 14% of the space used by the generalized suffix array to index 11 human genome sequences. The space efficiency of our index increases as the number of the genome sequences increases. We also present an efficient algorithm for constructing the SAA.
更多
查看译文
关键词
alignments,indexes for similar data,suffix arrays
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要