Dictionary Matching with Uneven Gaps.

CPM (2015)

Abstract
A gap-pattern is a sequence of sub-patterns separated by bounded sequences of don’t care characters (called gaps). A one-gap-pattern is a pattern of the form $P[\alpha,\beta]Q$, where $P$ and $Q$ are strings drawn from an alphabet $\Sigma$ and $[\alpha,\beta]$ gives the lower and upper bounds on the gap size $g$. The gap size $g$ is the number of don’t care characters between $P$ and $Q$. The dictionary matching problem with one gap is to index a collection of one-gap-patterns so as to identify all substrings of a query text $T$ that match any one-gap-pattern in the collection. Let $\mathcal{D}$ be such a collection of $d$ patterns, where $\mathcal{D}=\{P_i[\alpha_i,\beta_i]Q_i \mid 1\le i\le d\}$. Let $n=\sum_{i=1}^{d}(|P_i|+|Q_i|)$. Let $\gamma$ and $\lambda$ be two parameters defined on $\mathcal{D}$ as follows: $\gamma=|\{j \mid j\in[\alpha_i,\beta_i],\ 1\le i\le d\}|$ and $\lambda=|\{\alpha_i,\beta_i \mid 1\le i\le d\}|$. Specifically, $\gamma$ is the total number of gap lengths possible over all patterns in $\mathcal{D}$, and $\lambda$ is the number of distinct gap boundaries across all the patterns. We present a linear-space solution (i.e., $O(n)$ words) for answering a dictionary matching query on $\mathcal{D}$ in time $O(|T|\gamma\log\lambda\log d+occ)$, where $occ$ is the output size. The query time can be improved to $O(|T|\gamma+occ)$ using $O(n+d^{1+\epsilon})$ space, where $\epsilon>0$ is an arbitrarily small constant. Additionally, we show a compact/succinct space index offering a space-time trade-off. In the special case where the parameters $\alpha_i$ and $\beta_i$ are the same for all patterns, our results improve upon the work by Amir et al. [CPM, 2014]. We also explore several related cases where gaps can occur at arbitrary locations and where the gap can be induced in the text rather than in the pattern.
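To make the definitions above concrete, here is a minimal brute-force sketch in Python. It is not the indexed solution the abstract describes; it only illustrates what a match of $P[\alpha,\beta]Q$ means and how $\gamma$ and $\lambda$ are computed. The tuple encoding `(P, alpha, beta, Q)`, the function names `gap_parameters` and `naive_matches`, and the sample dictionary are all assumptions made for illustration, and $\gamma$ is read here as the number of distinct gap lengths, following the set notation in its definition.

```python
from typing import List, Tuple

# Hypothetical encoding of one one-gap-pattern P[alpha, beta]Q.
OneGapPattern = Tuple[str, int, int, str]


def gap_parameters(patterns: List[OneGapPattern]) -> Tuple[int, int]:
    """Return (gamma, lambda) following the set definitions in the abstract."""
    gap_lengths = set()   # every j with alpha_i <= j <= beta_i for some i
    boundaries = set()    # every alpha_i and beta_i
    for _, alpha, beta, _ in patterns:
        gap_lengths.update(range(alpha, beta + 1))
        boundaries.update((alpha, beta))
    return len(gap_lengths), len(boundaries)


def naive_matches(text: str, patterns: List[OneGapPattern]) -> List[Tuple[int, int, int]]:
    """Brute force: report (pattern index, start position, gap size) per occurrence."""
    found = []
    for idx, (p, alpha, beta, q) in enumerate(patterns):
        for start in range(len(text) - len(p) + 1):
            if not text.startswith(p, start):
                continue
            # Try every admissible gap size g between P and Q.
            for g in range(alpha, beta + 1):
                q_start = start + len(p) + g
                if q_start + len(q) <= len(text) and text.startswith(q, q_start):
                    found.append((idx, start, g))
    return found


# Illustrative dictionary of d = 2 patterns: "ab"[1,2]"cd" and "a"[2,3]"d".
dictionary = [("ab", 1, 2, "cd"), ("a", 2, 3, "d")]
gamma, lam = gap_parameters(dictionary)
matches = naive_matches("abxcdabyycd", dictionary)
```

This scan takes time roughly proportional to the text length times the total pattern length times the gap range per pattern, which is the kind of cost the indexed solutions in the abstract are designed to avoid.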
Keywords
Dictionary matching, Point enclosure queries