Online parameterized dictionary matching with one gap

Theoretical Computer Science (2020)

Abstract
We study the online Parameterized Dictionary Matching with One Gap (PDMOG) problem, defined as follows. Preprocess a dictionary D of d patterns, each containing a special gap symbol that can match any string, so that given a text T arriving online, one character at a time, we can report all patterns from D that parameterized-match suffixes of the text that has arrived so far, before the next character arrives. Two equal-length strings are a parameterized match if there exists a bijection on the alphabets under which one string matches the other. The gap symbols are associated with bounds determining the possible lengths of matching strings. Online Dictionary Matching with One Gap (DMOG) captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap. Parameterized matching captures possible encryption of the patterns. We also define the strict PDMOG problem, in which subpatterns of the same dictionary pattern must be parameterized-matched via the same bijection. This captures situations where subpatterns of a dictionary pattern are encoded simultaneously. We study this problem for the special case of alphabet-saturated dictionaries, in which every subpattern contains all characters of the dictionary alphabet $\Sigma$.

We use the following parameters to describe our results: D is the total size of the dictionary (not including the gaps), plsc is the length of the longest parameterized suffix chain of subpatterns in D, op is the number of parameterized pattern occurrences in T, $\alpha^*$ and $\beta^*$ are the minimum left and maximum right gap borders in the non-uniformly bounded dictionary case, and $\delta(G(D))$ is the degeneracy of the graph G(D) representing dictionary D. This graph is classified as sparse or dense according to the values of the $\delta(G(D))$ and plsc parameters. We obtain:

- An algorithm for online PDMOG with sparse graph dictionaries, with $\tilde{O}(D)$ preprocessing time/space and $\tilde{O}(\delta(G(D)) \cdot \mathrm{plsc} + \mathrm{plsc} \cdot \max\{|\Sigma|, M\} + \mathrm{op})$ query time per text character.
- An algorithm for online PDMOG with dense graph dictionaries, with $\tilde{O}(D + d(\beta^* - \alpha^*))$ preprocessing time/space and $\tilde{O}(\sqrt{\mathrm{plsc} \cdot d} \cdot (\beta^* - \alpha^*) + \mathrm{plsc} \cdot \max\{|\Sigma|, M\} + \mathrm{op})$ query time per text character.
- An algorithm for strict PDMOG with alphabet-saturated dictionaries, with $\tilde{O}(D)$ preprocessing time/space and $\tilde{O}(\delta(G(D)) \cdot \mathrm{plsc} + \mathrm{op})$ query time per text character.

These results parallel those obtained for the Dictionary Matching with One Gap (DMOG) problem, which almost match the lower bounds achieved for that problem [7]. While the parameter $\delta(G(D))$ can be as large as $\sqrt{d}$, and much larger if the dictionary has non-uniform gap boundaries, and the parameter plsc could theoretically be as large as d, in many practical situations these parameters are actually small. The strength of our work is in achieving results that explore and exploit small values of these parameters, thus supplying algorithms that are suitable for some practical cyber-security needs. (C) 2020 Elsevier B.V. All rights reserved.
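
To make the definitions in the abstract concrete, the following is a minimal Python sketch of a parameterized match and of a naive check for a one-gap pattern against the current text suffix. It is not the algorithm of the paper: it rescans the text for every candidate gap length and uses none of the preprocessing or graph structure described above. The function names, the representation of a pattern as a `left` and `right` subpattern with gap bounds `alpha` and `beta`, and the `strict` flag are all assumptions made for illustration, and every character is treated as a parameter symbol.

```python
def parameterized_match(s: str, t: str) -> bool:
    """Return True if some bijection on characters maps s onto t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}  # s-char -> t-char and t-char -> s-char
    for a, b in zip(s, t):
        # setdefault records the mapping on first sight, then checks it.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def matches_with_gap(text, left, right, alpha, beta, strict=False):
    """Naive check (hypothetical helper): does some suffix of `text`
    match left + gap + right with a gap length in [alpha, beta]?

    With strict=True both subpatterns must use the same bijection,
    which is checked by matching their concatenation.
    """
    n = len(text)
    if len(right) > n:
        return False
    right_part = text[n - len(right):]
    if not parameterized_match(right, right_part):
        return False
    for gap in range(alpha, beta + 1):
        end = n - len(right) - gap      # where the left subpattern ends
        start = end - len(left)
        if start < 0:
            break                       # longer gaps cannot fit either
        left_part = text[start:end]
        if strict:
            ok = parameterized_match(left + right, left_part + right_part)
        else:
            ok = parameterized_match(left, left_part)
        if ok:
            return True
    return False
```

For example, with `left="ab"`, `right="ba"`, and a gap of exactly one character, the text `"xy?xy"` matches in the non-strict setting (each subpattern has its own bijection) but not in the strict one, whereas `"xy?yx"` matches in both.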
Keywords
Pattern matching,Dictionary matching,Online dictionary matching with gaps,Parameterized matching