
Approximate String Membership Checking: A Multiple Filter, Optimization-Based Approach

Data Engineering(2012)

Abstract
We consider the approximate string membership checking (ASMC) problem of extracting all the strings or substrings in a document that approximately match some string in a given dictionary. The current state-of-the-art approach to this problem first applies a fast, approximate filter and then applies a more expensive exact verification algorithm to the strings that pass the filter. Correspondingly, many string filters have been proposed. We note that different filters are good at eliminating different strings, depending on the characteristics of the strings in both the documents and the dictionary. We suspect that no single filter will dominate all other filters everywhere. Given an ASMC problem instance and a set of string filters, we need to select the optimal filter to maximize performance. Furthermore, in our experiments we found that in some cases a sequence of filters dominates any of the filters of the sequence in isolation, and that the best set of filters and their ordering depend upon the specific problem instance encountered. Accordingly, we propose that the approximate match problem be viewed as an optimization problem, and we evaluate a number of techniques for solving this optimization problem.
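The filter-then-verify pattern described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the specific filters (a length filter and a q-gram count filter), the use of edit distance as the similarity measure, and all function names here are assumptions chosen for the example. The abstract does not say which filters or which verification algorithm the authors use.

```python
from collections import Counter


def edit_distance(a, b):
    """Exact (expensive) verification: Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def length_filter(s, t, k):
    """Cheap filter (assumed): strings within edit distance k differ in length by at most k."""
    return abs(len(s) - len(t)) <= k


def qgrams(s, q=2):
    """Multiset of overlapping q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def count_filter(s, t, k, q=2):
    """Cheap filter (assumed): strings within edit distance k share at least
    max(len(s), len(t)) - q + 1 - k * q q-grams."""
    need = max(len(s), len(t)) - q + 1 - k * q
    shared = sum((qgrams(s, q) & qgrams(t, q)).values())
    return shared >= need


def approximate_matches(candidates, dictionary, k, filters):
    """Run the filters in the given order; verify only the pairs that survive all of them."""
    out = []
    for s in candidates:
        for t in dictionary:
            if all(f(s, t, k) for f in filters) and edit_distance(s, t) <= k:
                out.append((s, t))
    return out


candidates = ["aproximate", "string", "filter"]
dictionary = ["approximate", "filters"]
result = approximate_matches(candidates, dictionary, 1, [length_filter, count_filter])
```

Because both filters only discard pairs that cannot be within the threshold, the result is the same with or without them; what changes is how many pairs reach the expensive verification step. Choosing which filters to include and in what order is the optimization question the abstract raises.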
Keywords
optimization-based approach,asmc problem instance,approximate string membership checking,string filter,multiple filter,different filter,optimal filter,specific problem instance,approximate match problem,optimization problem,different string,single filter,approximation algorithms,optimization,formal verification,matched filters,string matching,estimation,dictionaries,matched filter,pipelines