An Efficient Framework For Exact Set Similarity Search Using Tree Structure Indexes
2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017)(2017)
摘要
Similarity search is an essential operation in many applications. Given a collection of set records and a query, the exact set similarity search aims at finding all the records that are similar to the query from the collection. Existing methods adopt a filter-and-verify framework, which make use of inverted indexes. However, as the complexity of verification is rather low for set-based similarity metrics, they always fail to make a good tradeoff between filter power and filter cost. In this paper, we proposed an efficient framework for exact set similarity search based on tree index structure. We defined a hash-based ordering to effectively import data into the index structure and then make optimizations to reduce the filter cost. To further improve the filter power, we proposed a dynamic algorithm to partition the dataset into several parts and propose a multiple-index framework. Experimental results on real-world datasets show that our method significantly outperform the state-of-the-art algorithms.
更多查看译文
关键词
exact set similarity search,tree structure indexes,query,filter-and-verify framework,inverted indexes,verification complexity,set-based similarity metrics,dynamic algorithm
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要