An Efficient Framework For Exact Set Similarity Search Using Tree Structure Indexes

2017 IEEE 33rd International Conference on Data Engineering (ICDE 2017), 2017

Abstract
Similarity search is an essential operation in many applications. Given a collection of set records and a query, exact set similarity search aims to find all records in the collection that are similar to the query. Existing methods adopt a filter-and-verify framework that makes use of inverted indexes. However, because the complexity of verification is rather low for set-based similarity metrics, these methods fail to strike a good tradeoff between filter power and filter cost. In this paper, we propose an efficient framework for exact set similarity search based on a tree index structure. We define a hash-based ordering to insert data into the index structure effectively, and then apply optimizations to reduce the filter cost. To further improve the filter power, we propose a dynamic algorithm that partitions the dataset into several parts, and we build a multiple-index framework on top of this partitioning. Experimental results on real-world datasets show that our method significantly outperforms state-of-the-art algorithms.
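For orientation, the filter-and-verify baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the paper's tree-based method: it assumes Jaccard similarity as the set-based metric, non-empty sets, and a threshold greater than zero, and the names jaccard, build_inverted_index and search are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two non-empty sets."""
    return len(a & b) / len(a | b)


def build_inverted_index(records):
    """Map each token to the ids of the records that contain it."""
    index = defaultdict(list)
    for rid, record in enumerate(records):
        for token in record:
            index[token].append(rid)
    return index


def search(records, index, query, threshold):
    """Return (record id, similarity) pairs with Jaccard >= threshold.

    Assumes threshold > 0, so a matching record must share at least
    one token with the query.
    """
    # Filter step: collect every record that shares a token with the query.
    candidates = set()
    for token in query:
        candidates.update(index.get(token, ()))

    # Verify step: compute the exact similarity for each candidate.
    results = []
    for rid in sorted(candidates):
        sim = jaccard(records[rid], query)
        if sim >= threshold:
            results.append((rid, sim))
    return results


# Illustrative data only; not taken from the paper's experiments.
records = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"a", "b", "c", "d"}]
index = build_inverted_index(records)
matches = search(records, index, {"a", "b", "c"}, threshold=0.7)
```

In this sketch the filter is deliberately weak: any record sharing a single token becomes a candidate, so most of the work falls on verification. Stronger filters prune more candidates but cost more to evaluate, which is the tradeoff between filter power and filter cost that the abstract refers to.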
Keywords
exact set similarity search, tree structure indexes, query, filter-and-verify framework, inverted indexes, verification complexity, set-based similarity metrics, dynamic algorithm