BLOCK-DBSCAN: Fast clustering for large scale data

Pattern Recognition(2021)

引用 165|浏览66
暂无评分
摘要
We analyze the drawbacks of DBSCAN and its variants, and find the grid technique, which is used in Fast-DBSCAN and ρ-approximate DBSCAN, is almost useless in high dimensional data space. Because it usually yields considerable redundant distance computations. In order to tame these problems, two techniques are proposed: one is to use ϵ2-norm ball to identify Inner Core Blocks within which all points are core points, it has higher efficiency than grid technique for finding more core points at one time; the other is a fast approximate algorithm for judging whether two Inner Core Blocks are density-reachable from each other. Besides, cover tree is also used to accelerate the process of density computations. Based on the three techniques, an approximate approach, namely BLOCK-DBSCAN, is proposed for large scale data, which runs in about O(nlog (n)) expected time and obtains almost the same result as DBSCAN. BLOCK-DBSCAN has two versions, i.e., L2 version can work well for relatively high dimensional data, and L∞ version is suitable for high dimensional data. Experimental results show that BLOCK-DBSCAN is promising and outperforms NQDBSCAN, ρ-approximate DBSCAN and AnyDBC.
更多
查看译文
关键词
DBSCAN,ρ-approximate DBSCAN,BLOCK-DBSCAN,Core block
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要