BLOCK-DBSCAN: Fast clustering for large scale data

Yewang Chen,Lida Zhou,Nizar Bouguila,Cheng Wang,Yi Chen,Jixiang Du

Pattern Recognition（2021）

引用 165|浏览66

暂无评分

摘要

We analyze the drawbacks of DBSCAN and its variants, and find the grid technique, which is used in Fast-DBSCAN and ρ-approximate DBSCAN, is almost useless in high dimensional data space. Because it usually yields considerable redundant distance computations. In order to tame these problems, two techniques are proposed: one is to use ϵ2-norm ball to identify Inner Core Blocks within which all points are core points, it has higher efficiency than grid technique for finding more core points at one time; the other is a fast approximate algorithm for judging whether two Inner Core Blocks are density-reachable from each other. Besides, cover tree is also used to accelerate the process of density computations. Based on the three techniques, an approximate approach, namely BLOCK-DBSCAN, is proposed for large scale data, which runs in about O(nlog (n)) expected time and obtains almost the same result as DBSCAN. BLOCK-DBSCAN has two versions, i.e., L2 version can work well for relatively high dimensional data, and L∞ version is suitable for high dimensional data. Experimental results show that BLOCK-DBSCAN is promising and outperforms NQDBSCAN, ρ-approximate DBSCAN and AnyDBC.

查看译文

关键词

DBSCAN,ρ-approximate DBSCAN,BLOCK-DBSCAN,Core block

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要