Efficient Density-peaks Clustering Algorithms on Static and Dynamic Data in Euclidean Space

ACM Transactions on Knowledge Discovery from Data (2024)

Abstract
Clustering multi-dimensional points is a fundamental task in many fields, and density-based clustering supports many applications because it can discover clusters of arbitrary shapes. This article addresses the problem of Density-Peaks Clustering (DPC) in Euclidean space. DPC already has many applications, but its straightforward implementation incurs O(n^2) time, where n is the number of points, and therefore does not scale to large datasets. To enable DPC on large datasets, we first propose an empirically efficient exact DPC algorithm, Ex-DPC. Although this algorithm is much faster than the straightforward implementation, its worst-case time is still O(n^2). We hence propose a new exact algorithm, Ex-DPC++, that runs in o(n^2) time, that is, in subquadratic time. We further accelerate both algorithms by leveraging multi-threading. Moreover, real-world datasets may receive arbitrary updates (point insertions and deletions), so it is important to support efficient cluster updates. To this end, we propose D-DPC for fully dynamic DPC. We conduct extensive experiments using real datasets, and our experimental results demonstrate that our algorithms are efficient and scalable.
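To make the O(n^2) cost of the straightforward implementation concrete, the sketch below computes the two per-point quantities used in the commonly cited formulation of DPC: the local density (number of points within a cutoff distance) and the dependent distance (distance to the nearest point of higher density). This is an illustrative baseline only, not Ex-DPC, Ex-DPC++, or D-DPC. The function name `naive_dpc`, the parameter `d_cut`, and the index-based tie-breaking rule are assumptions introduced for this example.

```python
import math

def naive_dpc(points, d_cut):
    """Quadratic-time DPC baseline (illustrative sketch).

    Returns (rho, delta, parent):
      rho[i]    - local density: number of other points closer than d_cut
      delta[i]  - distance to the nearest point of higher density
      parent[i] - index of that point, or -1 for the densest point
    """
    n = len(points)
    # Pairwise Euclidean distances: this n x n table is the O(n^2) bottleneck.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density with a hard cutoff (assumed density definition).
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_cut)
           for i in range(n)]

    delta = [0.0] * n
    parent = [-1] * n
    for i in range(n):
        best, best_j = math.inf, -1
        for j in range(n):
            # Density ties are broken by index (an assumption of this sketch)
            # so that exactly one point ends up without a parent.
            if (rho[j], -j) > (rho[i], -i) and dist[i][j] < best:
                best, best_j = dist[i][j], j
        if best_j == -1:
            # Densest point: by convention, use its largest distance.
            delta[i] = max(dist[i])
        else:
            delta[i], parent[i] = best, best_j
    return rho, delta, parent

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    rho, delta, parent = naive_dpc(pts, d_cut=1.5)
    print(rho, delta, parent)
```

Points with both large density and large dependent distance are the candidate cluster centers; every other point can then follow its `parent` link to reach a center. Both nested loops touch all pairs of points, which is the quadratic cost that motivates the faster algorithms described in the abstract.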
Keywords
Density-peaks clustering, parallel algorithms, multi-dimensional points