Is-ClusterMPP: clustering algorithm through point processes and influence space towards high-dimensional data

Advances in Data Analysis and Classification (2019)

Abstract
Clustering via marked point processes and influence space, Is-ClusterMPP, is a new unsupervised clustering algorithm based on adaptive MCMC sampling of a marked point process of interacting balls. The designed Gibbs energy cost function makes use of k-influence space information. The algorithm detects clusters of different shapes and sizes and with unbalanced local densities, and it also aims to handle high-dimensional datasets. By using the k-influence space, Is-ClusterMPP addresses local heterogeneity in densities and keeps the global density from affecting the detection of unbalanced classes. This concept also reduces the number of input values. The curse of dimensionality is handled by a local subspace clustering principle embedded in a weighted similarity metric. The balls covering the data points constitute a configuration sampled from a marked point process (MPP). Owing to the choice of the energy function, they tend to cover neighboring data that share the same cluster. The statistical model of random balls is sampled through a Markov chain Monte Carlo dynamical approach. The energy balances two goals: (1) a data-driven objective function, defined according to the k-influence space, favors covering data in high-density regions with a ball; (2) an interaction term prevents balls from fully overlapping and favors connected groups of balls. Through the Markov dynamics, the algorithm converges towards configurations sampled from the MPP model. The algorithm has been applied to real benchmarks, namely gene expression data sets of various sizes. Experiments compare Is-ClusterMPP against the most well-known clustering algorithms, and the authors claim its efficiency.
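The abstract does not define the k-influence space. As a rough illustration only, the sketch below assumes one definition commonly used in density-based clustering: the influence space of a point is the set of its k nearest neighbors that also count it among their own k nearest neighbors. The function name `k_influence_space`, the Euclidean metric, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def k_influence_space(X, k):
    """For each row of X, return the indices j such that j is among the
    k nearest neighbors of i AND i is among the k nearest neighbors of j.
    Assumed definition; the paper's exact formulation may differ."""
    n = X.shape[0]
    # Pairwise Euclidean distances (assumed metric).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # knn[i] holds the indices of the k nearest neighbors of point i.
    knn = np.argsort(dist, axis=1)[:, :k]
    # is_nn[i, j] is True when j is one of the k nearest neighbors of i.
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], knn] = True
    # Keep only mutual neighbor relations.
    mutual = is_nn & is_nn.T
    return [np.flatnonzero(mutual[i]) for i in range(n)]

# Toy data: two tight groups and one isolated point (index 6).
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [5, 30]], dtype=float)
spaces = k_influence_space(X, k=2)
```

Under this assumed definition, points inside a tight group have a full influence space, while the isolated point has an empty one because none of its nearest neighbors count it among theirs. The size of the set therefore behaves as a local density indicator that does not depend on a global distance threshold, which is consistent with the role the abstract assigns to the k-influence space.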
Keywords
Density-based clustering, Influence space, Marked point processes, Spatial data analysis, Gibbs cost, objective function, MCMC, Monte Carlo technique, High dimensional real data sets