Statistical Guarantees for Consensus Clustering

Zhixin Zhou, Gautam Dudeja, Arash A Amini

ICLR 2023(2023)

引用 0|浏览2
暂无评分
摘要
Consider the problem of clustering $n$ objects. One can apply multiple algorithms to produce $N$ potentially different clustersings of the same objects, that is, partitions of the $n$ objects into $K$ groups. Even a single randomized algorithm can output different clusterings. This often happens when one samples from the posterior of a Bayesian model, or runs multiple MCMC chains from random initializations. A natural task is then to form a consensus among these different clusterings. The challenge in an unsupervised setting is that the optimal matching between clusters of different inputs is unknown. We model this problem as finding a barycenter (also known as Fr\'{e}chet mean) relative to the misclassification rate. We show by lifting the problem to the space of association matrices, one can derive aggregation algorithms that circumvent the knowledge of the optimal matchings. We analyze the statistical performance of aggregation algorithms under a stochastic label perturbation model, and show that a $K$-means type algorithm followed by a local refinement step can achieve near optimal performance, with a rate that decays exponentially in $N$. Numerical experiments show the effectiveness of the proposed methods.
更多
查看译文
关键词
consensus clustering,unsupervised label aggregation,spectral clustering,barycenter problem,Frechet mean,semidefinite relaxation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要