A Simple D²-Sampling Based PTAS for k-Means and Other Clustering Problems
Algorithmica (2014)
Abstract
Given a set of points $P \subset \mathbb{R}^{d}$, the $k$-means clustering problem is to find a set of $k$ centers $C = \{c_{1}, \ldots, c_{k}\}$, $c_{i} \in \mathbb{R}^{d}$, such that the objective function $\sum_{x \in P} e(x, C)^{2}$, where $e(x, C)$ denotes the Euclidean distance between $x$ and the closest center in $C$, is minimized. This is one of the most prominent objective functions that has been studied with respect to clustering. $D^{2}$-sampling (Arthur and Vassilvitskii, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'07, pp. 1027-1035, SIAM, Philadelphia, 2007) is a simple non-uniform sampling technique for choosing points from a set of points. It works as follows: given a set of points $P \subset \mathbb{R}^{d}$, the first point is chosen uniformly at random from $P$. Subsequently, a point from $P$ is chosen as the next sample with probability proportional to the square of the distance of this point to the nearest previously sampled point. $D^{2}$-sampling has been shown to have nice properties with respect to the $k$-means clustering problem. Arthur and Vassilvitskii (Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'07, pp. 1027-1035, SIAM, Philadelphia, 2007) show that $k$ points chosen as centers from $P$ using $D^{2}$-sampling give an $O(\log k)$ approximation in expectation. Ailon et al. (NIPS, pp. 10-18, 2009) and Aggarwal et al. (Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp. 15-28, Springer, Berlin, 2009) extended results of Arthur and Vassilvitskii (Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'07, pp. 1027-1035, SIAM, Philadelphia, 2007) to show that $O(k)$ points chosen as centers using $D^{2}$-sampling give an $O(1)$ approximation to the $k$-means objective function with high probability.
In this paper, we further demonstrate the power of $D^{2}$-sampling by giving a simple randomized $(1+\epsilon)$-approximation algorithm that uses $D^{2}$-sampling at its core.
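The $D^{2}$-sampling procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration of the sampling step (first point uniform, later points with probability proportional to squared distance to the nearest chosen point); it is not the paper's $(1+\epsilon)$-approximation algorithm. The names `sq_dist`, `d2_sample`, and `kmeans_cost` are hypothetical, and the uniform fallback used when every point coincides with a chosen center is an assumption added so the sketch never divides by zero.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_sample(points, k, rng=None):
    """Pick k points from `points` by D^2-sampling (illustrative sketch).

    The first point is uniform at random; each later point is drawn with
    probability proportional to its squared distance to the nearest
    point sampled so far.
    """
    rng = rng or random.Random()
    if not points or k <= 0:
        return []
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: all points coincide with chosen centers,
            # so fall back to a uniform draw.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the new center.
        d2 = [min(old, sq_dist(p, nxt)) for p, old in zip(points, d2)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum over x in P of e(x, C)^2."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    chosen = d2_sample(pts, 3, random.Random(0))
    print(chosen, kmeans_cost(pts, chosen))
```

Because an already-chosen point has squared distance zero to itself, it receives zero weight in later rounds, which is why the samples tend to spread across well-separated clusters.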
Keywords
k-means clustering, k-median, PTAS, sampling