An improvement to the K-means algorithm oriented to big data

Joaquin Perez Ortega, Ramos Pazos, Miguel Hidalgo, Nelva Almanza, Ocotlan Diazparra, Rene Santaolaya, Vitervo Caballero

AIP Conference Proceedings (2015)

Abstract

The K-means clustering algorithm is widely used across many domains because it is simple to implement and interpret. One of its limitations, however, is its high computational complexity. This work addresses the problem of reducing the complexity of the K-means algorithm so that large-scale datasets, such as those found in Big Data, can be solved without significantly degrading solution quality. To this end, a new metaheuristic is proposed that, by assigning objects to clusters early, significantly reduces the number of object-to-centroid distance calculations. The approach was evaluated experimentally on real and synthetic datasets, with encouraging results: time reductions of up to 91% were obtained with respect to standard K-means, at the cost of a 3.2% reduction in solution quality.

Keywords

K-means, Big Data, Complexity reduction, Metaheuristics
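
The abstract does not state the rule used to assign objects early, so the following is only an illustrative sketch and not the authors' method. It assumes a hypothetical criterion: an object is frozen in its cluster once its label has stayed the same for `stable_iters` consecutive passes, after which it is skipped in the assignment step. The function name, its parameters, and the toy data are all assumptions made for this example. Standard K-means computes n × k distances per iteration (n objects, k centroids); the sketch counts how many of those evaluations the freezing rule avoids.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_early_assignment(points, init_centroids, stable_iters=1, max_iters=100):
    """K-means with a hypothetical early-assignment (freezing) rule.

    An object whose label is unchanged for `stable_iters` consecutive passes
    is frozen and no longer compared against the centroids. This rule is an
    assumption for illustration, not the criterion used in the paper.
    Returns (labels, centroids, number_of_distance_evaluations).
    """
    k = len(init_centroids)
    centroids = [list(c) for c in init_centroids]
    labels = [-1] * len(points)      # -1 means "not yet assigned"
    streak = [0] * len(points)       # consecutive passes with the same label
    frozen = [False] * len(points)
    evaluations = 0

    for _ in range(max_iters):
        changed = False
        # Assignment step: only non-frozen objects pay for k distances.
        for i, p in enumerate(points):
            if frozen[i]:
                continue
            dists = [squared_distance(p, c) for c in centroids]
            evaluations += k
            best = min(range(k), key=dists.__getitem__)
            if best == labels[i]:
                streak[i] += 1
                if streak[i] >= stable_iters:
                    frozen[i] = True
            else:
                labels[i] = best
                streak[i] = 0
                changed = True
        # Update step: centroids are the means of all members, frozen or not.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:
            break
    return labels, centroids, evaluations


# Toy data (hypothetical): two well-separated groups, deliberately poor start.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
start = [(0, 0), (0, 1)]

# A very large stable_iters never freezes anything, i.e. standard K-means.
std_labels, _, std_evals = kmeans_early_assignment(points, start, stable_iters=10**9)
fast_labels, _, fast_evals = kmeans_early_assignment(points, start, stable_iters=1)
print(std_evals, fast_evals)  # prints "36 26"
```

On this toy input both runs produce the same partition, while the freezing rule performs 26 distance evaluations instead of 36. In general a frozen object can end up in a cluster that is no longer its nearest one, which is consistent with the trade-off the abstract reports between running time and solution quality.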
