Chrome Extension
WeChat Mini Program
Use on ChatGLM

SMGKM: an Efficient Incremental Algorithm for Clustering Document Collections

Lecture notes in computer science(2023)

Cited 0|Views0
No score
Abstract
Given a large unlabeled document collection, the aim of this paper is to develop an accurate and efficient algorithm for solving the clustering problem over this collection. Document collections typically contain tens or hundreds of thousands of documents, with thousands or tens of thousands of features (i.e., distinct words). Most existing clustering algorithms struggle to find accurate solutions on such large data sets. The proposed algorithm overcomes this difficulty by an incremental approach, incrementing the number of clusters progressively from an initial value of one to a set value. At each iteration, the new candidate cluster is initialized using a partitioning approach which is guaranteed to minimize the objective function. Experiments have been carried out over six, diverse datasets and with different evaluation criteria, showing that the proposed algorithm has outperformed comparable state-of-the-art clustering algorithms in all cases.
More
Translated text
Key words
Document Clustering,Clustering Algorithms,Semi-supervised Clustering,Density-based Clustering,Document Categorization
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined