Boosting Text Clustering using Topic Selection

International Conference on Pattern Recognition (2018)

Abstract
Latent Dirichlet Allocation (LDA) is a key topic modeling algorithm in text mining. Despite its success, the literature reports that LDA is sensitive to the choice of hyperparameters, so the quality of the topics it finds depends on tuning. Instead of searching for the optimal LDA hyperparameters for a given corpus, we propose a strategy for topic selection and aggregation that exploits hyperparameter variability, such as the number of topics inferred, to boost the quality of the topics found. Experimental results show that the approach is simple and effective: it improves the quality of the topics found, which in turn benefits document and term clustering tasks.
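The abstract does not say how topics are selected or aggregated, so the following is only a minimal sketch of the general idea: pool the topics produced by several LDA runs with different numbers of topics, keep the ones that score well, and merge near-duplicates. The `peakedness` score, the cosine-based merge rule, the thresholds, and the toy topic-word distributions are all illustrative assumptions, not the authors' method.

```python
from math import sqrt

def cosine(p, q):
    """Cosine similarity between two sparse term -> weight dicts."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def peakedness(topic):
    """Hypothetical quality score: probability mass of the 3 heaviest terms."""
    return sum(sorted(topic.values(), reverse=True)[:3])

def select_and_aggregate(runs, min_score=0.7, merge_sim=0.9):
    """Pool topics from several runs, keep high-scoring ones, merge near-duplicates."""
    pooled = [topic for run in runs for topic in run]
    kept = sorted(
        (t for t in pooled if peakedness(t) >= min_score),
        key=peakedness,
        reverse=True,
    )
    groups = []  # each group holds topics that are near-duplicates of its first member
    for topic in kept:
        for group in groups:
            if cosine(topic, group[0]) >= merge_sim:
                group.append(topic)
                break
        else:
            groups.append([topic])
    merged = []
    for group in groups:
        terms = {t for topic in group for t in topic}
        # Aggregate a group by averaging term weights across its members.
        merged.append(
            {t: sum(topic.get(t, 0.0) for topic in group) / len(group) for t in terms}
        )
    return merged

# Toy topic-word distributions standing in for LDA runs with K=2 and K=3 (made-up values).
run_k2 = [
    {"goal": 0.5, "team": 0.3, "match": 0.2},
    {"goal": 0.2, "vote": 0.2, "tax": 0.2, "team": 0.2, "law": 0.2},  # diffuse topic
]
run_k3 = [
    {"goal": 0.45, "team": 0.35, "match": 0.2},  # near-duplicate of the K=2 sports topic
    {"vote": 0.6, "tax": 0.3, "law": 0.1},
    {"goal": 0.2, "vote": 0.2, "tax": 0.2, "team": 0.2, "law": 0.2},  # diffuse topic
]

topics = select_and_aggregate([run_k2, run_k3])
```

In this toy setup the two diffuse topics fall below the score threshold and the two sports topics are averaged into one, so `topics` holds one sports topic and one politics topic. In practice the pooled topics would come from a real LDA implementation, and the selection criterion would be whatever the paper defines.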