Convergence of the Algorithm of Additive Regularization of Topic Models

TRUDY INSTITUTA MATEMATIKI I MEKHANIKI URO RAN(2022)

引用 0|浏览0
暂无评分
摘要
The problem of probabilistic topic modeling is as follows. Given a collection of text documents, find the conditional distribution over topics for each document and the conditional distribution over words (or terms) for each topic. Log-likelihood maximization is used to solve this problem. The problem generally has an infinite set of solutions and is ill-posed according to Hadamard. In the framework of Additive Regularization of Topic Models (ARTM), a weighted sum of regularization criteria is added to the main log-likelihood criterion. The numerical method for solving this optimization problem is a kind of an iterative EM-algorithm written in a general form for an arbitrary smooth regularizer as well as for a linear combination of smooth regularizers. This paper studies the problem of convergence of the EM iterative process. Sufficient conditions are obtained for the convergence to a stationary point of the regularized log-likelihood. The constraints imposed on the regularizer are not too restrictive. We give their interpretations from the point of view of the practical implementation of the algorithm. A modification of the algorithm is proposed that improves the convergence without additional time and memory costs. Experiments on a news text collection have shown that our modification both accelerates the convergence and improves the value of the criterion to be optimized.
更多
查看译文
关键词
topic models,additive regularization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要