Probabilistic Topic Modeling for Comparative Analysis of Document Collections.
ACM Transactions on Knowledge Discovery from Data(2020)
摘要
Probabilistic topic models, which can discover hidden patterns in documents, have been extensively studied. However, rather than learning from a single document collection, numerous real-world applications demand a comprehensive understanding of the relationships among various document sets. To address such needs, this article proposes a new model that can identify the common and discriminative aspects of multiple datasets. Specifically, our proposed method is a Bayesian approach that represents each document as a combination of common topics (shared across all document sets) and distinctive topics (distributions over words that are exclusive to a particular dataset). Through extensive experiments, we demonstrate the effectiveness of our method compared with state-of-the-art models. The proposed model can be useful for “comparative thinking” analysis in real-world document collections.
更多查看译文
关键词
Probabilistic topic modeling,text mining
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络