TopicGPT: A Prompt-based Topic Modeling Framework

NAACL-HLT (2024)

Abstract
Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal control over the formatting and specificity of resulting topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics in a text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods: it achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics compared to 0.64 for the strongest baseline. Its topics are also interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions. Moreover, the framework is highly adaptable, allowing users to specify constraints and modify topics without the need for model retraining. By streamlining access to high-quality and interpretable topics, TopicGPT represents a compelling, human-centered approach to topic modeling.
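
The abstract reports a "harmonic mean purity" score without defining it. One common reading is the harmonic mean of purity and inverse purity between predicted topic assignments and human labels. The sketch below illustrates that reading only; the function names are hypothetical and are not taken from the TopicGPT codebase, and the paper's exact formulation may differ.

```python
from collections import Counter


def purity(predicted, reference):
    """Share of documents covered by the majority reference label of each predicted cluster."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    clusters = {}
    for pred_label, ref_label in zip(predicted, reference):
        clusters.setdefault(pred_label, Counter())[ref_label] += 1
    # For each predicted cluster, count the documents carrying its most frequent reference label.
    return sum(max(counts.values()) for counts in clusters.values()) / len(predicted)


def harmonic_mean_purity(predicted, reference):
    """Harmonic mean of purity and inverse purity (assumed definition, see lead-in)."""
    p = purity(predicted, reference)
    inverse_p = purity(reference, predicted)  # inverse purity swaps the two roles
    return 2 * p * inverse_p / (p + inverse_p)


# Toy example with four documents: two predicted topics, two human labels.
predicted_topics = ["a", "a", "b", "b"]
human_labels = ["x", "x", "x", "y"]
score = harmonic_mean_purity(predicted_topics, human_labels)
```

Purity alone rewards splitting the corpus into many small topics, while inverse purity alone rewards lumping everything into one; combining them with a harmonic mean penalizes both extremes.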