Subtopic mining using simple patterns and hierarchical structure of subtopic candidates from web documents

Information Processing & Management(2015)

Abstract
We use only web document collection instead of query logs and external resources. Our simple patterns are based on noun phrases and alternative partial-queries. We maintain a balance between popularity and diversity of subtopics. Our method covered various search intentions of a query by its few subtopics. Our results were steadily improved by extracting more relevant and various subtopics.

The intention gap between users and queries results in ambiguous and broad queries. To solve these problems, subtopic mining has been studied, which returns a ranked list of possible subtopics according to their relevance, popularity, and diversity. This paper proposes a novel method to mine subtopics using simple patterns and a hierarchical structure of subtopic candidates. First, relevant and various phrases are extracted as subtopic candidates using simple patterns based on noun phrases and alternative partial-queries. Second, a hierarchical structure of the subtopic candidates is constructed using sets of relevant documents from a web document collection. Finally, the subtopic candidates are ranked considering a balance between popularity and diversity using this structure. In experiments, our proposed methods outperformed the baselines and even an external resource based method at high-ranked subtopics, which shows that our methods can be effective and useful in various search scenarios like result diversification.
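The second and third steps described in the abstract (building a hierarchy from sets of relevant documents, then ranking with a balance between popularity and diversity) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract gives no formulas, so the subset-based parent/child rule, the greedy coverage score, the `trade_off` parameter, the function names, and the toy "jaguar" data are all assumptions made purely for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy data: each subtopic candidate maps to the IDs of the
# documents judged relevant to it (invented for illustration only).
toy_docs: Dict[str, Set[int]] = {
    "jaguar car": {1, 2, 3, 4},
    "jaguar car price": {1, 2},
    "jaguar animal": {5, 6, 7},
    "jaguar habitat": {6, 7},
}


def build_hierarchy(docs: Dict[str, Set[int]]) -> Dict[str, List[str]]:
    """Assumed rule: a candidate is a child of another candidate when its
    document set is a strict subset of the other's document set."""
    children: Dict[str, List[str]] = {c: [] for c in docs}
    for parent, parent_docs in docs.items():
        for child, child_docs in docs.items():
            if child != parent and child_docs < parent_docs:
                children[parent].append(child)
    return children


def rank_candidates(docs: Dict[str, Set[int]], k: int,
                    trade_off: float = 0.5) -> List[str]:
    """Greedy ranking (an assumed stand-in for the paper's method).

    popularity = share of all documents the candidate matches
    novelty    = share of all documents it matches that are not yet covered
    score      = trade_off * popularity + (1 - trade_off) * novelty
    """
    total = len(set().union(*docs.values())) or 1
    covered: Set[int] = set()
    remaining = set(docs)
    ranked: List[str] = []
    while remaining and len(ranked) < k:
        def score(c: str) -> float:
            popularity = len(docs[c]) / total
            novelty = len(docs[c] - covered) / total
            return trade_off * popularity + (1 - trade_off) * novelty
        # Sort first so that ties are broken alphabetically.
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        covered |= docs[best]
        remaining.remove(best)
    return ranked


if __name__ == "__main__":
    print(build_hierarchy(toy_docs))
    print(rank_candidates(toy_docs, k=2))
```

On the toy data, the narrower candidate "jaguar car price" is nested under "jaguar car", and the top two ranked candidates come from different branches of the hierarchy, which is the kind of popularity and diversity balance the abstract refers to. Setting `trade_off` closer to 1 favors popular candidates; closer to 0 favors candidates that cover documents not yet represented.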
Keywords
Search intention,Subtopic mining,Hierarchical structure