Concept-based feature generation and selection for information retrieval

AAAI (2008)

Abstract
Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based feature generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High-quality feature selection is necessary to maintain high precision, but here we do not have the labeled training data for evaluating features that is available in supervised learning. We present a new feature selection method inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art.
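
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_features`, the parameters `k` and `threshold`, and the scoring rule (fraction of top-ranked documents carrying a feature minus the fraction of bottom-ranked documents carrying it) are all assumptions made for illustration, since the abstract does not say how features are scored. The sketch assumes a bag-of-words ranking is already available and that each document has a set of generated concept features.

```python
from typing import Dict, List, Set


def select_features(
    ranked_docs: List[str],
    doc_features: Dict[str, Set[str]],
    k: int = 2,
    threshold: float = 0.5,
) -> List[str]:
    """Keep generated features that separate top-ranked from bottom-ranked documents.

    ranked_docs  -- document ids ordered by a bag-of-words retrieval run
    doc_features -- generated concept features per document (hypothetical input)
    k            -- size of each pseudo-labeled set (assumed parameter)
    threshold    -- minimum score a feature needs to be kept (assumed parameter)
    """
    if k <= 0 or 2 * k > len(ranked_docs):
        raise ValueError("k must be positive and at most half the ranking length")

    top = ranked_docs[:k]      # treated as pseudo-relevant
    bottom = ranked_docs[-k:]  # treated as pseudo-non-relevant

    # Candidate features are those seen in either pseudo-labeled set.
    candidates: Set[str] = set()
    for doc in top + bottom:
        candidates |= doc_features.get(doc, set())

    def coverage(feature: str, docs: List[str]) -> float:
        # Fraction of the given documents that carry the feature.
        return sum(feature in doc_features.get(d, set()) for d in docs) / len(docs)

    # Assumed scoring rule: coverage in the top set minus coverage in the bottom set.
    scores = {f: coverage(f, top) - coverage(f, bottom) for f in candidates}
    return sorted(f for f, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
    features = {
        "d1": {"Jaguar (animal)", "Big cat"},
        "d2": {"Jaguar (animal)", "Rainforest"},
        "d3": {"Big cat"},
        "d4": {"Car"},
        "d5": {"Jaguar Cars", "Car"},
        "d6": {"Jaguar Cars", "Big cat"},
    }
    print(select_features(ranking, features, k=2, threshold=0.5))
```

In this toy ranking, a feature such as "Big cat" appears equally often in both sets and is filtered out, while features concentrated in the top-ranked documents survive. A real system would likely use a more robust scoring function and larger sets.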
Keywords
bag-of-words method, high quality feature selection, extensive world knowledge, retrieval performance, background knowledge, TREC data, concept-based feature generation, Wikipedia-based feature generation, difficult retrieval task, new feature selection method, traditional information retrieval system