A Neural Corpus Indexer for Document Retrieval

Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Hao Sun, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, Xing Xie, Hao Allen Sun, Weiwei Deng, Qi Zhang, Mao Yang

NeurIPS 2022 (2022)

Abstract

Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, in which the index is hard to optimize directly for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network unifying the training and indexing stages can significantly improve the recall performance of traditional methods. To this end, we propose Neural Corpus Indexer (NCI), a sequence-to-sequence network that directly generates relevant document identifiers for a given query. To optimize the recall performance of NCI, we design a prefix-aware weight-adaptive decoder architecture and leverage tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrate the superiority of NCI on two commonly used academic benchmarks, achieving relative improvements of +17.6% in Recall@1 on the NQ320k dataset and +16.8% in R-Precision on the TriviaQA dataset over the best baseline method.
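
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Recall@k, R-Precision, and a relative improvement are commonly computed. The function names and the toy document identifiers are illustrative assumptions; this is not the evaluation code used for NCI, and the sample values are not results from the paper.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return len(set(ranked[:r]) & relevant) / r


def relative_gain(new_score: float, baseline_score: float) -> float:
    """Relative improvement of new_score over baseline_score (0.2 means +20%)."""
    return (new_score - baseline_score) / baseline_score


# Toy example with hypothetical document identifiers.
ranked_ids = ["d3", "d1", "d7", "d2"]   # model output, best first
relevant_ids = {"d1", "d2"}             # ground-truth relevant documents

recall_1 = recall_at_k(ranked_ids, relevant_ids, k=1)
recall_2 = recall_at_k(ranked_ids, relevant_ids, k=2)
r_prec = r_precision(ranked_ids, relevant_ids)
```

When each query has exactly one relevant document, Recall@1 reduces to the fraction of queries whose top-ranked identifier is the correct one. The improvements reported in the abstract are relative in the sense of `relative_gain`, not absolute differences in the metric.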

Keywords

document retrieval, sequence-to-sequence, model-based index
