Learning Query-aware Embedding Index for Improving E-commerce Dense Retrieval

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023 (2023)

Abstract
The embedding index has become an essential part of the dense retrieval (DR) system, enabling fast search over billions of items in online E-commerce applications. To accelerate the retrieval process in industrial scenarios, most previous studies utilize only item embeddings. However, running product quantization without query embeddings leads to inconsistency between queries and items. A straightforward solution is to include query embeddings in the product quantization process. But we found that the distance between positive query and item embedding pairs is too large, which means the query and item embeddings learned by the two-tower model are not fully aligned. This misalignment leads to performance decay when query embeddings are put directly into product quantization. In this paper, we propose a novel query-aware embedding index framework, which aligns the query and item embedding spaces to reduce the distance between positive pairs, so that query and item embeddings can be mixed to learn better cluster centers for product quantization. Specifically, we first propose a symmetric loss to train a better two-tower model and achieve space alignment. Subsequently, we propose a mixed quantization strategy that puts the query embeddings into the product quantization process, bridging the gap between queries and compressed item embeddings. Extensive experiments show that our framework significantly outperforms previous models on a real-world dataset, which demonstrates the superiority and effectiveness of the framework.
Keywords
Dense retrieval, product quantization, symmetric loss, mixed quantization
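
The mixed quantization idea described in the abstract can be illustrated with a minimal sketch: product quantization codebooks are learned with k-means over a pool that contains both item embeddings and a sample of query embeddings, and only the items are then encoded. This is not the authors' implementation. The function names, the `query_ratio` mixing rule, the random toy data, and the plain Lloyd's k-means are all assumptions made for illustration, and the sketch assumes the embedding dimension is divisible by the number of sub-spaces.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids

def train_mixed_pq(items, queries, n_sub=4, k=16, query_ratio=0.5, seed=0):
    """Learn one codebook per sub-space from items plus sampled queries.

    query_ratio is a hypothetical knob: the number of sampled queries is
    capped at query_ratio * number of items.
    """
    rng = np.random.default_rng(seed)
    n_q = min(len(queries), int(len(items) * query_ratio))
    sampled = queries[rng.choice(len(queries), size=n_q, replace=False)]
    mixed = np.concatenate([items, sampled], axis=0)
    sub_dim = items.shape[1] // n_sub
    return [
        kmeans(mixed[:, s * sub_dim:(s + 1) * sub_dim], k, seed=seed + s)
        for s in range(n_sub)
    ]

def encode(items, codebooks):
    """Map each item sub-vector to the index of its nearest centroid."""
    sub_dim = codebooks[0].shape[1]
    codes = np.empty((len(items), len(codebooks)), dtype=np.int64)
    for s, cb in enumerate(codebooks):
        sub = items[:, s * sub_dim:(s + 1) * sub_dim]
        codes[:, s] = ((sub[:, None, :] - cb[None, :, :]) ** 2).sum(-1).argmin(1)
    return codes

def decode(codes, codebooks):
    """Reconstruct approximate item embeddings from their codes."""
    return np.concatenate(
        [cb[codes[:, s]] for s, cb in enumerate(codebooks)], axis=1
    )

# Toy data standing in for two-tower outputs (random, for shape checking only).
rng = np.random.default_rng(0)
items = rng.normal(size=(500, 16)).astype(np.float32)
queries = rng.normal(size=(200, 16)).astype(np.float32)

codebooks = train_mixed_pq(items, queries)
codes = encode(items, codebooks)
approx = decode(codes, codebooks)
```

With real embeddings, mixing is only expected to help once the query and item spaces are aligned, which is the role the abstract assigns to the symmetric loss. On misaligned embeddings the query points would pull centroids away from the items they are meant to represent.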