JDsearch: A Personalized Product Search Dataset with Real Queries and Full Interactions

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023 (2023)

Abstract
Personalized product search has recently attracted great attention, and many models have been proposed. To evaluate these models, previous studies mainly use the simulated Amazon recommendation dataset, which contains automatically generated queries and excludes cold users and tail products. We argue that evaluating with such a dataset may yield unreliable results and conclusions and may deviate from real user satisfaction. To overcome these problems, we release a personalized product search dataset comprising real user queries and diverse user-product interaction types (clicking, adding to cart, following, and purchasing) collected from JD.com, a popular Chinese online shopping platform. Specifically, we sample about 170,000 active users on a specific date, then record all the products they interacted with and all the queries they issued over one year, without removing any tail users or products. This results in roughly 12,000,000 products, 9,400,000 real searches, and 26,000,000 user-product interactions. We study the characteristics of this dataset from various perspectives and evaluate representative personalization models to verify its feasibility. The dataset is publicly available on GitHub: https://github.com/rucliujn/JDsearch.
Keywords
Product Search, Personalized, Dataset