Query Exposure Prediction for Groups of Documents in Rankings
CoRR (2024)
Abstract
The main objective of an Information Retrieval system is to provide a user
with the most relevant documents to the user's query. To do this, modern IR
systems typically deploy a re-ranking pipeline in which a set of documents is
retrieved by a lightweight first-stage retrieval process and then re-ranked by
a more effective but expensive model. However, the success of a re-ranking
pipeline is heavily dependent on the performance of the first-stage retrieval,
since new documents are not usually identified during the re-ranking stage.
Moreover, this can impact the amount of exposure that a particular group of
documents, such as documents from a particular demographic group, can receive
in the final ranking. For example, the fair allocation of exposure becomes more
challenging or impossible if the first-stage retrieval returns too few
documents from certain groups, since the number of group documents in the
ranking affects the exposure more than the documents' positions. With this in
mind, it is beneficial to predict the amount of exposure that a group of
documents is likely to receive in the results of the first-stage retrieval
process, in order to ensure that a sufficient number of documents are
included from each of the groups. In this paper, we introduce the novel task of
query exposure prediction (QEP). Specifically, we propose the first approach
for predicting the distribution of exposure that groups of documents will
receive for a given query. Our new approach, called GEP, uses lexical
information from individual groups of documents to estimate the exposure the
groups will receive in a ranking. Our experiments on the TREC 2021 and 2022
Fair Ranking Track test collections show that our proposed GEP approach results
in exposure predictions that are up to 40% more accurate than the predictions
of adapted existing query performance prediction and resource allocation
approaches.
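
To make the notion of group exposure concrete, the following is a minimal sketch of how the share of exposure per group could be measured in a single ranking. It assumes a logarithmic position discount, 1 / log2(rank + 1), which is an illustrative browsing model and not necessarily the one used in the paper. The function name `group_exposure`, the document IDs, and the group labels are all hypothetical. This measures exposure in a given ranking; it is not the GEP predictor.

```python
import math
from collections import defaultdict


def group_exposure(ranking, doc_groups):
    """Return each group's share of the total exposure in a ranking.

    Assumption: exposure at rank r is 1 / log2(r + 1), with ranks
    starting at 1. This discount is chosen for illustration only.
    """
    totals = defaultdict(float)
    for rank, doc_id in enumerate(ranking, start=1):
        totals[doc_groups[doc_id]] += 1.0 / math.log2(rank + 1)
    norm = sum(totals.values())
    if norm == 0:
        return {}  # empty ranking: no exposure to distribute
    return {group: exposure / norm for group, exposure in totals.items()}


# Hypothetical toy data: three documents from group A, one from group B.
doc_groups = {"d1": "A", "d2": "A", "d3": "A", "d4": "B"}

# Group B's only document is ranked last.
shares = group_exposure(["d1", "d2", "d3", "d4"], doc_groups)

# Group B's only document is promoted to the top position.
promoted = group_exposure(["d4", "d1", "d2", "d3"], doc_groups)
```

Under this assumed discount, group B still receives less than half of the exposure even when its single document is ranked first, which is consistent with the abstract's point that the number of group documents returned by the first stage constrains the exposure a re-ranker can allocate.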