WWW3E8: 259,000 Relevance Labels for Studying the Effect of Document Presentation Order for Relevance Assessors

Research and Development in Information Retrieval（2021）

引用 3|浏览19

暂无评分

摘要

ABSTRACTIn IR evaluation based on depth-k pooling, there are several strategies to order the pooled documents for relevance assessors. Among them, the simplest approach is to completely randomise the order "so assessors cannot tell if a document was highly ranked by some system or how many systems (or which systems) retrieved the document." An approach that is in sharp contrast to the above is the prioritisation approach taken by NTCIRPOOL, a tool widely used at NTCIR. NTCIRPOOL sorts the pooled documents by "pseudorelevance," a statistic that reflects the popularity of each document within the depth-k pools. Although these two strategies have coexisted for over two decades, the IR research community has yet to reach a consensus as to what advantages each of these two strategies actually offer. To help researchers directly address this question using their favourite methods of analysis, we have released a large-scale data set called WWW3E8. It comprises eight independent sets of qrels for the 160 English topics of the NTCIR-15 WWW-3 task: four qrels files constructed using the randomisation approach, and another four constructed using the prioritisation approach of NTCIRPOOL. Each qrels file covers 32,375 topic-document pairs; hence, WWW3E8 contains a total of 259,000 relevance labels. Moreover, the data set contains the raw English subtask run files from the WWW-3 task, the randomised and prioritised pool files, and topic-by-run score matrices of the official measures used in the task. Hence, researchers interested in the above research question regarding document ordering can utilise WWW3E8 as a common ground to directly compare the two strategies.

查看译文

关键词

evaluation, pooling, relevance assessments, NTCIR, test collections, web search

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要