Improving the Accuracy of System Performance Estimation by Using Shards

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval(2019)

引用 19|浏览43
暂无评分
摘要
We improve the measurement accuracy of retrieval system performance by better modeling the noise present in test collection scores. Our technique draws its inspiration from two approaches: one, which exploits the variable measurement accuracy of topics; the other, which randomly splits document collections into shards. We describe and theoretically analyze an ANOVA model able to capture the effects of topics, systems, and document shards as well as their interactions. Using multiple TREC collections, we empirically confirm theoretical results in terms of improved estimation accuracy and robustness of found significant differences. The improvements compared to widely used test collection measurement techniques are substantial. We speculate that our technique works because we do not assume that the topics of a test collection measure performance equally.
更多
查看译文
关键词
anova, effectiveness model, multiple comparison
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要