Assessing the effect of inconsistent assessors on summarization evaluation

ACL (2012)

Abstract
We investigate the consistency of human assessors involved in summarization evaluation to understand its effect on system ranking and automatic evaluation techniques. Using Text Analysis Conference data, we measure annotator consistency based on human scoring of summaries for Responsiveness, Readability, and Pyramid scoring. We identify inconsistencies in the data and measure to what extent these inconsistencies affect the ranking of automatic summarization systems. Finally, we examine the stability of automatic metrics (ROUGE and CLASSY) with respect to the inconsistent assessments.
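One way to quantify how inconsistent assessments affect system ranking, in the spirit of the study described above, is to correlate the rankings induced by two disjoint assessor groups. The sketch below is illustrative only (it is not the authors' code, and the scores are hypothetical), using Kendall's tau over per-system mean Responsiveness scores.

from scipy.stats import kendalltau

# Hypothetical mean Responsiveness scores per system from two assessor splits.
scores_group_a = {"sys1": 3.2, "sys2": 2.8, "sys3": 3.5, "sys4": 2.1}
scores_group_b = {"sys1": 3.0, "sys2": 3.1, "sys3": 3.4, "sys4": 2.0}

# Correlate the two induced rankings; a low tau suggests assessor
# inconsistency is large enough to reorder systems.
systems = sorted(scores_group_a)
tau, p_value = kendalltau(
    [scores_group_a[s] for s in systems],
    [scores_group_b[s] for s in systems],
)
print(f"Kendall tau between assessor-split rankings: {tau:.3f} (p={p_value:.3f})")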
Keywords
summarization evaluation, human scoring, human assessor, automatic metrics, Text Analysis Conference data, automatic summarization system, inconsistent assessor, system ranking, Pyramid scoring, annotator consistency, automatic evaluation technique