FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering
arXiv (2024)
Abstract
Table Question Answering (TQA) aims to compose an answer to a question
based on tabular data. While prior research has shown that TQA models lack
robustness, the underlying causes and nature of this issue remain largely
unclear, posing a significant obstacle to the development of
robust TQA systems. In this paper, we formalize three major desiderata for a
fine-grained evaluation of robustness of TQA systems. They should (i) answer
questions regardless of alterations in table structure, (ii) base their
responses on the content of relevant cells rather than on biases, and (iii)
demonstrate robust numerical reasoning capabilities. To investigate these
aspects, we create and publish a novel TQA evaluation benchmark in English. Our
extensive experimental analysis reveals that none of the examined
state-of-the-art TQA systems consistently excels in these three aspects. Our
benchmark is a crucial instrument for monitoring the behavior of TQA systems
and paves the way for the development of robust TQA systems. We release our
benchmark publicly.
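To make desideratum (i) concrete, the following is a minimal sketch of how answer invariance under table-structure alterations could be probed. It is not the FREB-TQA construction procedure: the three perturbations (reversing data-row order, reversing column order, transposing), the function names, and the exact-match consistency score are all illustrative assumptions. A table is assumed to be a rectangular list of string rows whose first row is the header, and a "model" is any callable mapping a table and a question to an answer string.

```python
from typing import Callable, List

# Assumption: a table is a rectangular list of rows; row 0 is the header.
Table = List[List[str]]
Model = Callable[[Table, str], str]


def reverse_rows(table: Table) -> Table:
    """Reverse the order of the data rows while keeping the header first."""
    return [table[0]] + table[1:][::-1]


def reverse_columns(table: Table) -> Table:
    """Reverse the column order in every row (header included)."""
    return [row[::-1] for row in table]


def transpose(table: Table) -> Table:
    """Swap rows and columns; requires a rectangular table."""
    return [list(col) for col in zip(*table)]


def structure_consistency(model: Model, table: Table, question: str) -> float:
    """Fraction of perturbed tables on which the model's answer exactly
    matches its answer on the original table (hypothetical metric)."""
    reference = model(table, question)
    variants = [reverse_rows(table), reverse_columns(table), transpose(table)]
    matches = sum(model(v, question) == reference for v in variants)
    return matches / len(variants)


if __name__ == "__main__":
    table = [["Country", "Capital"], ["France", "Paris"], ["Japan", "Tokyo"]]
    # A toy model that always reads the cell at position (1, 1),
    # i.e. it relies on layout rather than on cell content.
    positional = lambda t, q: t[1][1]
    print(structure_consistency(positional, table, "What is the capital of France?"))
```

On this toy table the position-based model keeps its answer only under transposition, so it scores 1/3, whereas a system that satisfies desideratum (i) would score 1.0. Exact string match is used for simplicity; a real evaluation would likely need answer normalization.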