STREET: A MULTI-TASK STRUCTURED REASONING AND EXPLANATION BENCHMARK

Danilo Neves Ribeiro, Shen Wang, Xiaofei Ma, Henghui Zhu, Rui Dong, Deguang Kong, Juliette Burger, Anjelica Ramos, Zhiheng Huang, William Yang Wang, George Karypis, Bing Xiang, Dan Roth

ICLR 2023 (2023)

Abstract

We introduce STREET, a unified multi-task and multi-domain natural language reasoning and explanation benchmark. Unlike most existing question-answering (QA) datasets, we expect models not only to answer questions, but also to produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer. We perform extensive evaluation with popular language models such as few-shot prompted GPT-3 and fine-tuned T5. We find that these models still lag behind human performance when producing such structured reasoning steps. We believe this work will provide a way for the community to better train and test systems on multi-step reasoning and explanations in natural language.

Keywords

natural language understanding, question answering, structured explanations, soft reasoning, dataset
