Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons
arxiv(2024)
摘要
In this paper, we make a contribution that can be understood from two
perspectives: from an NLP perspective, we introduce a small challenge dataset
for NLI with large lexical overlap, which minimises the possibility of models
discerning entailment solely based on token distinctions, and show that GPT-4
and Llama 2 fail it with strong bias. We then create further challenging
sub-tasks in an effort to explain this failure. From a Computational
Linguistics perspective, we identify a group of constructions with three
classes of adjectives which cannot be distinguished by surface features. This
enables us to probe for LLM's understanding of these constructions in various
ways, and we find that they fail in a variety of ways to distinguish between
them, suggesting that they don't adequately represent their meaning or capture
the lexical properties of phrasal heads.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要