Measuring the Robustness of NLP Models to Domain Shifts

Nitay Calderon, Naveh Porat, Eyal Ben-David, Alexander Chapanin, Zorik Gekhman, Nadav Oved, Vitaly Shalumov, Roi Reichart

CoRR (2023)

Abstract
Existing research on Domain Robustness (DR) suffers from disparate setups, limited task variety, and scarce research on recent models and capabilities such as few-shot learning. Furthermore, we claim that the common practice of measuring DR may obscure the picture. Current research focuses on challenge sets and relies solely on the Source Drop (SD), which uses the source in-domain performance as the reference point for degradation. However, the Target Drop (TD), which measures degradation relative to the target in-domain performance, should be used as a complementary point of view. To understand the DR challenge in modern NLP models, we developed a benchmark comprising seven NLP tasks, including classification, QA, and generation. Our benchmark focuses on natural topical domain shifts and enables measuring both the SD and the TD. Our comprehensive study, involving over 14,000 domain shifts across 18 fine-tuned and few-shot models, shows that models of both types suffer performance drops upon domain shift. While fine-tuned models excel in-domain, few-shot LLMs often surpass them cross-domain, showing better robustness. In addition, we found that a large SD can often be explained by shifting to a harder domain rather than by a genuine DR challenge, making the TD the more reliable metric.
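A minimal sketch of the two drops, assuming P(A→B) denotes the performance of a model trained (or prompted) on domain A and evaluated on domain B; the notation is illustrative, and the paper may use a normalized variant of these quantities:

SD(S→T) = P(S→S) − P(S→T)    (degradation relative to source in-domain performance)
TD(S→T) = P(T→T) − P(S→T)    (degradation relative to target in-domain performance)

Under this reading, a large SD paired with a small TD suggests the target domain is simply harder (even an in-domain model scores lower there), whereas a large TD indicates a genuine robustness gap.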