On the Out-Of-Distribution Generalization of Multimodal Large Language Models
CoRR (2024)
Abstract
We investigate the generalization boundaries of current Multimodal Large
Language Models (MLLMs) via comprehensive evaluation under out-of-distribution
scenarios and domain-specific tasks. We evaluate their zero-shot generalization
across synthetic images, real-world distributional shifts, and specialized
datasets like medical and molecular imagery. Empirical results indicate that
MLLMs struggle with generalization beyond common training domains, limiting
their direct application without adaptation. To understand the cause of
unreliable performance, we analyze three hypotheses: semantic
misinterpretation, visual feature extraction insufficiency, and mapping
deficiency. Results identify mapping deficiency as the primary hurdle. To
address this problem, we show that in-context learning (ICL) can significantly
enhance MLLMs' generalization, opening new avenues for overcoming
generalization barriers. We further explore the robustness of ICL under
distribution shifts and show its vulnerability to domain shifts, label shifts,
and spurious correlation shifts between in-context examples and test data.
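To make the in-context learning setup concrete, the sketch below builds a few-shot multimodal prompt: labeled (image, label) demonstrations are interleaved before an unlabeled query image. This is an illustrative assumption, not the paper's actual pipeline; the message schema, helper name `build_icl_prompt`, and file names are hypothetical.

```python
# Illustrative sketch (not the paper's code): constructing a few-shot
# in-context prompt for a multimodal model. The chat-message schema and
# all names here are assumptions for illustration.

def build_icl_prompt(examples, query_image, instruction):
    """Interleave (image, label) demonstrations before the query image."""
    messages = [{"role": "system", "content": instruction}]
    for image, label in examples:
        # Each demonstration pairs an image question with its ground-truth answer.
        messages.append({"role": "user", "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What is the category of this image?"},
        ]})
        messages.append({"role": "assistant", "content": label})
    # The query image comes last with no answer, so the model must
    # generalize from the demonstrations above.
    messages.append({"role": "user", "content": [
        {"type": "image", "image": query_image},
        {"type": "text", "text": "What is the category of this image?"},
    ]})
    return messages

prompt = build_icl_prompt(
    examples=[("img_benign.png", "benign"), ("img_malignant.png", "malignant")],
    query_image="img_query.png",
    instruction="Classify each medical image.",
)
```

The paper's robustness findings suggest such prompts are sensitive to mismatches (domain, label, or spurious-correlation shifts) between the demonstration images and the query.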