How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding
arxiv(2024)
摘要
Weakly-supervised Phrase Grounding (WPG) is an emerging task of inferring the
fine-grained phrase-region matching, while merely leveraging the coarse-grained
sentence-image pairs for training. However, existing studies on WPG largely
ignore the implicit phrase-region matching relations, which are crucial for
evaluating the capability of models in understanding the deep multimodal
semantics. To this end, this paper proposes an Implicit-Enhanced Causal
Inference (IECI) approach to address the challenges of modeling the implicit
relations and highlighting them beyond the explicit. Specifically, this
approach leverages both the intervention and counterfactual techniques to
tackle the above two challenges respectively. Furthermore, a high-quality
implicit-enhanced dataset is annotated to evaluate IECI and detailed
evaluations show the great advantages of IECI over the state-of-the-art
baselines. Particularly, we observe an interesting finding that IECI
outperforms the advanced multimodal LLMs by a large margin on this
implicit-enhanced dataset, which may facilitate more research to evaluate the
multimodal LLMs in this direction.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要