Debiased Visual Question Answering via the perspective of question types

Tianyu Huai, Shuwen Yang, Junhang Zhang, Jiabao Zhao, Liang He

Pattern Recognition Letters (2024)

Abstract
Visual Question Answering (VQA) aims to answer questions about a given image. However, current VQA models tend to rely solely on textual information from the questions and ignore the visual information in the images, a behavior caused by bias introduced during the training phase. Previous studies have shown that bias in VQA mainly stems from the text modality, and our analysis suggests that question type is a crucial factor in bias formation. To address this bias, we propose a self-supervised method comprising the Against Biased Samples (ABS) module, which performs targeted debiasing by selecting samples that are prone to bias, and the Shuffle Question Types (SQT) module, which constructs negative samples by randomly replacing the question types of the samples selected by ABS, thereby interrupting the shortcut from question type to answer. Our approach mitigates the question-to-answer bias without using external annotations, overcoming the language prior problem. Additionally, we design a new objective function for negative samples. Experimental results indicate that our method outperforms both self-supervised and supervised state-of-the-art approaches, achieving 70.36% accuracy on the VQA-CP v2 dataset.
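
The abstract describes the SQT module as building negative samples by swapping a question's type with a randomly chosen different one, so the type prefix no longer predicts the answer. The sketch below is a minimal, hypothetical illustration of that idea; the question-type list, function names, and prefix-matching heuristic are assumptions for illustration, not the authors' implementation.

import random

# Hypothetical sketch of the Shuffle Question Types (SQT) idea: replace a
# question's type prefix with a different, randomly drawn type to form a
# negative sample that breaks the question-type-to-answer shortcut.
# All names and the type list below are illustrative assumptions.

QUESTION_TYPES = [
    "what color is the",
    "how many",
    "is the",
    "what is the",
    "does the",
]


def extract_question_type(question: str) -> str:
    """Return the longest known question-type prefix, or '' if none matches."""
    matches = [t for t in QUESTION_TYPES if question.lower().startswith(t)]
    return max(matches, key=len) if matches else ""


def shuffle_question_type(question: str, rng: random.Random) -> str:
    """Swap the question-type prefix for a different, randomly chosen type."""
    q_type = extract_question_type(question)
    if not q_type:
        return question  # no recognizable type; leave the question unchanged
    candidates = [t for t in QUESTION_TYPES if t != q_type]
    new_type = rng.choice(candidates)
    return new_type + question[len(q_type):]


if __name__ == "__main__":
    rng = random.Random(0)
    biased_sample = "what color is the umbrella?"
    negative_sample = shuffle_question_type(biased_sample, rng)
    print(negative_sample)  # e.g. a type-mismatched question used as a negative sample

In the paper's pipeline, such type-shuffled questions would be paired with the original images and penalized by the new objective function for negative samples, discouraging the model from answering based on question type alone.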
Keywords
Visual Question Answering, De-biasing, Self-supervised