BOK-VQA: Bilingual Outside Knowledge-Based Visual Question Answering Via Graph Representation Pretraining
Proceedings of the AAAI Conference on Artificial Intelligence (2024)
Abstract
The current research direction in generative models, such as the recently developed GPT-4, aims to find relevant knowledge information for multimodal and multilingual inputs to provide answers. Under these research circumstances, the demand for multilingual evaluation of visual question answering (VQA) tasks, a representative task of multimodal systems, has increased. Accordingly, we propose a bilingual outside-knowledge VQA (BOK-VQA) dataset in this study that can be extended to multilingualism. The proposed data include 17K images, 17K question-answer pairs for both Korean and English, and 280K instances of knowledge information related to question-answer content. We also present a framework that can effectively inject knowledge information into a VQA system by pretraining the knowledge information of BOK-VQA data in the form of graph embeddings. Finally, through in-depth analysis, we demonstrate the actual effect of the knowledge information contained in the constructed training data on VQA.
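The abstract states only that the outside-knowledge triples are pretrained as graph embeddings before being injected into the VQA system; it does not specify the embedding objective. The sketch below is a minimal illustration, assuming a TransE-style margin loss over (head, relation, tail) triples; the class name, dimensions, and margin value are assumptions for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class KnowledgeGraphEmbedding(nn.Module):
    """Pretrains entity/relation embeddings over knowledge triples (h, r, t)
    with a TransE-style margin loss (assumed objective); the resulting
    embeddings could then be fused with image/question features in a VQA model."""

    def __init__(self, num_entities: int, num_relations: int,
                 dim: int = 200, margin: float = 1.0):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        nn.init.xavier_uniform_(self.ent.weight)
        nn.init.xavier_uniform_(self.rel.weight)
        self.margin = margin

    def score(self, h, r, t):
        # TransE: a valid triple should satisfy h + r ≈ t (small L2 distance).
        return (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=2, dim=-1)

    def forward(self, pos, neg):
        # pos / neg: (batch, 3) index tensors of [head, relation, tail];
        # neg holds corrupted triples (e.g., random tail replacement).
        pos_d = self.score(pos[:, 0], pos[:, 1], pos[:, 2])
        neg_d = self.score(neg[:, 0], neg[:, 1], neg[:, 2])
        # Margin ranking loss: push corrupted triples farther than true ones.
        return torch.relu(self.margin + pos_d - neg_d).mean()
```

After pretraining on the 280K knowledge instances, the looked-up triple embeddings would typically be concatenated with (or attended over by) the multimodal features of the image-question pair; the exact fusion used in the paper is not described in this abstract.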