LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation
CoRR(2024)
摘要
The rise of multimodal misinformation on social platforms poses significant
challenges for individuals and societies. Its increased credibility and broader
impact compared to textual misinformation make detection complex, requiring
robust reasoning across diverse media types and profound knowledge for accurate
verification. The emergence of Large Vision Language Model (LVLM) offers a
potential solution to this problem. Leveraging their proficiency in processing
visual and textual information, LVLM demonstrates promising capabilities in
recognizing complex information and exhibiting strong reasoning skills. In this
paper, we first investigate the potential of LVLM on multimodal misinformation
detection. We find that even though LVLM has a superior performance compared to
LLMs, its profound reasoning may present limited power with a lack of evidence.
Based on these observations, we propose LEMMA: LVLM-Enhanced Multimodal
Misinformation Detection with External Knowledge Augmentation. LEMMA leverages
LVLM intuition and reasoning capabilities while augmenting them with external
knowledge to enhance the accuracy of misinformation detection. Our method
improves the accuracy over the top baseline LVLM by 7
Fakeddit datasets respectively.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要