Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models
CoRR (2024)
Abstract
Object hallucination has been an Achilles' heel which hinders the broader
applications of large vision-language models (LVLMs). Object hallucination
refers to the phenomenon in which LVLMs claim non-existent objects in the
image. To mitigate object hallucinations, instruction tuning and external
model-based detection methods have been proposed, which either require
large-scale computational resources or depend on the detection results of
external models. However, utilizing the LVLM itself to alleviate object
hallucinations remains under-explored. In this work, we adopt the
intuition that the LVLM tends to respond logically consistently for existent
objects but inconsistently for hallucinated objects. Therefore, we propose a
Logical Closed Loop-based framework for Object Hallucination Detection and
Mitigation, namely LogicCheckGPT. Specifically, we devise logical consistency
probing to raise questions with logical correlations, inquiring about
attributes from objects and vice versa. Whether their responses can form a
logical closed loop serves as an indicator of object hallucination. As a
plug-and-play method, it can be seamlessly applied to all existing LVLMs.
Comprehensive experiments conducted on three benchmarks across four LVLMs
demonstrate significant improvements brought by our method, indicating its
effectiveness and generality.
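
The following is a minimal sketch, not the authors' implementation, of the logical closed loop intuition described in the abstract: ask the LVLM which attribute an object has, then ask which object has that attribute, and treat a consistent round trip as evidence the object exists. The `ask` callable and the probing questions are illustrative assumptions standing in for any LVLM query interface.

```python
from typing import Callable


def closes_logical_loop(ask: Callable[[str, str], str], image: str, obj: str) -> bool:
    """Return True if the object-attribute question chain is self-consistent.

    `ask(image, question)` is a hypothetical LVLM interface returning an
    answer string; the probing questions below are illustrative only.
    """
    # Forward probe: object -> attribute.
    attribute = ask(image, f"What color is the {obj} in the image?").strip().lower()
    # Backward probe: attribute -> object.
    answer = ask(image, f"Which object in the image is {attribute}?").strip().lower()
    # The loop closes only if the backward answer names the original object;
    # an open loop suggests the object may be hallucinated.
    return obj.lower() in answer


if __name__ == "__main__":
    # Mock LVLM: answers consistently about a "dog", inconsistently otherwise.
    def mock_ask(image: str, question: str) -> str:
        if "dog" in question:
            return "brown"
        if "brown" in question:
            return "the dog"
        return "nothing in particular"

    print(closes_logical_loop(mock_ask, "photo.jpg", "dog"))  # True  -> likely existent
    print(closes_logical_loop(mock_ask, "photo.jpg", "cat"))  # False -> possible hallucination
```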