Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
CoRR (2024)
Abstract
Although Large Vision-Language Models (LVLMs) have demonstrated exceptional
abilities in understanding multimodal data, they invariably suffer from
hallucinations, leading to a disconnect between the generated text and the
corresponding images. Almost all current visual contrastive decoding methods
attempt to mitigate these hallucinations by introducing visual uncertainty
information that appropriately widens the contrastive logits gap between
hallucinatory and targeted ones. However, due to the uncontrollable nature of the
global visual uncertainty, they struggle to precisely induce the hallucinatory
tokens, which severely limits their effectiveness in mitigating hallucinations
and may even lead to the generation of undesired hallucinations. To tackle this
issue, we conduct a theoretical analysis of the conditions under which
contrastive decoding is effective. Building on this insight, we introduce a novel optimization
strategy named Hallucination-Induced Optimization (HIO). This strategy seeks to
amplify the contrast between hallucinatory and targeted tokens by relying on a
fine-tuned theoretical preference model (i.e., Contrary Bradley-Terry Model),
thereby facilitating efficient contrastive decoding to alleviate hallucinations in
LVLMs. Extensive experiments demonstrate that our HIO strategy can
effectively reduce hallucinations in LVLMs, outperforming state-of-the-art
methods across various benchmarks.
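
To make the mechanism concrete, here is a minimal sketch of the general visual contrastive decoding recipe the abstract refers to: next-token logits from a hallucination-prone branch are contrasted against logits from the clean branch, widening the gap between targeted and hallucinatory tokens. The function name, the amplification factor alpha, and the toy tensors are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def contrastive_decode(logits_clean: torch.Tensor,
                       logits_distorted: torch.Tensor,
                       alpha: float = 1.0) -> torch.Tensor:
    """Generic contrastive decoding step (hypothetical names).

    logits_clean:     next-token logits conditioned on the original image
    logits_distorted: next-token logits from a hallucination-inducing
                      branch (e.g., a perturbed image or amplified model)
    alpha:            contrast strength; alpha = 0 recovers plain decoding
    """
    contrasted = (1 + alpha) * logits_clean - alpha * logits_distorted
    return torch.softmax(contrasted, dim=-1)

# Toy example: token 1 is a hallucination that the distorted branch
# boosts, so the contrast pushes its final probability down.
clean = torch.tensor([2.0, 1.5, 0.3])
distorted = torch.tensor([1.8, 2.5, 0.2])
print(contrastive_decode(clean, distorted, alpha=1.0))
```

The abstract's core observation is that this subtraction only helps when the distorted branch reliably boosts the hallucinatory tokens; HIO's contribution is to make that boost reliable rather than leaving it to uncontrolled global visual uncertainty.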
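The Contrary Bradley-Terry Model mentioned above builds on the standard Bradley-Terry preference model, which scores a pairwise preference as P(a ≻ b) = σ(s_a − s_b). The sketch below shows the standard objective; reversing the roles so that hallucinatory tokens are treated as preferred, so the fine-tuned amplifier learns to boost them, is one reading of the "contrary" usage and an assumption, since the abstract does not spell out the loss.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(score_preferred: torch.Tensor,
                       score_rejected: torch.Tensor) -> torch.Tensor:
    """Standard Bradley-Terry objective: maximize the log-probability
    P(preferred > rejected) = sigmoid(score_preferred - score_rejected).
    """
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Assumed "contrary" usage: feed hallucinatory-token scores as the
# preferred side, so minimizing the loss raises them in the branch
# that contrastive decoding later subtracts away.
score_halluc = torch.tensor([0.2, -0.1])
score_target = torch.tensor([1.0, 0.8])
print(bradley_terry_loss(score_halluc, score_target))
```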