Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
CoRR (2024)
Abstract
Despite showing increasingly human-like abilities, large language models
(LLMs) often struggle with factual inaccuracies, i.e., "hallucinations", even
when they hold relevant knowledge. To address these hallucinations, current
approaches typically necessitate high-quality human factuality annotations. In
this work, we explore Self-Alignment for Factuality, where we leverage the
self-evaluation capability of an LLM to provide training signals that steer the
model towards factuality. Specifically, we incorporate Self-Eval, a
self-evaluation component, to prompt an LLM to validate the factuality of its
own generated responses solely based on its internal knowledge. Additionally,
we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM's
self-evaluation ability by improving the model's confidence estimation and
calibration. We then utilize these self-annotated responses to fine-tune the
model via the Direct Preference Optimization (DPO) algorithm. We show that the
proposed self-alignment approach substantially enhances the factual accuracy of
Llama-family models across three key knowledge-intensive tasks on TruthfulQA
and BioGEN.
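
As a rough illustration of the pipeline the abstract outlines (a minimal sketch, not the authors' released code), the snippet below constructs DPO-style preference pairs from a model's own factuality self-evaluation: sample candidate responses, have the same model score each candidate's factuality from its internal knowledge, and keep the most- and least-confident answers as chosen/rejected pairs for preference fine-tuning. The names `ask_model`, `self_eval_prob`, and the toy data are hypothetical stand-ins for a real LLM interface.

```python
# Hedged sketch of self-evaluation-based preference data construction.
# `ask_model` and `self_eval_prob` are assumed callables, not a real API.
from dataclasses import dataclass
from typing import Callable, List
import random


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response the model itself judges more factual
    rejected: str  # response the model itself judges less factual


def build_self_eval_pairs(
    prompts: List[str],
    ask_model: Callable[[str], str],              # samples one response for a prompt
    self_eval_prob: Callable[[str, str], float],  # model's confidence that its own
                                                  # response is factually correct
    num_samples: int = 4,
) -> List[PreferencePair]:
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        # 1) Sample several candidate responses from the model.
        candidates = [ask_model(prompt) for _ in range(num_samples)]
        # 2) Self-Eval: the same model scores each candidate's factuality
        #    using only its internal knowledge.
        scored = sorted(candidates, key=lambda r: self_eval_prob(prompt, r))
        # 3) Highest-confidence answer becomes "chosen", lowest "rejected";
        #    these self-annotated pairs would then feed a standard DPO trainer.
        if scored[0] != scored[-1]:
            pairs.append(PreferencePair(prompt, chosen=scored[-1], rejected=scored[0]))
    return pairs


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end without a real LLM.
    toy_answers = {"Who wrote Hamlet?": ["Shakespeare", "Christopher Marlowe"]}
    ask = lambda p: random.choice(toy_answers[p])
    score = lambda p, r: 0.9 if r == "Shakespeare" else 0.2
    for pair in build_self_eval_pairs(list(toy_answers), ask, score):
        print(pair)
```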