Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning
CoRR (2024)
Abstract
Large language models (LLMs) have been shown to perform better when asked to
reason step-by-step before answering a question. However, it is unclear to what
degree the model's final answer is faithful to the stated reasoning steps. In
this paper, we perform a causal mediation analysis on twelve LLMs to examine
how intermediate reasoning steps generated by the LLM influence the final
outcome and find that LLMs do not reliably use their intermediate reasoning
steps when generating an answer. To address this issue, we introduce FRODO, a
framework to tailor small-sized LMs to generate correct reasoning steps and
robustly reason over these steps. FRODO consists of an inference module that
learns to generate correct reasoning steps using an implicit causal reward
function and a reasoning module that learns to faithfully reason over these
intermediate inferences using a counterfactual and causal preference objective.
Our experiments show that FRODO significantly outperforms four competitive
baselines. Furthermore, FRODO improves the robustness and generalization
ability of the reasoning LM, yielding higher performance on out-of-distribution
test sets. Finally, we find that FRODO's rationales are more faithful to its
final answer predictions than standard supervised fine-tuning.
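The faithfulness question above can be illustrated with a toy intervention probe: corrupt an intermediate reasoning step and check whether the final answer changes. This is a minimal sketch, not the paper's actual causal mediation analysis; the `answer` function below is a hypothetical stand-in for querying an LLM.

```python
# Toy sketch of an intervention-based faithfulness probe on chain-of-thought.
# In the paper's setting, `answer` would be an LLM conditioned on the question
# and a (possibly perturbed) rationale; here it is a trivial stand-in.

def answer(question: str, reasoning: str) -> str:
    """Stand-in 'model': answers 'yes' iff the rationale mentions 'even'."""
    return "yes" if "even" in reasoning else "no"

def relies_on_reasoning(question: str, reasoning: str, corrupted: str) -> bool:
    """Return True if corrupting the rationale flips the final answer.

    If the answer is unchanged under a corrupted rationale, the model is
    not faithfully using its stated reasoning steps.
    """
    return answer(question, reasoning) != answer(question, corrupted)

q = "Is 4 + 4 even?"
print(relies_on_reasoning(q,
                          "4 + 4 = 8, which is even.",
                          "4 + 4 = 9, which is odd."))
```

A real analysis would run such interventions at scale and measure how often the final answer tracks the perturbed steps, which is the effect the causal mediation analysis in the abstract quantifies.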