Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin

arXiv (2024)

Abstract
While chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning, it can systematically misrepresent the factors influencing models' behavior; for example, rationalizing answers in line with a user's opinion without mentioning this bias. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features. We construct a suite testing nine forms of biased reasoning on seven question-answering tasks, and find that applying BCT to GPT-3.5-Turbo with one bias reduces the rate of biased reasoning by 86% on held-out tasks. Moreover, this model generalizes to other forms of bias, reducing biased reasoning on held-out biases by an average of 37%. As BCT generalizes to held-out biases and does not require gold labels, this method may hold promise for reducing biased reasoning from as-of-yet unknown biases and on tasks where supervision for ground-truth reasoning is unavailable.
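
The abstract describes BCT only at a high level: fine-tune on pairs in which a biased prompt is matched with the reasoning the model produces on the unbiased version of the same prompt. The sketch below illustrates how such training pairs might be constructed; the user-opinion bias template, the add_bias and make_bct_example helpers, and the sample question are illustrative assumptions, not the paper's exact prompts or data format.

```python
# Sketch of BCT data construction, per the abstract: pair a *biased* prompt
# with the chain-of-thought the model gave on the *unbiased* prompt, so
# fine-tuning teaches the model to reason consistently despite the bias.
# Helper names and the bias template are illustrative assumptions.

def add_bias(question: str, suggested_answer: str) -> str:
    """Prepend a user-opinion biasing feature (one of several bias types)."""
    return (
        f"I think the answer is {suggested_answer}, "
        f"but I'm curious what you think.\n\n{question}"
    )

def make_bct_example(question: str, unbiased_cot: str, suggested_answer: str) -> dict:
    """Build one fine-tuning pair: biased prompt in, unbiased reasoning out.

    No gold labels are needed; the target is the model's own CoT sampled
    on the unbiased prompt, which is what makes the scheme unsupervised.
    """
    return {
        "messages": [
            {"role": "user", "content": add_bias(question, suggested_answer)},
            {"role": "assistant", "content": unbiased_cot},
        ]
    }

# In practice, unbiased_cot would be sampled from the model on `question` alone.
question = "Q: Which is larger, 0.8 or 0.11? Think step by step."
unbiased_cot = "0.8 equals 0.80, and 0.80 > 0.11, so 0.8 is larger. Answer: 0.8"
print(make_bct_example(question, unbiased_cot, suggested_answer="0.11"))
```

Because the assistant target ignores the injected opinion, gradient updates on many such pairs push the model toward the same reasoning with or without the biasing feature, which is the consistency property the paper measures.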