Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
CoRR (2024)
Abstract
Large Language Models (LLMs) are increasingly applied in various real-world
scenarios due to their excellent generalization capabilities and robust
generative abilities. However, they exhibit position bias, also known as "lost
in the middle", a phenomenon that is especially pronounced in long-context
scenarios: placing the key information at different positions in a prompt can
significantly affect accuracy. This paper first explores the micro-level
manifestations of position bias, concluding that attention weights are a
micro-level expression of position bias. It further identifies that, in
addition to position embeddings, the causal attention mask also contributes to
position bias by creating position-specific hidden states. Based on these
insights, we propose a method to mitigate position bias by scaling these
positional hidden states. Experiments on the NaturalQuestions Multi-document
QA, KV retrieval, LongBench, and timeline reorder tasks, using various models
including RoPE models, context window-extended models, and ALiBi models,
demonstrate the effectiveness and generalizability of our approach. Our
method can improve performance by up to 15.2% by modifying just one dimension
of hidden states. Our code is available at https://aka.ms/PositionalHidden.
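
To make the intervention concrete, below is a minimal sketch of scaling a single hidden-state dimension with a PyTorch forward hook. This is not the authors' released implementation (that is at the URL above): the model name, layer index `LAYER_IDX`, dimension index `POS_DIM`, and factor `SCALE` are all hypothetical placeholders, since the paper identifies the model-specific dimension that carries positional information before scaling it.

```python
# Sketch: dampen one "positional" dimension of a decoder layer's hidden
# states via a forward hook. LAYER_IDX, POS_DIM, and SCALE are hypothetical
# placeholders, not values from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # any Llama-style decoder
LAYER_IDX = 4    # hypothetical: decoder layer whose output is modified
POS_DIM = 123    # hypothetical: index of the positional hidden dimension
SCALE = 0.5      # hypothetical: factor < 1 dampens that dimension

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def scale_positional_dim(module, inputs, output):
    # Llama-style decoder layers return a tuple; hidden states come first.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[..., POS_DIM] *= SCALE  # scale one dimension for every token
    return output

# Attach the hook to a single decoder layer; handle.remove() undoes it.
handle = model.model.layers[LAYER_IDX].register_forward_hook(scale_positional_dim)

prompt = "Document 1: ...\nDocument 2: ...\nQuestion: ..."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
handle.remove()
```

Because the hook modifies the tensor in place on every forward pass, the scaling applies throughout generation and can be removed cleanly afterwards, which makes it easy to compare accuracy with and without the intervention.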