Parallel Structures in Pre-training Data Yield In-Context Learning
CoRR(2024)
摘要
Pre-trained language models (LMs) are capable of in-context learning (ICL):
they can adapt to a task with only a few examples given in the prompt without
any parameter update. However, it is unclear where this capability comes from
as there is a stark distribution shift between pre-training text and ICL
prompts. In this work, we study what patterns of the pre-training data
contribute to ICL. We find that LMs' ICL ability depends on parallel
structures in the pre-training data – pairs of phrases following similar
templates in the same context window. Specifically, we detect parallel
structures by checking whether training on one phrase improves prediction of
the other, and conduct ablation experiments to study their effect on ICL. We
show that removing parallel structures in the pre-training data reduces LMs'
ICL accuracy by 51
excluding common patterns such as n-gram repetitions and long-range dependency,
showing the diversity and generality of parallel structures. A closer look at
the detected parallel structures indicates that they cover diverse linguistic
tasks and span long distances in the data.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要