Dual Operating Modes of In-Context Learning
CoRR (2024)
Abstract
In-context learning (ICL) exhibits dual operating modes: task learning, i.e.,
acquiring a new skill from in-context samples, and task retrieval, i.e.,
locating and activating a relevant pretrained skill. Recent theoretical work
investigates various mathematical models to analyze ICL, but existing models
explain only one operating mode at a time. We introduce a probabilistic model
that explains both operating modes of ICL simultaneously.
Focusing on in-context learning of linear functions, we extend existing models
for pretraining data by introducing multiple task groups and task-dependent
input distributions. We then analyze the behavior of the optimally pretrained
model under the squared loss, i.e., the MMSE estimator of the label given
in-context examples. Regarding the pretraining task distribution as the prior and
the in-context examples as the observation, we derive a closed-form expression for
the task posterior distribution. With this closed-form expression, we obtain a
quantitative understanding of the two operating modes of ICL. Furthermore, we
shed light on an unexplained phenomenon observed in practice: under certain
settings, the ICL risk initially increases and then decreases with more
in-context examples. Our model offers a plausible explanation for this "early
ascent" phenomenon: a limited number of in-context samples may lead to the
retrieval of an incorrect skill, thereby increasing the risk, which will
eventually diminish as task learning takes effect with more in-context samples.
We also theoretically analyze ICL with biased labels, e.g., zero-shot ICL,
where in-context examples are assigned random labels. Lastly, we validate our
findings and predictions via experiments involving Transformers and large
language models.
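To make the two modes concrete, the following is a minimal sketch, not the paper's exact formulation, of the MMSE predictor for in-context linear regression when the prior over task vectors is a mixture of Gaussians (one component per task group). The group posterior corresponds to task retrieval and the within-group posterior mean to task learning; the function names, the additive Gaussian noise model, and all parameter choices here are illustrative assumptions.

```python
# Sketch of a posterior-weighted MMSE predictor for in-context linear regression
# under a Gaussian-mixture prior over task vectors w (assumed setup, not the
# paper's exact model).
import numpy as np

def mmse_predict(X, y, x_query, means, covs, weights, noise_var=0.1):
    """MMSE prediction of the query label from in-context pairs (X, y).

    X:        (n, d) in-context inputs
    y:        (n,)   in-context labels, assumed y = w^T x + Gaussian noise
    x_query:  (d,)   query input
    means:    list of (d,) prior means, one per task group
    covs:     list of (d, d) prior covariances, one per task group
    weights:  prior mixing weights over task groups
    """
    log_post, cond_preds = [], []
    for mu, Sigma, pi in zip(means, covs, weights):
        # Marginal likelihood of the labels under this group:
        # y ~ N(X mu, X Sigma X^T + noise_var * I)
        S = X @ Sigma @ X.T + noise_var * np.eye(len(y))
        diff = y - X @ mu
        _, logdet = np.linalg.slogdet(S)
        log_post.append(np.log(pi) - 0.5 * (logdet + diff @ np.linalg.solve(S, diff)))
        # Posterior mean of w within this group (standard Gaussian conditioning)
        w_mean = mu + Sigma @ X.T @ np.linalg.solve(S, diff)
        cond_preds.append(w_mean @ x_query)
    # Normalized group posterior ("task retrieval") mixes the per-group
    # predictions ("task learning" within the retrieved group).
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(post @ np.array(cond_preds)), post
```

With few in-context examples the group posterior can concentrate on the wrong component (incorrect retrieval), while more examples sharpen both the group posterior and the within-group estimate, which is the mechanism the abstract invokes to explain the early-ascent behavior of the ICL risk.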