In-Context Learning with Long-Context Models: An In-Depth Exploration
arXiv (2024)
Abstract
As model context lengths continue to increase, the number of demonstrations
that can be provided in-context approaches the size of entire training
datasets. We study the behavior of in-context learning (ICL) at this extreme
scale on multiple datasets and models. We show that, for many datasets with
large label spaces, performance continues to increase with hundreds or
thousands of demonstrations. We contrast this with example retrieval and
finetuning: example retrieval shows excellent performance at low context
lengths but has diminished gains with more demonstrations; finetuning is more
data-hungry than ICL but can sometimes exceed long-context ICL performance with
additional data. We use this ICL setting as a testbed to study several
properties of both in-context learning and long-context models. We show that
long-context ICL is less sensitive to random input shuffling than short-context
ICL, that grouping of same-label examples can negatively impact performance,
and that the performance boosts we see do not arise from cumulative gain from
encoding many examples together. We conclude that although long-context ICL can
be surprisingly effective, most of this gain comes from attending back to
similar examples rather than task learning.
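The abstract contrasts two prompting regimes: naive many-shot ICL, where a long prefix of demonstrations is concatenated before the query, and example retrieval, where only the demonstrations most similar to the query are selected. The sketch below is an illustrative construction of both prompts, not the paper's code; the `(input, label)` dataset layout, the prompt template, and the precomputed embedding vectors are assumptions made here for demonstration.

```python
# Illustrative sketch (not the paper's implementation) of the two prompting
# regimes described in the abstract: many-shot ICL vs. retrieval-based ICL.
import numpy as np

def build_icl_prompt(demos, query, k):
    """Many-shot ICL: concatenate the first k (input, label) demonstrations."""
    shots = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in demos[:k])
    return f"{shots}\nInput: {query}\nLabel:"

def build_retrieval_prompt(demos, demo_vecs, query_vec, query, k):
    """Retrieval ICL: pick the k demonstrations most similar to the query.

    demo_vecs: (n, d) array of demonstration embeddings (assumed precomputed).
    query_vec: (d,) embedding of the query.
    """
    sims = demo_vecs @ query_vec / (
        np.linalg.norm(demo_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    top = np.argsort(-sims)[:k]  # indices of the k most similar demonstrations
    shots = "\n".join(f"Input: {demos[i][0]}\nLabel: {demos[i][1]}" for i in top)
    return f"{shots}\nInput: {query}\nLabel:"
```

In the long-context setting studied here, `k` for the naive prompt can reach hundreds or thousands of demonstrations, whereas the retrieval variant keeps the prompt short by selecting only the nearest examples.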