In-Context Learning State Vector with Inner and Momentum Optimization
arXiv (2024)
Abstract
Large Language Models (LLMs) have exhibited an impressive ability to perform
In-Context Learning (ICL) from only a few examples. Recent works have indicated
that the functions learned by ICL can be represented through compressed vectors
derived from the transformer. However, the working mechanisms and optimization
of these vectors are yet to be thoroughly explored. In this paper, we address
this gap by presenting a comprehensive analysis of these compressed vectors,
drawing parallels to the parameters trained with gradient descent, and
introduce the concept of the state vector. Inspired by work on model soup and
momentum-based gradient descent, we propose inner and momentum optimization
methods that are applied to refine the state vector progressively as test-time
adaptation. Moreover, we simulate state vector aggregation in the multiple
example setting, where demonstrations comprising numerous examples are usually
too lengthy for regular ICL, and further propose a divide-and-conquer
aggregation method to address this challenge. We conduct extensive experiments
using Llama-2 and GPT-J in both zero-shot and few-shot settings. The
experimental results show that our optimization method effectively enhances the
state vector and achieves state-of-the-art performance on diverse tasks.
Code is available at https://github.com/HITsz-TMG/ICL-State-Vector
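
The abstract names two refinement ideas without giving formulas: an averaging step inspired by model soup, and a momentum step inspired by momentum-based gradient descent. The sketch below is only one plausible reading, not the authors' implementation. It assumes that state vectors are plain lists of floats, that "inner optimization" is a uniform average over several state vectors, and that "momentum optimization" treats the difference between successive state vectors as a pseudo-gradient. The function names and the `beta` parameter are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def average_vectors(vectors: Sequence[Vector]) -> Vector:
    """Uniformly average state vectors (assumed model-soup-style step)."""
    if not vectors:
        raise ValueError("need at least one state vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def momentum_refine(vectors: Sequence[Vector], beta: float = 0.5) -> Vector:
    """Extrapolate from the last state vector using accumulated momentum.

    Assumption: the difference between consecutive state vectors acts as
    a pseudo-gradient; `beta` is a hypothetical momentum coefficient.
    """
    if not vectors:
        raise ValueError("need at least one state vector")
    velocity = [0.0] * len(vectors[0])
    for prev, cur in zip(vectors, vectors[1:]):
        # Decay the running velocity, then add the newest difference.
        velocity = [beta * m + (c - p) for m, c, p in zip(velocity, cur, prev)]
    return [x + m for x, m in zip(vectors[-1], velocity)]


if __name__ == "__main__":
    states = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    print(average_vectors(states))
    print(momentum_refine(states, beta=0.5))
```

In the actual method, the state vectors would be extracted from the transformer's internal activations; the linked repository is the authoritative reference for how that is done.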