Structured Packing in LLM Training Improves Long Context Utilization
CoRR (2023)
Abstract
Recent developments in long-context large language models have attracted
considerable attention. Yet, their real-world applications are often hindered
by ineffective use of context information. This work shows that structuring
training data to increase semantic interdependence is an effective strategy for
optimizing context utilization. To this end, we introduce Structured Packing
for Long Context (SPLiCe), a method for creating training examples by using
information retrieval methods to collate mutually relevant documents into a
single training context. We empirically validate SPLiCe on large 3B and 7B
models, showing perplexity improvements and better long-context utilization on
downstream tasks. Remarkably, even relatively short fine-tuning with SPLiCe
is enough to attain these benefits. Additionally, a comprehensive study of
SPLiCe reveals intriguing transfer effects, such as training on code data
leading to perplexity improvements on text data.
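
The core idea, collating retrieved neighbors into one training context, can be
sketched in a few lines. The Python snippet below is a minimal illustration
under stated assumptions, not the paper's implementation: TF-IDF cosine
similarity stands in for the retrieval method, whitespace splitting stands in
for tokenization, and the names pack_contexts and token_budget are
hypothetical.

    # A minimal sketch of retrieval-based packing in the spirit of SPLiCe.
    # All names here are illustrative assumptions, not the paper's code;
    # TF-IDF cosine similarity stands in for the retrieval method.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def pack_contexts(docs, token_budget=4096, tokenize=str.split):
        """Greedily group mutually relevant documents into packed contexts."""
        sims = cosine_similarity(TfidfVectorizer().fit_transform(docs))
        unused = set(range(len(docs)))
        contexts = []
        while unused:
            seed = unused.pop()  # start a new context from any unused document
            context, used = [docs[seed]], len(tokenize(docs[seed]))
            # Append the seed's neighbors in decreasing similarity order
            # until the token budget for this context is exhausted.
            for j in sims[seed].argsort()[::-1]:
                if j not in unused:
                    continue
                cost = len(tokenize(docs[j]))
                if used + cost > token_budget:
                    break
                unused.remove(j)
                context.append(docs[j])
                used += cost
            contexts.append("\n\n".join(context))
        return contexts

Calling pack_contexts(corpus, token_budget=2048) would return packed strings
ready to tokenize into fixed-length training examples. Greedy neighbor packing
keeps each context semantically interdependent, which is the property the
abstract credits for improved context utilization.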