The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse
CoRR (2024)
Abstract
Although model editing has shown promise in revising knowledge in Large
Language Models (LLMs), its impact on the inherent capabilities of LLMs is
often overlooked. In this work, we reveal a critical phenomenon: even a single
edit can trigger model collapse, manifesting as significant performance
degradation in various benchmark tasks. However, benchmarking LLMs after each
edit, while necessary to prevent such collapses, is impractically
time-consuming and resource-intensive. To mitigate this, we propose using
perplexity as a surrogate metric, validated by extensive experiments
demonstrating its strong correlation with downstream task performance. We
further conduct an in-depth study on sequential editing, a practical setting
for real-world scenarios, across various editing methods and LLMs, focusing on
hard cases from our previous single edit studies. The results indicate that
nearly all examined editing methods result in model collapse after only a few
edits. To facilitate further research, we have utilized ChatGPT to develop a
new dataset, HardCF, based on those hard cases. This dataset aims to establish
the foundation for pioneering research in reliable model editing and the
mechanisms underlying editing-induced model collapse. We hope this work can
draw the community's attention to the potential risks inherent in model editing
practices.
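
To make the surrogate metric concrete, the following is a minimal sketch of how perplexity can be computed from per-token log-probabilities and compared before and after an edit. It is an illustration under stated assumptions, not the authors' implementation: the function names `perplexity` and `looks_collapsed` and the `ratio_threshold` value are hypothetical, and the abstract does not specify a threshold or the text on which perplexity is measured.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities.

    Uses the standard definition: exp of the mean negative log-likelihood.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def looks_collapsed(ppl_before: float, ppl_after: float,
                    ratio_threshold: float = 10.0) -> bool:
    """Flag an edit whose perplexity jumped by more than ratio_threshold.

    The threshold is an assumption made for illustration only; it does
    not come from the paper.
    """
    return ppl_after > ratio_threshold * ppl_before


# A model that assigns probability 0.25 to every token has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

In practice the log-probabilities would come from scoring a fixed set of reference text with the model before and after each edit, so that one forward pass stands in for a full benchmark run.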