Multistage Collaborative Knowledge Distillation from a Large Language Model for Semi-Supervised Sequence Generation
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Abstract
We study semi-supervised sequence generation tasks, where the few labeled examples are too scarce to finetune a model, and meanwhile, few-shot prompted large language models (LLMs) exhibit room for improvement. In this paper, we present the discovery that a student model distilled from a few-shot prompted LLM can commonly generalize better than its teacher to unseen examples on such tasks. We find that the student is able to learn a general pattern from the high-quality pseudolabels produced by the teacher during knowledge distillation (KD), and, favorably, not a general pattern from the low-quality pseudolabels. Leveraging this discovery, we propose a new method, Multistage Collaborative Knowledge Distillation from an LLM (MCKD), for these tasks. MCKD first few-shot prompts an LLM to produce pseudolabels for unlabeled data. Then, at each stage of an iterative KD process, a new pair of students is trained on disjoint partitions of the pseudolabeled data, and each produces new and improved pseudolabels for the partition it has not seen. We conduct extensive experiments on four syntactic and semantic parsing datasets and show the effectiveness of MCKD for low-resource semi-supervised sequence generation. On CRAFT biomedical parsing, for example, 3-stage MCKD with 50 labeled examples outperforms an LLM teacher and vanilla KD by 7.5% and matches the performance of supervised finetuning with 500 labeled examples.
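The abstract describes the MCKD procedure only in prose. As a rough illustration, a minimal sketch of the cross-labeling loop is given below; the `llm_label` and `train_student` callables are hypothetical placeholders for the few-shot prompted LLM and the student finetuning routine, not names taken from the paper or its released code.

```python
from typing import Callable, List

# Hypothetical stand-ins for the real components: a few-shot prompted LLM that
# labels inputs, and a routine that finetunes a fresh student model and returns
# its prediction function.
LabelFn = Callable[[List[str]], List[str]]            # inputs -> pseudolabels
TrainFn = Callable[[List[str], List[str]], LabelFn]   # (inputs, labels) -> predictor

def mckd(
    unlabeled: List[str],
    llm_label: LabelFn,       # few-shot prompted LLM teacher (assumed given)
    train_student: TrainFn,   # finetunes a new student on pseudolabeled data
    num_stages: int = 3,
) -> LabelFn:
    """Sketch of Multistage Collaborative Knowledge Distillation (MCKD).

    Stage 0: the few-shot prompted LLM pseudolabels the unlabeled pool.
    Each later stage: two fresh students are trained on disjoint halves of the
    current pseudolabeled data, then each relabels the half it did NOT train
    on, so every example's pseudolabel comes from a model that never saw it.
    """
    half = len(unlabeled) // 2
    part_a, part_b = unlabeled[:half], unlabeled[half:]

    # Initial pseudolabels from the LLM teacher.
    labels_a, labels_b = llm_label(part_a), llm_label(part_b)

    for _ in range(num_stages):
        # Train one student per partition on its current pseudolabels.
        student_a = train_student(part_a, labels_a)
        student_b = train_student(part_b, labels_b)
        # Cross-label: each student produces improved pseudolabels for the
        # partition it has not seen.
        labels_a, labels_b = student_b(part_a), student_a(part_b)

    # A final student trained on all (improved) pseudolabels serves as the
    # deployed model in this sketch.
    return train_student(part_a + part_b, labels_a + labels_b)
```

In this sketch the key design choice is that pseudolabels for a partition are always regenerated by the student trained on the other partition, which is what lets each stage improve label quality on data the labeling model never memorized.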