Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments
arXiv (2023)
Abstract
This study proposes a method for knowledge distillation (KD) of fine-tuned
Large Language Models (LLMs) into smaller, more efficient, and accurate neural
networks. We specifically target the challenge of deploying these models on
resource-constrained devices. Our methodology involves training the smaller
student model (Neural Network) using the prediction probabilities (as soft
labels) of the LLM, which serves as a teacher model. This is achieved through a
specialized loss function tailored to learn from the LLM's output
probabilities, ensuring that the student model closely mimics the teacher's
performance. To validate the performance of the KD approach, we utilized a
large dataset, 7T, containing 6,684 student-written responses to science
questions and three mathematical reasoning datasets with student-written
responses graded by human experts. We compared accuracy with state-of-the-art
(SOTA) distilled models, TinyBERT, and artificial neural network (ANN) models.
Results have shown that the KD approach has 1% higher scoring accuracy
than ANN and TinyBERT and comparable accuracy to the teacher model.
Furthermore, the student model has only 0.02M parameters, 10,000 times
fewer than the teacher model, and runs inference 10x faster than TinyBERT.
The significance of this research lies in its potential to make
advanced AI technologies accessible in typical educational settings,
particularly for automatic scoring.
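The abstract describes training the student on the teacher LLM's prediction probabilities (soft labels) through a specialized loss function, but does not state the loss's exact form. A minimal sketch of the standard soft-label distillation loss (a temperature-scaled KL-divergence term on the teacher's probabilities combined with cross-entropy on the human-graded hard label) is shown below; the temperature `T`, weight `alpha`, and the specific weighted-sum form are assumptions, not details from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T produces softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_probs, hard_label, T=2.0, alpha=0.5):
    """Hinton-style KD loss (assumed form, not necessarily the paper's exact loss):
    alpha-weighted sum of a soft-label KL term and a hard-label cross-entropy term.

    student_logits: raw scores from the small student network
    teacher_probs:  the LLM teacher's predicted probabilities (soft labels)
    hard_label:     index of the human expert's score category
    """
    teacher_probs = np.asarray(teacher_probs, dtype=float)
    soft_student = softmax(student_logits, T)
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures
    kl = np.sum(teacher_probs *
                (np.log(teacher_probs + 1e-12) - np.log(soft_student + 1e-12)))
    # Standard cross-entropy against the human-assigned hard label
    hard = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    return alpha * (T ** 2) * kl + (1 - alpha) * hard
```

With uniform student logits and a uniform teacher, the KL term vanishes and only the hard-label cross-entropy remains, which is a quick sanity check on an implementation like this.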