Using Mixture of Experts to accelerate dataset distillation

Zhi Xu, Zhenyong Fu

Journal of Visual Communication and Image Representation (2024)

Abstract
Recently, large datasets have become increasingly necessary for most deep learning tasks; however, large datasets bring problems of their own, such as disk storage requirements and huge computational expense. Dataset distillation is an emerging field that aims to synthesize a small dataset from the original dataset, such that a randomly initialized model trained on the distilled dataset can achieve performance comparable to a model of the same architecture trained on the original dataset. Matching Training Trajectories (MTT) achieves leading performance in this field, but it needs to pre-train 200 expert models before the formal distillation process, a stage called the buffer process. In this paper, we propose a new method to reduce the time consumed by the buffer process. Concretely, we use a Mixture of Experts (MoE) to train several expert models in parallel during the buffer process. Experiments show that our method can achieve a speedup of approximately 4∼8× in the buffer process while obtaining comparable distillation performance.
Keywords
Dataset distillation, Mixture of experts, Accelerate
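
The core idea in the abstract is to shorten MTT's buffer phase, in which many expert models are trained on the real data and their parameter trajectories are saved for later trajectory matching. Below is a minimal sketch of such a buffer loop in PyTorch. The ConvNet architecture, the number of experts, and all hyperparameters are illustrative assumptions rather than the paper's code, and the paper's actual MoE-based parallelization is not reproduced here; the sketch only shows several experts being updated on each batch and their weights checkpointed once per epoch.

```python
# Hypothetical sketch of an MTT-style buffer phase: train several expert
# models and record their parameter trajectories. Names and settings are
# illustrative assumptions, not the paper's implementation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):
    """Small CNN of the kind commonly used as the expert architecture."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train_expert_buffer(train_loader, num_experts=4, epochs=10, device="cpu"):
    """Train `num_experts` experts and return their per-epoch checkpoints."""
    experts = [ConvNet().to(device) for _ in range(num_experts)]
    optims = [torch.optim.SGD(m.parameters(), lr=0.01, momentum=0.9) for m in experts]
    trajectories = [[] for _ in range(num_experts)]  # lists of state_dict snapshots

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            # Update every expert on the same batch; the paper's MoE scheme
            # would organize this step differently to gain its speedup.
            for model, opt in zip(experts, optims):
                opt.zero_grad()
                loss = F.cross_entropy(model(x), y)
                loss.backward()
                opt.step()
        # Snapshot each expert once per epoch; these checkpoints form the
        # expert trajectories that trajectory matching later replays.
        for i, model in enumerate(experts):
            trajectories[i].append(copy.deepcopy(model.state_dict()))
    return trajectories
```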