Efficiently Distilling LLMs for Edge Applications
arXiv (2024)
Abstract
Supernet training of LLMs is of great interest in industrial applications as
it confers the ability to produce a palette of smaller models at constant cost,
regardless of the number of models (of different size/latency) produced. We
propose a new method called Multistage Low-rank Fine-tuning of
Super-transformers (MLFS) for parameter-efficient supernet training. We show
that it is possible to obtain high-quality encoder models that are suitable for
commercial edge applications, and that while decoder-only models are resistant
to a comparable degree of compression, decoders can be effectively sliced for a
significant reduction in training time.
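The core idea named in the abstract, low-rank fine-tuning on top of a shared supernet whose smaller sub-models are obtained by slicing, can be illustrated with a minimal sketch. The PyTorch-style code below is a hedged illustration under stated assumptions, not the authors' MLFS implementation: the class name SlicableLoRALinear, the rank, and the leading-row slicing scheme are all hypothetical.

```python
# A minimal sketch of the idea in the abstract: a frozen base weight plus a
# trainable low-rank (LoRA-style) update, where a smaller sub-model is
# obtained by slicing the layer's output width. All names and the slicing
# scheme are illustrative assumptions, not the authors' MLFS code.
from typing import Optional

import torch
import torch.nn as nn


class SlicableLoRALinear(nn.Module):
    """Linear layer whose base weight is frozen; only the low-rank factors
    are trained, and forward() can run at a reduced output width to emulate
    a smaller sub-model drawn from the supernet."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # shared weights stay frozen
        # Trainable low-rank factors: ~rank * (in + out) parameters in total.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor, width: Optional[int] = None) -> torch.Tensor:
        out_dim = width if width is not None else self.base.out_features
        # Slice the leading output rows to realize a narrower sub-model.
        w = self.base.weight[:out_dim] + self.lora_b[:out_dim] @ self.lora_a
        return x @ w.t()


if __name__ == "__main__":
    layer = SlicableLoRALinear(in_features=512, out_features=512, rank=8)
    x = torch.randn(2, 512)
    print(layer(x).shape)             # full sub-model: torch.Size([2, 512])
    print(layer(x, width=256).shape)  # sliced sub-model: torch.Size([2, 256])
```

Because only the low-rank factors are trained while the base weights are shared and frozen, any width can be sliced out afterwards without retraining the base model, which is one plausible reading of the constant-cost property the abstract highlights.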