Efficient Stitchable Task Adaptation
CVPR 2024(2023)
摘要
The paradigm of pre-training and fine-tuning has laid the foundation for
deploying deep learning models. However, most fine-tuning methods are designed
to meet a specific resource budget. Recently, considering diverse deployment
scenarios with various resource budgets, stitchable neural network (SN-Net) is
introduced to quickly obtain numerous new networks (stitches) from the
pre-trained models (anchors) in a model family via model stitching. Although
promising, SN-Net confronts new challenges when adapting it to new target
domains, including huge memory and storage requirements and a long and
sub-optimal multistage adaptation process. In this work, we present a novel
framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce
a palette of fine-tuned models that adhere to diverse resource constraints.
Specifically, we first tailor parameter-efficient fine-tuning to share low-rank
updates among the stitches while maintaining independent bias terms. In this
way, we largely reduce fine-tuning memory burdens and mitigate the interference
among stitches that arises in task adaptation. Furthermore, we streamline a
simple yet effective one-stage deployment pipeline, which estimates the
important stitches to deploy with training-time gradient statistics. By
assigning higher sampling probabilities to important stitches, we also get a
boosted Pareto frontier. Extensive experiments on 25 downstream visual
recognition tasks demonstrate that our ESTA is capable of generating stitches
with smooth accuracy-efficiency trade-offs and surpasses the direct SN-Net
adaptation by remarkable margins with significantly lower training time and
fewer trainable parameters. Furthermore, we demonstrate the flexibility and
scalability of our ESTA framework by stitching LLMs from LLaMA family,
obtaining chatbot stitches of assorted sizes.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要