Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
CoRR (2024)
Abstract
Existing parameter-efficient fine-tuning (PEFT) methods have achieved
significant success in vision transformer (ViT) adaptation by improving
parameter efficiency. However, enhancing inference efficiency during
adaptation remains underexplored. This limits the broader application of
pre-trained ViT models, especially when the model is computationally
intensive. In this paper, we propose Dynamic Tuning (DyT), a novel approach
that improves both parameter and inference efficiency for ViT adaptation.
Specifically, besides using lightweight adapter modules, we propose a token
dispatcher that distinguishes informative tokens from less important ones,
allowing the latter to dynamically skip the original block and thereby
reducing redundant computation during inference. Additionally, we explore
multiple design variants to find the best practice for DyT. Finally, inspired
by the mixture-of-experts (MoE) mechanism, we introduce an enhanced adapter to
further boost adaptation performance. We validate DyT across various tasks,
including image/video recognition and semantic segmentation. For instance,
DyT achieves comparable or even superior performance to existing PEFT methods
while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
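
The token-dispatcher idea described above can be sketched roughly as follows. This is a minimal NumPy illustration under our own assumptions: the function names, the externally supplied token scores, and the fixed keep ratio are illustrative choices, not the paper's actual implementation (which learns the dispatch decision and uses transformer blocks).

```python
import numpy as np

def dynamic_block(tokens, scores, block_fn, adapter_fn, keep_ratio=0.5):
    """Hypothetical sketch of a token dispatcher.

    tokens:   (N, D) array of token embeddings.
    scores:   (N,) informativeness score per token (assumed given here;
              in DyT this decision is learned).
    Tokens with low scores skip block_fn entirely (identity path);
    the top-scoring tokens pass through the block plus a lightweight adapter.
    """
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    keep_idx = np.argsort(scores)[-k:]  # indices of the most informative tokens
    out = tokens.copy()                 # skipped tokens are passed through unchanged
    kept = tokens[keep_idx]
    out[keep_idx] = block_fn(kept) + adapter_fn(kept)
    return out
```

Because the skipped tokens bypass `block_fn` entirely, the heavy computation is only spent on the retained fraction of tokens, which is the source of the inference savings the abstract refers to.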