VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness
arXiv (2024)
Abstract
Finetuning a pretrained vision model (PVM) is a common technique for learning
downstream vision tasks. However, the conventional finetuning process with
randomly sampled data points results in diminished training efficiency. To
address this drawback, we propose a novel approach, Vision-language
Collaborative Active Finetuning (VeCAF). With the emerging availability of
labels and natural language annotations of images through web-scale crawling or
controlled generation, VeCAF makes use of this information to perform
parametric data selection for PVM finetuning. VeCAF incorporates the finetuning
objective to select significant data points that effectively guide the PVM
towards faster convergence to meet the performance goal. This process is
assisted by the inherent semantic richness of the text embedding space which we
use to augment image features. Furthermore, the flexibility of text-domain
augmentation allows VeCAF to handle out-of-distribution scenarios without
external data. Extensive experiments show the leading performance and high
computational efficiency of VeCAF that is superior to baselines in both
in-distribution and out-of-distribution image classification tasks. On
ImageNet, VeCAF uses up to 3.3x less training batches to reach the target
performance compared to full finetuning, and achieves an accuracy improvement
of 2.7% with the same number of batches.
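The core idea of objective-aware data selection can be illustrated with a minimal sketch. The snippet below is not the authors' algorithm; it assumes a simple proxy in which the samples with the highest current finetuning loss are selected for the next training round, and the array names (`features`, `losses`, `select_for_finetuning`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: image embeddings and the model's current per-sample loss.
features = rng.normal(size=(100, 8))
losses = rng.random(100)


def select_for_finetuning(losses: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` samples with the highest loss,
    a crude stand-in for objective-aware active selection."""
    return np.argsort(losses)[::-1][:budget]


chosen = select_for_finetuning(losses, budget=10)
print(chosen)
```

In practice, methods like VeCAF replace this heuristic with a parametric selection model tied to the finetuning objective, so that the chosen subset drives the model toward the performance goal in fewer training batches.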