
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling.

2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG)

Citations 0 | Views 18
No rating yet
Abstract
In this paper, we introduce Attention Prompt Tuning (APT), a computationally efficient variant of prompt tuning for video-based applications such as action recognition. Prompt tuning approaches inject a set of learnable prompts along with the data tokens during fine-tuning while keeping the backbone frozen, which greatly reduces the number of learnable parameters compared to full tuning. For image-based downstream tasks, a handful of learnable prompts typically achieves results close to those of full tuning. However, videos, which contain more complex spatiotemporal information, require hundreds of tunable prompts to achieve reasonably good results. This erodes the parameter efficiency observed in images and significantly increases latency and the number of floating-point operations (FLOPs) during inference. To tackle these issues, we directly inject the prompts into the keys and values of the non-local attention mechanism within the transformer block. Additionally, we introduce a novel prompt reparameterization technique to make APT more robust against hyperparameter selection. The proposed APT approach greatly reduces the number of FLOPs and latency while achieving a significant performance boost over existing parameter-efficient tuning methods on the UCF101, HMDB51, and SSv2 action recognition datasets. The code and pre-trained models are available at https://github.com/wgcban/apt
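The core idea — prepending learnable prompts to the keys and values inside each attention block, rather than to the input token sequence — can be illustrated with a minimal PyTorch sketch. This is a reconstruction based only on the abstract: the class name, shapes, zero initialization, and prompt count are assumptions, not the official implementation (see https://github.com/wgcban/apt for that), and the paper's prompt reparameterization technique is omitted.

```python
import torch
import torch.nn as nn


class APTAttention(nn.Module):
    """Illustrative multi-head self-attention with learnable prompts
    injected into the keys and values only. The pre-trained projections
    stay frozen; only the prompt parameters are trained."""

    def __init__(self, dim: int = 768, num_heads: int = 12, num_prompts: int = 16):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        # Frozen projections from the pre-trained ViT backbone.
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        for p in (*self.qkv.parameters(), *self.proj.parameters()):
            p.requires_grad = False
        # Learnable prompts prepended to keys and values (assumed shapes).
        self.key_prompts = nn.Parameter(torch.zeros(num_prompts, dim))
        self.value_prompts = nn.Parameter(torch.zeros(num_prompts, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)          # each (B, N, C)
        # Queries come only from data tokens; prompts extend keys/values.
        k = torch.cat([self.key_prompts.expand(B, -1, -1), k], dim=1)
        v = torch.cat([self.value_prompts.expand(B, -1, -1), v], dim=1)

        def heads(t: torch.Tensor) -> torch.Tensor:     # (B, L, C) -> (B, H, L, d)
            return t.reshape(B, -1, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = heads(q), heads(k), heads(v)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, H, N, N + P)
        out = attn.softmax(dim=-1) @ v                  # (B, H, N, d)
        out = out.transpose(1, 2).reshape(B, N, C)      # output length is still N
        return self.proj(out)
```

Because only the key/value sequences grow, the attention output keeps the original token length N, so the extra cost per layer is roughly the N×P prompt-attention terms rather than a full (N+P)-token forward pass through every block, which is consistent with the FLOP and latency savings the abstract claims over prepending prompts to the input sequence.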
Keywords
Action Recognition, Computational Efficiency, Attention Mechanism, Floating-point Operations, Tuning Method, Transformer Block, Action Recognition Datasets, Image Processing, Learning Rate, Convolutional Neural Network, Language Processing, Appended, Tuning Parameter, Improvement In Accuracy, Recurrent Neural Network, Data Augmentation, Weight Decay, Multilayer Perceptron, Downstream Applications, Linear Probe, Input Tokens, Top-1 Accuracy, Multilayer Perceptron Layer, Top-5 Accuracy, Vision Transformer, Pre-trained Weights, Transformer Layers, Embedding Dimension, Network-based Methods, Image Classification