
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling.

2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG)

Citations 0 | Views 18
No rating yet
Abstract
In this paper, we introduce Attention Prompt Tuning (APT), a computationally efficient variant of prompt tuning for video-based applications such as action recognition. Prompt tuning approaches inject a set of learnable prompts along with the data tokens during fine-tuning while keeping the backbone frozen, which greatly reduces the number of learnable parameters compared to full tuning. For image-based downstream tasks, a handful of learnable prompts typically achieves results close to those of full tuning. However, videos, which contain more complex spatiotemporal information, require hundreds of tunable prompts to achieve reasonably good results. This erodes the parameter efficiency observed in images and significantly increases latency and the number of floating-point operations (FLOPs) during inference. To tackle these issues, we directly inject the prompts into the keys and values of the non-local attention mechanism within the transformer block. Additionally, we introduce a novel prompt reparameterization technique to make APT more robust against hyperparameter selection. The proposed APT approach greatly reduces the number of FLOPs and latency while achieving a significant performance boost over existing parameter-efficient tuning methods on the UCF101, HMDB51, and SSv2 action recognition datasets. The code and pre-trained models are available at https://github.com/wgcban/apt
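The core idea — prepending learnable prompts to the keys and values inside each attention block, rather than to the input token sequence — can be illustrated with a minimal PyTorch sketch. This is a reconstruction based only on the abstract: the class name, shapes, zero initialization, and prompt count are assumptions, not the official implementation (see https://github.com/wgcban/apt for that), and the paper's prompt reparameterization technique is omitted.

```python
import torch
import torch.nn as nn


class APTAttention(nn.Module):
    """Illustrative multi-head self-attention with learnable prompts
    injected into the keys and values only. The pre-trained projections
    stay frozen; only the prompt parameters are trained."""

    def __init__(self, dim: int = 768, num_heads: int = 12, num_prompts: int = 16):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        # Frozen projections from the pre-trained ViT backbone.
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        for p in (*self.qkv.parameters(), *self.proj.parameters()):
            p.requires_grad = False
        # Learnable prompts prepended to keys and values (assumed shapes).
        self.key_prompts = nn.Parameter(torch.zeros(num_prompts, dim))
        self.value_prompts = nn.Parameter(torch.zeros(num_prompts, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)          # each (B, N, C)
        # Queries come only from data tokens; prompts extend keys/values.
        k = torch.cat([self.key_prompts.expand(B, -1, -1), k], dim=1)
        v = torch.cat([self.value_prompts.expand(B, -1, -1), v], dim=1)

        def heads(t: torch.Tensor) -> torch.Tensor:     # (B, L, C) -> (B, H, L, d)
            return t.reshape(B, -1, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = heads(q), heads(k), heads(v)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, H, N, N + P)
        out = attn.softmax(dim=-1) @ v                  # (B, H, N, d)
        out = out.transpose(1, 2).reshape(B, N, C)      # output length is still N
        return self.proj(out)
```

Because only the key/value sequences grow, the attention output keeps the original token length N, so the extra cost per layer is roughly the N×P prompt-attention terms rather than a full (N+P)-token forward pass through every block, which is consistent with the FLOP and latency savings the abstract claims over prepending prompts to the input sequence.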
Keywords
Action Recognition, Computational Efficiency, Attention Mechanism, Floating-point Operations, Tuning Method, Transformer Block, Action Recognition Datasets, Image Processing, Learning Rate, Convolutional Neural Network, Language Processing, Appended, Tuning Parameter, Improvement In Accuracy, Recurrent Neural Network, Data Augmentation, Weight Decay, Multilayer Perceptron, Downstream Applications, Linear Probe, Input Tokens, Top-1 Accuracy, Multilayer Perceptron Layer, Top-5 Accuracy, Vision Transformer, Pre-trained Weights, Transformer Layers, Embedding Dimension, Network-based Methods, Image Classification