Agent Attention: On the Integration of Softmax and Linear Attention
Computing Research Repository (CoRR), 2024
Tsinghua University | DAMO Academy | Kuaishou Technology
The authors are Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Siyuan Pan, Pengfei Wan, Shiji Song, and Gao Huang, affiliated with the Department of Automation at Tsinghua University, Megvii Technology, and Kuaishou Technology, among other institutions. Their research spans deep learning, computer vision, attention mechanisms, and efficient neural network architectures.
1. Introduction
- Transformer model applications in the field of computer vision
- Limitations of the Softmax attention mechanism: high computational complexity
- Limitations of existing methods: sacrificing global modeling capabilities
- Agent attention mechanism: balancing computational efficiency and representational power
2. Related Work
- Vision Transformers
- Linear attention mechanism
3. Preliminaries
- Self-attention mechanism
- Softmax attention mechanism
- Linear attention mechanism (contrasted with Softmax attention in the sketch after this list)
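As a rough illustration of the two mechanisms named above, the following sketch (hypothetical PyTorch code, not taken from the paper) contrasts Softmax attention, which materializes an N x N attention map and costs O(N^2 d), with a generic kernel-based linear attention that reorders the computation into a d x d summary and costs O(N d^2); the elu(x) + 1 feature map is one common choice, assumed here.

```python
import torch
import torch.nn.functional as F

def softmax_attention(Q, K, V):
    """Standard Softmax attention: O(N^2 * d) time, materializes an N x N map."""
    d = Q.shape[-1]
    attn = torch.softmax(Q @ K.transpose(-2, -1) * d ** -0.5, dim=-1)  # (B, N, N)
    return attn @ V

def linear_attention(Q, K, V):
    """Kernel-based linear attention: O(N * d^2), no N x N attention map."""
    phi = lambda x: F.elu(x) + 1.0                 # non-negative feature map (assumed choice)
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.transpose(-2, -1) @ V                  # (B, d, d) key-value summary
    k_sum = Kp.sum(dim=-2)                         # (B, d) aggregated keys for normalization
    return (Qp @ kv) / (Qp @ k_sum.unsqueeze(-1) + 1e-6)
```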
4. Agent Transformer
- Agent attention mechanism: introducing agent tokens A
- Agent attention module: comprising agent aggregation and agent broadcast (sketched below)
- Advantages of the Agent attention module: efficient computation, high expressiveness, large receptive field
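A minimal sketch of the two steps listed above, assuming agent tokens are obtained by pooling the queries and treating a single attention head; this is illustrative PyTorch code rather than the authors' implementation, and it omits the paper's agent bias and DWC terms.

```python
import torch
import torch.nn.functional as F

def agent_attention(Q, K, V, num_agents=49):
    """Agent attention sketch. Q, K, V: (B, N, d); agent tokens A: (B, n, d), n << N."""
    B, N, d = Q.shape
    # Agent tokens from a simple pooling of the queries (assumed scheme).
    A = F.adaptive_avg_pool1d(Q.transpose(1, 2), num_agents).transpose(1, 2)  # (B, n, d)
    scale = d ** -0.5
    # Agent aggregation: the n agents gather global information from K and V.
    agent_attn = torch.softmax(A @ K.transpose(-2, -1) * scale, dim=-1)       # (B, n, N)
    agent_V = agent_attn @ V                                                  # (B, n, d)
    # Agent broadcast: every query reads the aggregated features back from the agents.
    q_attn = torch.softmax(Q @ A.transpose(-2, -1) * scale, dim=-1)           # (B, N, n)
    return q_attn @ agent_V                                                   # (B, N, d)
```

Both steps are ordinary Softmax attentions, but with the number of agents n fixed the total cost is O(Nnd) rather than O(N^2 d), which is the efficiency/expressiveness balance the outline refers to.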
5. Experiments
- Image classification
- Object detection
- Semantic segmentation
- Image generation
- High-resolution models
- Comparison with other linear attention mechanisms
- Ablation study
6. Conclusion
- Application of the Agent attention mechanism in vision Transformers
- Advantages of the Agent attention mechanism
- Future research directions of the Agent attention mechanism
Q: What specific research methods were used in the paper?
- Agent Attention Model Design: The paper proposes a novel attention mechanism called Agent Attention, which introduces additional agent tokens to aggregate and broadcast global information, combining the high expressiveness of Softmax attention with the linear complexity of linear attention.
- Experimental Validation: The paper conducts experiments on tasks such as ImageNet classification, COCO object detection, ADE20K semantic segmentation, and Stable Diffusion image generation to validate the effectiveness of Agent Attention.
- Ablation Study: The paper verifies the effectiveness of the components of Agent Attention through ablation studies, including the number of agent tokens, the agent bias, and the depthwise convolution (DWC) module (sketched below).
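For the DWC component mentioned in the ablation, one plausible reading (an assumption for illustration, not the authors' code) is a depthwise convolution applied to the spatially reshaped values and added back to the attention output to restore local feature diversity:

```python
import torch

B, H, W, d = 2, 14, 14, 64                  # hypothetical feature-map size and channel dim
V = torch.randn(B, H * W, d)
dwc = torch.nn.Conv2d(d, d, kernel_size=3, padding=1, groups=d)   # depthwise convolution
V_spatial = V.transpose(1, 2).reshape(B, d, H, W)                 # tokens -> 2D feature map
dwc_out = dwc(V_spatial).flatten(2).transpose(1, 2)               # back to (B, N, d)
# out = agent_attention(Q, K, V) + dwc_out   # assumed combination with the attention output
```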
Q: What are the main research findings and achievements?
- Effectiveness of Agent Attention Model: Experimental results show that the Agent Attention model achieves significant performance improvements on various visual tasks, especially in high-resolution scenarios.
- Balance of Efficiency and Expressiveness: The Agent Attention model significantly reduces computational complexity while maintaining high expressiveness, achieving a balance between efficiency and expressiveness.
- Generalizability: The Agent Attention module can be easily integrated into various vision Transformer models, such as DeiT, PVT, Swin, and CSwin (a drop-in usage sketch follows this list).
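As a drop-in usage sketch (hypothetical names, reusing the agent_attention function above rather than the actual DeiT/PVT/Swin/CSwin code), the standard attention call inside a block can be swapped for the agent version without changing the rest of the block:

```python
import torch

x = torch.randn(2, 196, 64)                                  # e.g. 14 x 14 patch tokens, dim 64
Wq, Wk, Wv = (torch.nn.Linear(64, 64) for _ in range(3))     # per-block projections (assumed)
out = agent_attention(Wq(x), Wk(x), Wv(x), num_agents=49)    # replaces the Softmax attention call
print(out.shape)                                             # torch.Size([2, 196, 64])
```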
Q: What are the current limitations of this research?
- Design of Agent Tokens: The paper uses a simple pooling method to generate agent tokens, which may result in information loss.
- Training Complexity: Although the computational complexity of the Agent Attention model is low, the training process still requires a large amount of computational resources.
- Model Interpretability: The internal mechanism of the Agent Attention model is complex and difficult to interpret.
