Agent Attention: On the Integration of Softmax and Linear Attention
Computing Research Repository (CoRR), 2024
Tsinghua University | DAMO Academy | Kuaishou Technology
The authors are Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Siyuan Pan, Pengfei Wan, Shiji Song, and Gao Huang, affiliated with the Department of Automation at Tsinghua University, Megvii Technology, and Kuaishou Technology, among other institutions. Their research spans deep learning, computer vision, attention mechanisms, and efficient neural network architectures.
1. Introduction
- Transformer model applications in the field of computer vision
- Limitations of the Softmax attention mechanism: high computational complexity
- Limitations of existing methods: sacrificing global modeling capabilities
- Agent attention mechanism: balancing computational efficiency and representational power
2. Related Work
- Vision Transformers
- Linear attention mechanism
3. Preliminaries
- Self-attention mechanism
- Softmax attention mechanism
- Linear attention mechanism (contrasted with Softmax attention in the sketch after this list)
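As a rough illustration of the two mechanisms named above, the following sketch (hypothetical PyTorch code, not taken from the paper) contrasts Softmax attention, which materializes an N x N attention map and costs O(N^2 d), with a generic kernel-based linear attention that reorders the computation into a d x d summary and costs O(N d^2); the elu(x) + 1 feature map is one common choice, assumed here.

```python
import torch
import torch.nn.functional as F

def softmax_attention(Q, K, V):
    """Standard Softmax attention: O(N^2 * d) time, materializes an N x N map."""
    d = Q.shape[-1]
    attn = torch.softmax(Q @ K.transpose(-2, -1) * d ** -0.5, dim=-1)  # (B, N, N)
    return attn @ V

def linear_attention(Q, K, V):
    """Kernel-based linear attention: O(N * d^2), no N x N attention map."""
    phi = lambda x: F.elu(x) + 1.0                 # non-negative feature map (assumed choice)
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.transpose(-2, -1) @ V                  # (B, d, d) key-value summary
    k_sum = Kp.sum(dim=-2)                         # (B, d) aggregated keys for normalization
    return (Qp @ kv) / (Qp @ k_sum.unsqueeze(-1) + 1e-6)
```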
4. Agent Transformer
- Agent attention mechanism: introducing agent tokens A
- Agent attention module: comprising agent aggregation and agent broadcast (sketched below)
- Advantages of the Agent attention module: efficient computation, high expressiveness, large receptive field
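A minimal sketch of the two steps listed above, assuming agent tokens are obtained by pooling the queries and treating a single attention head; this is illustrative PyTorch code rather than the authors' implementation, and it omits the paper's agent bias and DWC terms.

```python
import torch
import torch.nn.functional as F

def agent_attention(Q, K, V, num_agents=49):
    """Agent attention sketch. Q, K, V: (B, N, d); agent tokens A: (B, n, d), n << N."""
    B, N, d = Q.shape
    # Agent tokens from a simple pooling of the queries (assumed scheme).
    A = F.adaptive_avg_pool1d(Q.transpose(1, 2), num_agents).transpose(1, 2)  # (B, n, d)
    scale = d ** -0.5
    # Agent aggregation: the n agents gather global information from K and V.
    agent_attn = torch.softmax(A @ K.transpose(-2, -1) * scale, dim=-1)       # (B, n, N)
    agent_V = agent_attn @ V                                                  # (B, n, d)
    # Agent broadcast: every query reads the aggregated features back from the agents.
    q_attn = torch.softmax(Q @ A.transpose(-2, -1) * scale, dim=-1)           # (B, N, n)
    return q_attn @ agent_V                                                   # (B, N, d)
```

Both steps are ordinary Softmax attentions, but with the number of agents n fixed the total cost is O(Nnd) rather than O(N^2 d), which is the efficiency/expressiveness balance the outline refers to.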
5. Experiments
- Image classification
- Object detection
- Semantic segmentation
- Image generation
- High-resolution models
- Comparison with other linear attention mechanisms
- Ablation study
6. Conclusion
- Application of the Agent attention mechanism in vision Transformers
- Advantages of the Agent attention mechanism
- Future research directions of the Agent attention mechanism
Q: What specific research methods were used in the paper?
- Agent Attention Model Design: The paper proposes a novel attention mechanism called Agent Attention, which introduces additional agent tokens to aggregate and broadcast global information, combining the high expressiveness of Softmax attention with the linear complexity of linear attention.
- Experimental Validation: The paper conducts experiments on tasks such as ImageNet classification, COCO object detection, ADE20K semantic segmentation, and Stable Diffusion image generation to validate the effectiveness of Agent Attention.
- Ablation Study: The paper verifies the effectiveness of the components of Agent Attention through ablation studies, including the number of agent tokens, the agent bias, and the depthwise convolution (DWC) module (sketched below).
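For the DWC component mentioned in the ablation, one plausible reading (an assumption for illustration, not the authors' code) is a depthwise convolution applied to the spatially reshaped values and added back to the attention output to restore local feature diversity:

```python
import torch

B, H, W, d = 2, 14, 14, 64                  # hypothetical feature-map size and channel dim
V = torch.randn(B, H * W, d)
dwc = torch.nn.Conv2d(d, d, kernel_size=3, padding=1, groups=d)   # depthwise convolution
V_spatial = V.transpose(1, 2).reshape(B, d, H, W)                 # tokens -> 2D feature map
dwc_out = dwc(V_spatial).flatten(2).transpose(1, 2)               # back to (B, N, d)
# out = agent_attention(Q, K, V) + dwc_out   # assumed combination with the attention output
```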
Q: What are the main research findings and achievements?
- Effectiveness of Agent Attention Model: Experimental results show that the Agent Attention model achieves significant performance improvements on various visual tasks, especially in high-resolution scenarios.
- Balance of Efficiency and Expressiveness: The Agent Attention model significantly reduces computational complexity while maintaining high expressiveness, achieving a balance between efficiency and expressiveness.
- Generalizability: The Agent Attention module can be easily integrated into various vision Transformer models, such as DeiT, PVT, Swin, and CSwin (a drop-in usage sketch follows this list).
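As a drop-in usage sketch (hypothetical names, reusing the agent_attention function above rather than the actual DeiT/PVT/Swin/CSwin code), the standard attention call inside a block can be swapped for the agent version without changing the rest of the block:

```python
import torch

x = torch.randn(2, 196, 64)                                  # e.g. 14 x 14 patch tokens, dim 64
Wq, Wk, Wv = (torch.nn.Linear(64, 64) for _ in range(3))     # per-block projections (assumed)
out = agent_attention(Wq(x), Wk(x), Wv(x), num_agents=49)    # replaces the Softmax attention call
print(out.shape)                                             # torch.Size([2, 196, 64])
```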
Q: What are the current limitations of this research?
- Design of Agent Tokens: The paper uses a simple pooling method to generate agent tokens, which may result in information loss.
- Training Complexity: Although the computational complexity of the Agent Attention model is low, the training process still requires a large amount of computational resources.
- Model Interpretability: The internal mechanism of the Agent Attention model is complex and difficult to interpret.
