A Survey on Visual Mamba
arXiv (2024)
Abstract
State space models (SSMs) with selection mechanisms and hardware-aware
architectures, namely Mamba, have recently demonstrated significant promise in
long-sequence modeling. Since the self-attention mechanism in transformers has
quadratic complexity in image size, and computational demands keep growing,
researchers are now exploring how to adapt Mamba for computer vision tasks.
This paper is the first comprehensive survey aiming to provide an in-depth
analysis of Mamba models in the field of computer vision. It begins by
exploring the foundational concepts contributing to Mamba's success, including
the state space model framework, selection mechanisms, and hardware-aware
design. Next, we review these vision Mamba models, categorizing them into
foundational models and variants enhanced with techniques such as convolution,
recurrence, and attention to improve their sophistication. We further delve
into the widespread applications of Mamba in vision tasks, which include their
use as a backbone in various levels of vision processing. This encompasses
general visual tasks, medical visual tasks (e.g., 2D/3D segmentation,
classification, and image registration), and remote sensing visual tasks.
We specifically introduce general visual tasks at two levels: high/mid-level
vision (e.g., object detection, segmentation, and video classification) and
low-level vision (e.g., image super-resolution, image restoration, and visual
generation). We hope this endeavor will spark further interest within
the community to address current challenges and further apply Mamba models in
computer vision.
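To make the abstract's core ingredients concrete, the sketch below illustrates a selective SSM recurrence of the kind Mamba builds on: the state update runs linearly in sequence length (in contrast to the quadratic cost of self-attention), and the projections `B_proj`, `C_proj`, and `dt_proj` make the state dynamics input-dependent, which is the "selection mechanism". This is a simplified toy, not the hardware-aware Mamba kernel; all shapes and names here are illustrative assumptions.

```python
import numpy as np

def selective_ssm_scan(x, A, B_proj, C_proj, dt_proj):
    """Toy selective SSM scan (illustrative only, not the official Mamba kernel).

    x:       (L, D) input sequence
    A:       (D, N) state-transition parameters (negative for stability)
    B_proj:  (D, N) makes the input matrix B a function of x[t]
    C_proj:  (D, N) makes the readout matrix C a function of x[t]
    dt_proj: (D, D) produces the per-channel step size Delta from x[t]
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))      # hidden state, one N-dim state per channel
    y = np.zeros((L, D))
    for t in range(L):        # single pass: O(L) in sequence length
        B = x[t] @ B_proj                      # (N,) input-dependent input matrix
        C = x[t] @ C_proj                      # (N,) input-dependent readout
        dt = np.log1p(np.exp(x[t] @ dt_proj))  # softplus -> positive step (D,)
        A_bar = np.exp(dt[:, None] * A)        # zero-order-hold discretization of A
        B_bar = dt[:, None] * B[None, :]       # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * x[t][:, None]  # recurrent state update
        y[t] = (h * C[None, :]).sum(-1)        # project state back to D channels
    return y
```

Because `B`, `C`, and `dt` are recomputed from the current input at every step, the model can selectively retain or forget state content per token, which is what distinguishes Mamba-style selective SSMs from earlier time-invariant SSMs such as S4.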