Mamba-FETrack: Frame-Event Tracking via State Space Model
arXiv (2024)
Abstract
RGB-Event based tracking is an emerging research topic, focusing on how to
effectively integrate heterogeneous multi-modal data (synchronized exposure
video frames and asynchronous pulse Event stream). Existing works typically
employ Transformer based networks to handle these modalities and achieve decent
accuracy through input-level or feature-level fusion on multiple datasets.
However, these trackers require significant memory consumption and
computational complexity due to the use of self-attention mechanism. This paper
proposes a novel RGB-Event tracking framework, Mamba-FETrack, based on the
State Space Model (SSM) to achieve high-performance tracking while effectively
reducing computational costs and realizing more efficient tracking.
Specifically, we adopt two modality-specific Mamba backbone networks to extract
the features of the RGB frames and Event streams. We then propose a Mamba-based
module to boost interactive learning between the RGB and Event features. The
fused features are fed into the tracking head for target object localization.
Extensive experiments on the FELT and FE108 datasets validate the efficiency
and effectiveness of our proposed tracker.
Specifically, our Mamba-based tracker achieves 43.5/55.6 on the SR/PR metrics,
while the ViT-S based tracker (OSTrack) obtains 40.0/50.9. The GPU memory costs
of our tracker and the ViT-S based tracker are 13.98 GB and 15.44 GB,
respectively, a reduction of about 9.5%. The FLOPs and parameters of our
tracker versus the ViT-S based OSTrack are 59G/1076G and 7M/60M, reductions of
about 94.5% and 88.3%, respectively. We
hope this work can bring some new insights to the tracking field and greatly
promote the application of the Mamba architecture in tracking. The source code
of this work will be released on
.
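To make the pipeline shape described above concrete, here is a minimal conceptual sketch, not the authors' code: a toy scalar linear state-space recurrence stands in for Mamba's selective scan, applied by two modality-specific "backbones" (RGB frames, Event stream), followed by a crude fusion scan and a pooling head. All function names, dimensions, and constants (`a`, `b`) are illustrative assumptions.

```python
# Conceptual sketch (assumed names and shapes, not the paper's implementation).
# A diagonal linear SSM h_t = a*h_{t-1} + b*x_t stands in for the selective
# scan; two per-modality scans feed a shared fusion scan and a toy head.

def ssm_scan(inputs, a=0.9, b=0.5):
    """Apply the scalar recurrence h_t = a*h_{t-1} + b*x_t independently to
    each feature channel over a token sequence (stand-in for a Mamba block)."""
    h = [0.0] * len(inputs[0])
    outs = []
    for x in inputs:  # iterate over the token sequence
        h = [a * hi + b * xi for hi, xi in zip(h, x)]
        outs.append(list(h))
    return outs

def fuse(rgb_feats, evt_feats):
    """Concatenate per-token RGB and Event features, then run a shared scan
    over the joint sequence (a crude analogue of Mamba-based interaction)."""
    joint = [r + e for r, e in zip(rgb_feats, evt_feats)]
    return ssm_scan(joint)

def head(feats):
    """Toy localization head: mean-pool the fused sequence into one vector."""
    n = len(feats)
    return [sum(f[i] for f in feats) / n for i in range(len(feats[0]))]

# Two synthetic 3-token sequences with 2-channel features per token.
rgb = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
evt = [[0.2, 0.1], [0.1, 0.2], [0.3, 0.3]]

pred = head(fuse(ssm_scan(rgb), ssm_scan(evt)))
print(len(pred))  # fused feature dimension: 2 (RGB) + 2 (Event) = 4
```

The key efficiency point of the abstract is visible even in this toy: the scan is a single linear-time pass over the sequence, in contrast to the quadratic pairwise interactions of self-attention.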