Enhancing visual tracking with a unified temporal Transformer framework

IEEE Transactions on Intelligent Vehicles (2024)

Abstract
Visual object tracking is an essential research topic in computer vision with numerous practical applications, including visual surveillance systems, autonomous vehicles and intelligent transportation systems. It involves tackling various challenges such as motion blur, occlusion and distractors, which require trackers to leverage temporal information, including temporal appearance information, temporal trajectory information and temporal context information. However, existing trackers usually focus on exploiting one particular type of temporal information while neglecting the complementarity of different types of temporal information. Additionally, cross-frame correlations that enable the transfer of diverse temporal information during tracking are under-explored. In this work, we propose a Unified Temporal Transformer Framework (UTTF) for robust visual tracking. Our framework effectively establishes multi-scale cross-frame relationships among historical frames and exploits the complementary information among three typical temporal information sources. Specifically, a Pyramid Spatial-Temporal Transformer Encoder (PSTTE) is designed to mutually reinforce historical features by establishing sound multi-scale associations (i.e., token-level, semantic-level and frame-level). Furthermore, an Adaptive Fusion Transformer Decoder (AFTD) is proposed to adaptively aggregate informative temporal cues from historical frames to enhance the features of the current frame. Moreover, the proposed UTTF network can be easily extended to various tracking frameworks. Our experiments on seven prevalent visual object tracking benchmarks demonstrate that our proposed trackers outperform existing ones, establishing new state-of-the-art results.
Keywords
Object tracking, temporal information, Transformer
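
The abstract describes an encoder-decoder pattern in which features from historical frames first reinforce one another through cross-frame attention, and tokens of the current frame then aggregate temporal cues from that history. The sketch below is only a minimal illustration of this general pattern, built from standard PyTorch attention modules; the module names, shapes and hyperparameters are hypothetical assumptions, not the paper's PSTTE/AFTD implementation.

```python
# Hypothetical sketch of the encoder-decoder temporal-fusion idea described in the
# abstract. All names, shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class TemporalEncoder(nn.Module):
    """Self-attention over tokens pooled from several historical frames,
    so their features can mutually reinforce one another."""

    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, hist_tokens):               # (B, T*N, C): tokens from T historical frames
        return self.encoder(hist_tokens)


class TemporalFusionDecoder(nn.Module):
    """Cross-attention: current-frame tokens attend to the encoded historical
    memory to aggregate temporal cues into the current representation."""

    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=layers)

    def forward(self, cur_tokens, hist_memory):   # (B, N, C), (B, T*N, C)
        return self.decoder(cur_tokens, hist_memory)


if __name__ == "__main__":
    B, T, N, C = 2, 4, 64, 256                    # batch, history length, tokens per frame, channels
    hist = torch.randn(B, T * N, C)               # features of T historical frames
    cur = torch.randn(B, N, C)                    # features of the current frame
    memory = TemporalEncoder(C)(hist)             # mutually reinforced historical features
    enhanced = TemporalFusionDecoder(C)(cur, memory)
    print(enhanced.shape)                         # torch.Size([2, 64, 256])
```

The enhanced current-frame tokens would then feed a downstream tracking head; the pyramid (multi-scale) and adaptive-fusion aspects of the paper are not represented in this simplified sketch.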