Exploiting Image-Related Inductive Biases in Single-Branch Visual Tracking
CoRR (2023)
Abstract
Despite achieving state-of-the-art performance in visual tracking, recent
single-branch trackers tend to overlook the weak prior assumptions associated
with the Vision Transformer (ViT) encoder and inference pipeline. Moreover, the
effectiveness of discriminative trackers remains constrained due to the
adoption of the dual-branch pipeline. To address the limited effectiveness of
the vanilla ViT, we propose an Adaptive ViT Model Prediction tracker (AViTMP)
to bridge the gap between single-branch networks and discriminative models.
Specifically, in the proposed encoder AViT-Enc, we introduce an adaptor module
and joint target state embedding to enrich the dense embedding paradigm based
on ViT. Then, we combine AViT-Enc with a dense-fusion decoder and a
discriminative target model to predict the target location accurately. Further, to mitigate
the limitations of conventional inference practice, we present a novel
inference pipeline called CycleTrack, which bolsters the tracking robustness in
the presence of distractors via bidirectional cycle tracking verification.
Lastly, we propose a dual-frame update inference strategy that adaptively
handles significant challenges in long-term scenarios. In the experiments, we
evaluate AViTMP on ten tracking benchmarks for a comprehensive assessment,
including LaSOT, LaSOTExtSub, AVisT, etc. The experimental results
unequivocally establish that AViTMP attains state-of-the-art performance,
especially in long-term tracking and robustness.
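The CycleTrack idea described above, verifying a candidate by tracking backward and checking that it returns to the previous target location, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `track` callable, the candidate list, and the IoU threshold `tau` are all hypothetical stand-ins.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def cycle_track(track, prev_frame, prev_box, cur_frame, candidates, tau=0.5):
    """Pick the candidate whose backward track best returns to prev_box.

    `track(src_frame, src_box, dst_frame)` is an assumed tracker call
    that localizes the target from src_frame into dst_frame.
    """
    best_box, best_score = None, -1.0
    for cand in candidates:
        back_box = track(cur_frame, cand, prev_frame)  # backward pass
        score = iou(back_box, prev_box)                # cycle consistency
        if score > best_score:
            best_box, best_score = cand, score
    # Fall back to the top-ranked candidate if no cycle is consistent enough.
    return best_box if best_score >= tau else candidates[0]
```

Under this sketch, a distractor candidate fails the backward check (its reverse track lands far from the previous target box), so the cycle-consistent candidate is preferred even when the forward score alone would favor the distractor.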
Keywords
tracking, biases, image-related, single-branch