LoGoViT: Local-Global Vision Transformer for Object Re-Identification

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
Object re-identification (ReID) is prone to errors under variations in scale and illumination, complex backgrounds, and object occlusion. To overcome these challenges, attention mechanisms are employed to focus on the object's characteristics and thereby extract more discriminative features. This paper introduces a local-global vision transformer (LoGoViT) for object re-identification that learns a hierarchical representation ranging from fine-grained (local) to general (global) context features. It comprises two components: (i) shift and shuffle operations that generate robust local features and (ii) a local-global module that aggregates the multi-level hierarchical features of an object. Extensive experiments show that our method achieves state-of-the-art results on ReID benchmarks. We further investigate effective augmentation operations and discuss how patch modifications improve the proposed model's generalization under occlusion. The source code is available at https://github.com/nguyenphan99/LoGoViT.
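The abstract names shift and shuffle operations on patches but does not define them. As a rough illustration only, the sketch below shows one common way such operations are applied to a sequence of patch embeddings: a cyclic shift followed by a round-robin regrouping. The function name `shift_and_shuffle` and the parameters `shift` and `groups` are hypothetical and are not taken from the paper or its repository; the actual LoGoViT operations may differ.

```python
import numpy as np


def shift_and_shuffle(tokens: np.ndarray, shift: int, groups: int) -> np.ndarray:
    """Illustrative patch shift + shuffle (an assumption, not the paper's code).

    tokens: array of shape (N, D) holding N patch embeddings of dimension D.
    Returns an array of shape (groups, N // groups, D).
    """
    n, d = tokens.shape
    if n % groups != 0:
        raise ValueError("number of patches must be divisible by groups")

    # Shift: cyclically move the first `shift` patches to the end of the sequence.
    shifted = np.roll(tokens, -shift, axis=0)

    # Shuffle: deal the patches round-robin into `groups` groups, so each group
    # contains patches drawn from distant positions in the sequence.
    per_group = n // groups
    return shifted.reshape(per_group, groups, d).transpose(1, 0, 2)


# Usage: 8 one-dimensional "patches" labelled 0..7, shifted by 2, split into 2 groups.
patches = np.arange(8, dtype=float).reshape(8, 1)
local_groups = shift_and_shuffle(patches, shift=2, groups=2)
```

Under this reading, each group mixes patches from across the image, so a local feature computed per group is less dependent on any single contiguous region being visible.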
Keywords
Object re-id, public security, vision transformer, multi-scale, patch modification augmentation