Learning transformer-based attention region with multiple scales for occluded person re-identification.

Comput. Vis. Image Underst. (2023)

Abstract
Occluded person re-identification (Re-ID), which aims to match occluded person images across cameras, remains challenging because of incomplete information and spatial misalignment. State-of-the-art (SOTA) methods usually adopt a two-stage architecture that relies on existing pose estimation models or an attention mechanism to generate human masks for feature extraction, which complicates the model and introduces additional biases. To address this issue, we propose a novel end-to-end transformer-based occluded person Re-ID model. Specifically, our model contains two crucial components: (1) the features of the global region and of the non-occluded person region are extracted by two independent Transformer-based feature extraction networks, respectively; (2) the distribution of common non-occluded human regions is learnt via a multi-head self-attention mechanism, and the Minimized Character-box Proposal (MCP) is then used to generate accurate shared non-occluded crops. In our model, non-occluded human regions are not annotated; only weak supervision from ID labels, together with multi-head self-attention, is used to jointly learn the distribution. Further, the human feature contains multi-scale information extracted by our dual-branch architecture. Extensive experimental results on four person Re-ID benchmarks covering two tasks (occluded and partial) demonstrate the effectiveness of the proposed framework, which achieves SOTA or comparable performance on all benchmarks.
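The abstract does not say how MCP turns the learnt attention distribution into a crop, so the following is only a minimal sketch of one plausible reading: average the class-token-to-patch attention over heads, keep patches above the mean, and take the tightest box that encloses them. The function name `min_box_from_attention`, the mean threshold, and the 16-pixel patch size are all assumptions made for illustration, not details from the paper.

```python
import numpy as np

def min_box_from_attention(attn, grid_hw, patch=16, thresh=None):
    """Hypothetical helper: tightest pixel box around high-attention patches.

    attn    : array of shape (heads, num_patches), CLS-to-patch attention weights
    grid_hw : (rows, cols) of the patch grid, rows * cols == num_patches
    patch   : patch side length in pixels (assumed value)
    thresh  : attention cutoff; defaults to the mean of the head-averaged map (assumption)
    Returns (left, top, right, bottom) in pixels.
    """
    h, w = grid_hw
    # Average over heads and lay the weights out on the patch grid.
    amap = np.asarray(attn, dtype=float).mean(axis=0).reshape(h, w)
    t = amap.mean() if thresh is None else thresh
    mask = amap > t
    if not mask.any():
        # Nothing stands out: fall back to the full image.
        return (0, 0, w * patch, h * patch)
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    top, bottom = int(rows[0]), int(rows[-1]) + 1
    left, right = int(cols[0]), int(cols[-1]) + 1
    return (left * patch, top * patch, right * patch, bottom * patch)

# Toy example: a 4x4 patch grid with two attended patches.
attn = np.zeros((2, 16))
attn[:, 1 * 4 + 1] = 1.0  # patch at row 1, col 1
attn[:, 2 * 4 + 2] = 1.0  # patch at row 2, col 2
box = min_box_from_attention(attn, (4, 4))
```

In a dual-branch setup like the one described, a crop obtained this way would be resized and fed to the second feature extraction network, while the first network processes the full image.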
Keywords
Person re-identification, Transformer, Deep learning, Multiple scales