Video Person Re-identification Based on Transformer-CNN Model

2022 4th International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM) (2022)

Abstract

To address pose variation, complex backgrounds, and frequent occlusion in video person re-identification, a network model, ResTNet, based on a convolutional neural network and a Transformer is proposed. In ResTNet, a ResNet50 network extracts local features, and the output of its intermediate layer is fed to the Transformer as prior knowledge. In the Transformer branch, the feature map size is progressively reduced, which expands the receptive field so that the relationships among local features can be fully explored and global pedestrian features generated. The shifted window method also reduces the model's computation. During training, cross-entropy loss and triplet loss are used to optimize the two branches, respectively. On the large-scale MARS dataset, Rank-1 and mAP reach 86.8% and 80.3%, which are 3.8% and 3.3% higher than the baseline, respectively. The Transformer model is thus successfully applied to video person re-identification, and extensive experiments on several large datasets show that the proposed ResTNet effectively improves both the robustness and the accuracy of person re-identification.
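
The two training losses named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the Euclidean distance, and the margin of 0.3 are assumptions chosen for illustration, and the abstract does not state which loss is attached to which branch.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample; `label` is the true identity index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss: the positive should be closer to the anchor than the
    negative by at least `margin` (0.3 is an assumed value)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)
```

The cross-entropy term treats re-identification as identity classification, while the triplet term shapes the embedding space so that features of the same person lie closer together than features of different people.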

Key words

video-based person re-identification, local feature, convolutional neural network, Transformer, global feature
