Selective and Orthogonal Feature Activation for Pedestrian Attribute Recognition

Junyi Wu,Yan Huang, Min Gao,Yuzhen Niu, Mingjing Yang, Zhipeng Gao,Jianqiang Zhao

AAAI 2024(2024)

引用 0|浏览3
暂无评分
摘要
Pedestrian Attribute Recognition (PAR) involves identifying the attributes of individuals in person images. Existing PAR methods typically rely on CNNs as the backbone network to extract pedestrian features. However, CNNs process only one adjacent region at a time, leading to the loss of long-range inter-relations between different attribute-specific regions. To address this limitation, we leverage the Vision Transformer (ViT) instead of CNNs as the backbone for PAR, aiming to model long-range relations and extract more robust features. However, PAR suffers from an inherent attribute imbalance issue, causing ViT to naturally focus more on attributes that appear frequently in the training set and ignore some pedestrian attributes that appear less. The native features extracted by ViT are not able to tolerate the imbalance attribute distribution issue. To tackle this issue, we propose two novel components: the Selective Feature Activation Method (SFAM) and the Orthogonal Feature Activation Loss. SFAM smartly suppresses the more informative attribute-specific features, compelling the PAR model to capture discriminative features from regions that are easily overlooked. The proposed loss enforces an orthogonal constraint on the original feature extracted by ViT and the suppressed features from SFAM, promoting the complementarity of features in space. We conduct experiments on several benchmark PAR datasets, including PETA, PA100K, RAPv1, and RAPv2, demonstrating the effectiveness of our method. Specifically, our method outperforms existing state-of-the-art approaches by GRL, IAA-Caps, ALM, and SSC in terms of mA on the four datasets, respectively.
更多
查看译文
关键词
CV: Image and Video Retrieval,CV: Representation Learning for Vision
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要