HODN: Disentangling Human-Object Feature for HOI Detection

Shuman Fang, Zhiwen Lin, Ke Yan, Jie Li, Xianming Lin, Rongrong Ji

IEEE Transactions on Multimedia (2024)

Abstract

Human-Object Interaction (HOI) detection aims to detect humans and their interactions with surrounding objects, and transformer-based methods currently dominate the task. However, these methods ignore the relationships among humans, objects, and interactions: 1) human features contribute more than object features to interaction prediction; 2) interaction information disturbs object detection but helps human detection. In this article, we propose a Human and Object Disentangling Network (HODN) to model the HOI relationships explicitly, where humans and objects are first detected independently by two disentangling decoders and then processed by an interaction decoder. Because human features contribute more to interaction prediction, we propose a Human-Guide Linking method that uses human features as the positional embeddings so that the interaction decoder focuses on human-centric regions. To handle the opposite influences of interactions on humans and objects, we propose a Stop-Gradient Mechanism that blocks interaction gradients from optimizing object detection while allowing them to optimize human detection. Our proposed method achieves competitive performance on both the V-COCO and HICO-Det datasets, and it can be combined easily with existing methods to obtain state-of-the-art results.
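
The Stop-Gradient Mechanism described in the abstract can be illustrated with a toy scalar model. This is a minimal sketch under stated assumptions, not the paper's architecture: the function `toy_gradients`, the squared-error losses, and the one-weight "branches" `w_h` and `w_o` are all hypothetical and chosen only to make the gradient flow easy to check by hand. In an autograd framework the same effect is usually obtained by detaching the object feature before it enters the interaction branch (for example `torch.Tensor.detach` or `jax.lax.stop_gradient`).

```python
def toy_gradients(x, w_h, w_o, t_h, t_o, t_i, stop_gradient=True):
    """Return (dL/dw_h, dL/dw_o) for a hypothetical scalar toy model.

    Total loss = (h - t_h)^2 + (o - t_o)^2 + (h + o - t_i)^2,
    where h = w_h * x is the "human feature" and o = w_o * x is the
    "object feature". This is an illustration only, not the HODN model.
    """
    h = w_h * x  # human branch output
    o = w_o * x  # object branch output

    # Gradients of each branch's own detection loss.
    grad_h = 2.0 * (h - t_h) * x
    grad_o = 2.0 * (o - t_o) * x

    # Gradient of the interaction loss with respect to either feature path.
    inter = 2.0 * (h + o - t_i) * x

    # The interaction gradient is always allowed into the human branch.
    grad_h += inter

    # With stop-gradient, the object branch never sees the interaction loss.
    if not stop_gradient:
        grad_o += inter

    return grad_h, grad_o


if __name__ == "__main__":
    print(toy_gradients(1.0, 1.0, 2.0, 0.0, 0.0, 0.0, stop_gradient=True))
    print(toy_gradients(1.0, 1.0, 2.0, 0.0, 0.0, 0.0, stop_gradient=False))
```

With stop-gradient enabled, the object weight is updated only by its own detection loss, while the human weight receives both its detection gradient and the interaction gradient.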

Keywords

Decoding, Feature extraction, Transformers, Task analysis, Visualization, Object detection, Detectors, Disentangling features, Human-Object Interaction detection, transformer, visual attention
