Heterogeneous Graph Network for Action Detection

IEEE Transactions on Circuits and Systems for Video Technology (2024)

Abstract
Spatio-temporal action detection is a fundamental task that detects persons in videos and recognizes their actions. It requires reasoning about the spatio-temporal interactions between persons and their surroundings. Recently, researchers have introduced more modalities to this task, which places higher demands on a method's reasoning capability, yet a method capable of holistic reasoning across them is still lacking. To this end, we propose a heterogeneous graph network that reasons about the spatio-temporal interactions among different types of nodes (video entities) and edges (inter-entity relations). Concretely, the network comprises a spatial graph and a temporal graph, which are updated alternately. The spatial graph contains person-appearance, person-pose, object-appearance, and hand-interaction nodes, and the temporal graph contains person nodes at different moments. For information aggregation, we propose a person-centric heterogeneous graph reasoning algorithm, which introduces heterogeneity into the graphs through node-type-specific projections and modulated edge-type-specific representations. We find that introducing heterogeneity enriches the model's ability to understand multiple modalities, which helps it parse complex semantic relations in videos and may enable further mining of spatio-temporal interactions between entities in future work. Experimental results on four public datasets demonstrate the superiority of our method. Code will be available after acceptance.
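The abstract describes the person-centric heterogeneous reasoning only at a high level. The following pure-Python sketch shows one plausible reading of it, under assumptions that are ours and not the paper's: each node type has its own key/value projection, each edge type has a vector that modulates messages element-wise (our guess at "modulated edge-type-specific representations"), a person node aggregates its neighbours with scaled dot-product attention plus a residual connection, and spatial and temporal steps alternate for a fixed number of rounds. The names `HeteroPersonLayer` and `alternate_updates`, the node-type strings, and the random weights standing in for learned parameters are all hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


class HeteroPersonLayer:
    """Hypothetical person-centric heterogeneous attention layer (not the authors' code).

    Only the person node is updated; each neighbour sends a message through
    projections chosen by its node type, and the message is modulated
    element-wise by a vector chosen by the edge type.
    """

    def __init__(self, node_types: List[str], dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.query_proj = random_matrix(rng, dim, dim)
        # Node-type-specific key/value projections.
        self.key_proj: Dict[str, Matrix] = {t: random_matrix(rng, dim, dim) for t in node_types}
        self.value_proj: Dict[str, Matrix] = {t: random_matrix(rng, dim, dim) for t in node_types}
        # Edge-type-specific modulation vectors. Every edge ends at a person node,
        # so here the edge type is identified by the neighbour's node type (simplification).
        self.edge_mod: Dict[str, Vector] = {
            t: [1.0 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(dim)] for t in node_types
        }

    def __call__(self, person: Vector, neighbours: List[Tuple[str, Vector]]) -> Vector:
        """Return the person feature after one residual aggregation step."""
        if not neighbours:
            return list(person)
        query = matvec(self.query_proj, person)
        scores: List[float] = []
        messages: List[Vector] = []
        for node_type, feat in neighbours:
            mod = self.edge_mod[node_type]
            key = [g * k for g, k in zip(mod, matvec(self.key_proj[node_type], feat))]
            value = [g * v for g, v in zip(mod, matvec(self.value_proj[node_type], feat))]
            # Scaled dot-product score between the person query and this neighbour.
            scores.append(sum(q * k for q, k in zip(query, key)) / math.sqrt(self.dim))
            messages.append(value)
        weights = softmax(scores)
        aggregated = [sum(w * m[i] for w, m in zip(weights, messages)) for i in range(self.dim)]
        return [p + a for p, a in zip(person, aggregated)]


def alternate_updates(
    track: List[Vector],
    frame_context: List[List[Tuple[str, Vector]]],
    spatial: HeteroPersonLayer,
    temporal: HeteroPersonLayer,
    num_rounds: int = 2,
) -> List[Vector]:
    """Alternate spatial and temporal updates for one person tracked over T frames."""
    track = [list(p) for p in track]
    for _ in range(num_rounds):
        # Spatial graph: the person at frame t attends to the entities of frame t.
        track = [spatial(p, ctx) for p, ctx in zip(track, frame_context)]
        # Temporal graph: the person at frame t attends to the same person at other frames.
        updated = []
        for t, p in enumerate(track):
            others = [("person", q) for j, q in enumerate(track) if j != t]
            updated.append(temporal(p, others))
        track = updated
    return track
```

Only person nodes are updated, which mirrors the person-centric framing in the abstract; pose, object, and hand nodes act purely as message sources. A trainable version would express the same steps with batched tensors in a framework such as PyTorch.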
Keywords
Action detection, spatio-temporal action detection, heterogeneous graph, relation reasoning, graph network