Decoupled spatio-temporal grouping transformer for skeleton-based action recognition

Shengkun Sun, Zihao Jia, Yisheng Zhu, Guangcan Liu, Zhengtao Yu

The Visual Computer (2023)

Abstract
Capturing correlations between joints is crucial in skeleton-based action recognition. The Transformer has demonstrated its capability to capture such correlations; however, conventional Transformer-based approaches model the relationships between joints in a unified spatio-temporal dimension, disregarding the distinct semantic information carried by the spatial and temporal dimensions of skeleton sequences. To address this issue, we present a novel decoupled spatio-temporal grouping Transformer (DSTGFormer). The skeleton sequence is split into multiple spatio-temporal groups, each containing a set of consecutive frames. A spatio-temporal positional encoding (STPE) module assigns identity information to each element in the sequence, and a spatio-temporal grouping self-attention (STGA) module captures the spatial and temporal relationships between joints within each spatio-temporal group. Decoupling the spatial and temporal dimensions enables the extraction of semantic information with a different meaning in each dimension. Additionally, we propose a within-group spatial global regularization mechanism that learns more general spatial attention maps, and an inter-group feature aggregation (IGFA) module that sharpens the distinction between similar actions. Our method outperforms state-of-the-art methods on two large-scale datasets in both recognition accuracy and computational efficiency.
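
To make the decoupling idea concrete, the sketch below applies self-attention first across joints within each frame (spatial) and then across frames for each joint (temporal). This is a minimal, assumption-laden illustration in PyTorch, not the paper's implementation: the class name DecoupledSTBlock and all hyperparameters are hypothetical, and the actual STGA module additionally operates within spatio-temporal groups and incorporates the regularization and IGFA mechanisms the abstract describes.

```python
# Hypothetical sketch of decoupled spatio-temporal self-attention.
# Names and hyperparameters are invented; the paper's STGA module is
# not specified in the abstract and may differ substantially.
import torch
import torch.nn as nn

class DecoupledSTBlock(nn.Module):
    """Self-attention over joints (spatial) and frames (temporal), decoupled."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch B, frames T, joints V, channels C)
        B, T, V, C = x.shape

        # Spatial attention: within each frame, every joint attends to all joints.
        s = self.norm_s(x).reshape(B * T, V, C)
        s_out = self.spatial_attn(s, s, s, need_weights=False)[0]
        x = x + s_out.reshape(B, T, V, C)

        # Temporal attention: each joint attends to its own trajectory over frames.
        t = self.norm_t(x).permute(0, 2, 1, 3).reshape(B * V, T, C)
        t_out = self.temporal_attn(t, t, t, need_weights=False)[0]
        return x + t_out.reshape(B, V, T, C).permute(0, 2, 1, 3)

# Toy usage: 2 sequences, 16 frames, 25 joints, 64 channels.
block = DecoupledSTBlock(dim=64)
print(block(torch.randn(2, 16, 25, 64)).shape)  # torch.Size([2, 16, 25, 64])
```

Because each attention pass reshapes the batch so that only one axis (joints or frames) is attended over, the two passes learn separate spatial and temporal attention maps rather than a single unified spatio-temporal map.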
Keywords
Skeleton-based action recognition, Transformer, Decoupled