Multi-receptive field spatiotemporal network for action recognition

International Journal of Machine Learning and Cybernetics (2023)

Abstract
Despite the great progress that deep neural networks have made in action recognition, existing methods may overlook visual tempo during feature learning. Visual tempo is the dynamics and temporal scale variation of actions. Existing models usually interpret spatiotemporal scenes using temporal and spatial convolutions, which are limited in both the temporal and spatial dimensions, so they cannot cope with variations in visual tempo. To address these issues, we propose a multi-receptive field spatiotemporal (MRF-ST) network that effectively models spatial and temporal information across different receptive fields. In the proposed network, dilated convolution is used to obtain different receptive fields, and an attention mechanism dynamically weights the different dilation rates. Thus, the proposed MRF-ST network can directly capture various tempos in the same network layer without any additional cost. Moreover, the network can improve the accuracy of action recognition by learning more visual tempos of different actions. Extensive evaluations show that MRF-ST achieves state-of-the-art results on three popular action recognition benchmarks: UCF-101, HMDB-51, and Diving-48. Further analysis also indicates that MRF-ST can significantly improve performance in scenes with large variance in visual tempo.
Keywords
Action recognition, Spatiotemporal, Multi-receptive field, Visual tempo
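
The abstract describes two ingredients: dilated convolutions that give different receptive fields, and attention-based dynamic weighting over the dilation rates. The following is a minimal illustrative sketch of that idea on a one-dimensional temporal signal, not the authors' MRF-ST implementation. The function names, the shared kernel, and the scoring rule (mean absolute response of each branch) are all assumptions made for illustration; the abstract does not specify how the attention scores are computed.

```python
import math


def receptive_field(kernel_size, dilation):
    """Number of frames spanned by one dilated 1D kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation) over time."""
    center = (len(kernel) - 1) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(signal):  # positions outside the clip count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_receptive_field(signal, kernel, dilations):
    """Fuse branches with different dilation rates using softmax weights.

    ASSUMPTION: each branch is scored by its mean absolute response. This is a
    hypothetical stand-in for the paper's attention mechanism.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    scores = [sum(abs(v) for v in b) / len(b) for b in branches]
    weights = softmax(scores)
    fused = [
        sum(w * b[t] for w, b in zip(weights, branches))
        for t in range(len(signal))
    ]
    return fused, weights


if __name__ == "__main__":
    clip = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy per-frame feature
    kernel = [1.0, 1.0, 1.0]           # toy kernel shared by all branches
    fused, weights = multi_receptive_field(clip, kernel, dilations=[1, 2])
    print([receptive_field(3, d) for d in (1, 2, 4)])
    print(fused, weights)
```

With a kernel of size 3, dilation rates 1, 2, and 4 span 3, 5, and 9 frames respectively, so branches in the same layer can respond to faster or slower motion while each uses the same number of kernel weights.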