A*: Atrous Spatial Temporal Action Recognition for Real Time Applications.

Myeongjun Kim,Federica Spinola,Philipp Benz,Tae-Hoon Kim

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)（2024）

引用 0|浏览1

暂无评分

摘要

Deep learning has become a popular tool across various fields and is increasingly being integrated into real-world applications such as autonomous driving cars and surveillance cameras. One area of active research is recognizing human actions, including identifying unsafe or abnormal behaviors. Temporal information is crucial for action recognition tasks. Global context, as well as the target person, are also important for judging human behaviors. However, larger networks that can capture all of these features face difficulties operating in real-time. To address these issues, we propose A*: Atrous Spatial Temporal Action Recognition for Real Time Applications. A* includes four modules aimed at improving action detection networks. First, we introduce a Low-Level Feature Aggregation module. Second, we propose the Atrous Spatio-Temporal Pyramid Pooling module. Third, we suggest to fuse all extracted image and video features in an Image-Video Feature Fusion module. Finally, we integrate the Proxy Anchor Loss for action features into the loss function. We evaluate A* on three common action detection benchmarks, and achieve state-of-the-art performance on JHMDB and UCF101-24, while staying competitive on AVA. Furthermore, we demonstrate that A* can achieve real-time inference speeds of 33 FPS, making it suitable for real-world applications.

查看译文

关键词

Algorithms,Video recognition and understanding

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要