Selecting Informative Frames For Action Recognition With Partial Observations

2018 25th IEEE International Conference on Image Processing (ICIP), 2018

Abstract
Given a video clip that contains only one type of action (e.g., golfing), the goal of action recognition is to recognize this action category from a given set of action types. To deliver a fast response for practical video applications, existing works have endeavored to process only the leading frames of the input video. In our view, only the informative key frames extracted from this 'partial video' should be used for the action recognition task. This not only further speeds up the recognition process, since less data needs to be processed, but also achieves higher recognition accuracy, since more distinctive features are presented to the learning network. To this end, a novel two-stage learning network architecture is proposed in this paper, consisting of a selection network (S-net) and a recognition network (R-net). The S-net is a relatively shallow network designed to efficiently identify informative key frames, while the R-net is a deep network that performs the final action recognition. In the S-net, a key frame selection criterion is further proposed for identifying informative key frames. Extensive experiments on two benchmark datasets, UCF101 and HMDB51, have been conducted and clearly show that our approach significantly outperforms existing state-of-the-art methods.
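The two-stage flow described above (observe the leading frames, select key frames, then classify them) can be illustrated with a minimal sketch. The abstract does not specify the selection criterion or either network, so everything here is an assumption: `score_fn` is a hypothetical stand-in for the S-net's per-frame informativeness score, `classify_fn` is a hypothetical stand-in for the R-net, and top-k selection is used only as one plausible way to pick key frames.

```python
import numpy as np


def select_key_frames(frames, score_fn, k):
    """Return indices of the k highest-scoring frames, in temporal order.

    score_fn is a hypothetical placeholder for an S-net-style scorer;
    the paper's actual selection criterion is not given in the abstract.
    """
    scores = np.array([score_fn(f) for f in frames], dtype=float)
    k = min(k, len(frames))
    # Stable sort on negated scores: highest score first, ties by time.
    top = np.argsort(-scores, kind="stable")[:k]
    return np.sort(top)


def recognize_partial(frames, observed_ratio, score_fn, classify_fn, k):
    """Classify an action from only the leading part of a video.

    frames: array of shape (num_frames, height, width).
    observed_ratio: fraction of leading frames that are available.
    classify_fn: hypothetical placeholder for an R-net-style classifier.
    """
    n_obs = max(1, int(round(len(frames) * observed_ratio)))
    partial = frames[:n_obs]              # the 'partial video'
    idx = select_key_frames(partial, score_fn, k)
    return classify_fn(partial[idx]), idx
```

For example, calling `recognize_partial(frames, 0.5, np.var, my_classifier, 2)` would score the first half of the frames by pixel variance (a toy proxy, not the paper's criterion) and pass the two highest-scoring frames to `my_classifier`, a user-supplied function.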
Keywords
Action recognition, key frames, two-stream convolutional networks