PDAN: Pyramid Dilated Attention Network for Action Detection

2021 IEEE Winter Conference on Applications of Computer Vision (WACV 2021)

Cited by 29 | Views: 9
Abstract
Handling long and complex temporal information is an important challenge for action detection. This challenge is further aggravated by densely distributed actions in untrimmed videos, where previous action detection methods fail to select the key temporal information. To this end, we introduce the Dilated Attention Layer (DAL). Compared to a standard temporal convolution layer, DAL allocates attention weights to the local frames within its kernel, which enables it to learn better local representations across time. Furthermore, we introduce the Pyramid Dilated Attention Network (PDAN), which is built upon DAL. With multiple DALs at different dilation rates, PDAN can model short-term and long-term temporal relations simultaneously by focusing on local segments at both low and high temporal receptive fields. This property enables PDAN to handle the complex temporal relations between different action instances in long untrimmed videos. To corroborate the effectiveness and robustness of our method, we evaluate it on three densely annotated, multi-label datasets: MultiTHUMOS, Charades, and the Toyota Smarthome Untrimmed (TSU) dataset. PDAN outperforms previous state-of-the-art methods on all three datasets.
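The paper's implementation is not reproduced here, but the core idea of a DAL, replacing a temporal convolution's fixed kernel weights with data-dependent attention over the frames in its dilated local window, can be sketched in NumPy. The tap-scoring projection `w_attn`, the residual `pdan_stack` composition, and the dilation schedule (1, 2, 4) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dilated_attention_1d(x, w_attn, kernel_size=3, dilation=1):
    """Sketch of one Dilated Attention Layer (DAL) step.

    x:      (C, T) per-frame feature sequence.
    w_attn: (kernel_size, C) hypothetical projection that scores each
            kernel tap from the centre frame's features.

    Each output frame is a weighted sum of its dilated local window,
    with per-tap softmax attention in place of fixed conv weights.
    """
    C, T = x.shape
    pad = dilation * (kernel_size - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # (C, T, K): the dilated local window around every time step
    windows = np.stack([xp[:, i * dilation : i * dilation + T]
                        for i in range(kernel_size)], axis=-1)
    # attention logits per kernel tap, per time step: (K, T)
    logits = w_attn @ x
    a = np.exp(logits - logits.max(axis=0, keepdims=True))
    a = a / a.sum(axis=0, keepdims=True)   # softmax over the K taps
    # weight each tap of the window and sum over taps: (C, T)
    return (windows * a.T[None, :, :]).sum(axis=-1)

def pdan_stack(x, attn_weights, dilations=(1, 2, 4)):
    """Hypothetical pyramid: stack DALs with growing dilation so the
    receptive field covers both short- and long-term relations; a
    residual connection keeps lower-level detail."""
    for w, d in zip(attn_weights, dilations):
        x = x + dilated_attention_1d(x, w, dilation=d)
    return x
```

With all attention logits equal, the layer degenerates to a plain moving average over the window, which makes the role of the attention weights easy to check in isolation.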
Keywords
handling long information, complex temporal information, action detection tasks, densely distributed actions, previous action detection methods, key temporal information, Dilated Attention Layer, DAL, temporal convolution layer, attention weights, local frames, local representation, Pyramid Dilated Attention Network, PDAN, multiple DALs, different dilation rates, long-term temporal relations, local segments, low temporal receptive fields, high temporal receptive fields, complex temporal relations, different action instances, long untrimmed videos