Divided Caption Model with Global Attention

2021 5TH INTERNATIONAL CONFERENCE ON INNOVATION IN ARTIFICIAL INTELLIGENCE (ICIAI 2021)(2021)

Abstract
Dense video captioning is a newly emerging task that aims to both locate and describe all events in a video. We identify and tackle two challenges in this task: 1) the limitation of attending only to local features; and 2) the severely degraded descriptions and increased training complexity caused by redundant information. In this paper, we propose a new divided caption model in which two different attention mechanisms rectify the captioning process within a unified framework. First, we employ a global attention mechanism to encode video features in the proposal module, which yields better temporal boundaries. Second, we design a bidirectional Long Short-Term Memory (LSTM) network with a common-attention mechanism that effectively balances 3D convolutional neural network (C3D) features and globally attended video content in the caption module to generate coherent natural-language descriptions. In addition, we divide the forward and backward video features of an event into segments to alleviate the degraded descriptions and increased complexity. Extensive experiments demonstrate the competitive performance of the proposed Divided Caption Model with Global Attention (DCM-GA) against state-of-the-art methods on the ActivityNet Captions dataset.
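To make the global-attention idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: it scores every timestep of a sequence of per-segment video features (e.g. C3D outputs) against a query state, so the resulting context vector draws on the whole video rather than only local features. All shapes, the bilinear scoring form, and the function names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_attention(features, query, W):
    """Attend over ALL timesteps of the video features.

    features: (T, d) per-segment video features (e.g. C3D outputs) -- assumed shape
    query:    (d,)   proposal/decoder state -- assumed shape
    W:        (d, d) bilinear scoring weight (random here for illustration)
    Returns the globally attended context vector and the attention weights.
    """
    scores = features @ W @ query      # (T,) alignment score for every timestep
    alpha = softmax(scores)            # attention weights, sum to 1 over the whole video
    context = alpha @ features         # (d,) weighted sum of all features
    return context, alpha

# Toy usage with random data.
rng = np.random.default_rng(0)
T, d = 8, 16
feats = rng.normal(size=(T, d))
q = rng.normal(size=(d,))
W = rng.normal(size=(d, d))
ctx, alpha = global_attention(feats, q, W)
```

Because the weights span every timestep, the context vector can reflect events far from the current proposal, which is the motivation the abstract gives for global over purely local attention.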
Keywords
Video Caption, Global Attention, Bidirectional LSTM