Joint Event Detection and Description in Continuous Video Streams.

Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

2019 IEEE Winter Applications of Computer Vision Workshops (WACVW) (2019)

Abstract

Dense video captioning involves first localizing events in a video and then generating captions for the identified events. We present the Joint Event Detection and Description Network (JEDDi-Net) for solving this task in an end-to-end fashion, which encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and then uses a two-level hierarchical LSTM module with context modeling to transcribe the event proposals into captions. We show the effectiveness of our proposed JEDDi-Net on the large-scale ActivityNet Captions dataset.
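
The abstract mentions variable-length temporal events and pooled features but does not spell out the pooling scheme. As a rough illustration only, one common way to turn a variable-length temporal segment into a fixed-size representation is to max-pool it into a fixed number of temporal bins. The sketch below assumes per-time-step feature vectors stored as plain Python lists; the function name `pool_segment` and its parameters are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Sequence


def pool_segment(
    features: Sequence[Sequence[float]],
    start: int,
    end: int,
    num_bins: int,
) -> List[List[float]]:
    """Max-pool features[start:end] along time into num_bins vectors.

    Illustrative assumption, not the JEDDi-Net implementation: each
    proposal [start, end) is split into num_bins (possibly overlapping)
    temporal windows and each window is max-pooled per feature channel.
    """
    if not 0 <= start < end <= len(features):
        raise ValueError("need 0 <= start < end <= len(features)")
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")

    length = end - start
    pooled = []
    for b in range(num_bins):
        # Left edge uses floor division, right edge uses ceiling division,
        # so every bin covers at least one time step even for short segments.
        lo = start + (b * length) // num_bins
        hi = start + -(-((b + 1) * length) // num_bins)
        window = features[lo:hi]
        # Channel-wise maximum over the time steps in this window.
        pooled.append([max(channel) for channel in zip(*window)])
    return pooled


# Six time steps with two feature channels each (made-up numbers).
video_features = [
    [1.0, 0.0],
    [2.0, 5.0],
    [3.0, 1.0],
    [0.0, 4.0],
    [7.0, 2.0],
    [1.0, 1.0],
]

# A proposal covering time steps 1..5, pooled into two bins.
fixed_size = pool_segment(video_features, start=1, end=6, num_bins=2)
```

With this scheme, proposals of any length map to the same `num_bins` by channel shape, which is what a downstream sequence model such as an LSTM captioner would need as a fixed-size input.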

Keywords

Proposals, Training, Three-dimensional displays, Visualization, Streaming media, Context modeling, Event detection
