Memory Augmented Deep Recurrent Neural Network for Video Question Answering

Chengxiang Yin, Jian Tang, Zhiyuan Xu, Yanzhi Wang

IEEE Transactions on Neural Networks and Learning Systems (2020)

Abstract

Video question answering (VideoQA) is an important but challenging multimedia task in which a system automatically analyzes a question and a video and generates an accurate answer. Research on VideoQA is still in its infancy. In this article, we propose a novel memory augmented deep recurrent neural network (MA-DRNN) model for VideoQA, which features a new method for encoding videos and questions, and memory augmentation using the emerging differentiable neural computer (DNC). Specifically, we encode textual (question) information before visual (video) information, which leads to better visual-textual representations. Moreover, we leverage the DNC (with an external memory) to store and retrieve useful information in questions and videos and to model the long-term visual-textual dependence. To evaluate the proposed model, we conducted extensive experiments on the VTW and MSVD-QA data sets, both of which are widely used large-scale video data sets for language-level understanding. The experimental results validate the proposed model and show that it outperforms the state of the art on various accuracy-related metrics.
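The external-memory retrieval mentioned in the abstract can be illustrated with a minimal sketch of content-based addressing, the lookup mechanism generally used by DNC-style memories: a query key is compared with every memory row by cosine similarity, the similarities are sharpened and normalized with a softmax, and the read vector is the weighted sum of the rows. This is not the MA-DRNN implementation; the function name `content_read`, the `strength` parameter, and the toy memory contents are assumptions made only for illustration.

```python
import math


def content_read(memory, key, strength=1.0):
    """Hypothetical helper: content-based read from an external memory.

    memory   -- list of equal-length rows (one row per memory slot)
    key      -- query vector with the same length as a memory row
    strength -- non-negative sharpening factor applied before the softmax
    """
    eps = 1e-8  # guards against division by zero for all-zero vectors
    key_norm = math.sqrt(sum(k * k for k in key)) + eps

    # Cosine similarity between the key and every memory row.
    similarities = []
    for row in memory:
        row_norm = math.sqrt(sum(v * v for v in row)) + eps
        dot = sum(v * k for v, k in zip(row, key))
        similarities.append(dot / (row_norm * key_norm))

    # Numerically stable softmax over the sharpened similarities.
    scores = [strength * s for s in similarities]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Read vector: weighted sum of the memory rows.
    width = len(memory[0])
    read_vector = [
        sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)
    ]
    return read_vector, weights


# Toy memory with three slots; the key is closest to the first slot.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
key = [0.9, 0.1, 0.0]
read_vector, weights = content_read(memory, key, strength=10.0)
```

With a larger `strength`, the read weights concentrate on the best-matching slot; with a smaller one, the read becomes a softer blend of several slots. In a full model the key and strength would be produced by the recurrent controller rather than set by hand.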

Keywords

Deep learning, differentiable neural computer (DNC), memory augmented neural network, recurrent neural network (RNN), video question answering (VideoQA)
