Towards Micro-video Understanding by Joint Sequential-Sparse Modeling

MM '17: ACM Multimedia Conference, Mountain View, California, USA, October 2017

Abstract
Like traditional long videos, micro-videos are a unity of textual, acoustic, and visual modalities, which sequentially narrate a real-life event from distinct angles. Yet, unlike traditional long videos with rich content, micro-videos are very short, lasting only 6-15 seconds, and hence usually convey one or a few high-level concepts. In light of this, we have to characterize and jointly model the sparseness and the multiple sequential structures for better micro-video understanding. To accomplish this, in this paper we present an end-to-end deep learning model that packs three parallel LSTMs to capture the sequential structures and a convolutional neural network to learn sparse concept-level representations of micro-videos. We applied our model to micro-video categorization. Besides, we constructed a real-world dataset for sequence modeling and released it to facilitate other researchers. Experimental results demonstrate that our model yields better performance than several state-of-the-art baselines.
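The architecture described above, parallel per-modality LSTMs whose final states are fused and then sparsified into a concept-level code, can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the scalar LSTM cell, the shared toy weights, the soft-threshold step (standing in for the paper's CNN-based dictionary-learning layer), and all input values are illustrative assumptions.

```python
import math

def lstm_step(x, h, c, W):
    """One scalar LSTM cell step. W maps gate name -> (w_x, w_h, bias)."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    def gate(k, act):
        wx, wh, b = W[k]
        return act(wx * x + wh * h + b)
    i = gate("i", sig)        # input gate
    f = gate("f", sig)        # forget gate
    o = gate("o", sig)        # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

def run_lstm(seq, W):
    """Run one modality's feature sequence; return the final hidden state."""
    h = c = 0.0
    for x in seq:
        h, c = lstm_step(x, h, c, W)
    return h

def soft_threshold(v, lam=0.1):
    """Sparsity proxy: shrink small activations to exactly zero."""
    return [math.copysign(max(abs(z) - lam, 0.0), z) for z in v]

# Toy scalar feature sequences for the three modalities (illustrative values).
W = {k: (0.5, 0.5, 0.0) for k in "ifog"}  # shared toy weights for brevity
streams = {"visual": [0.2, 0.9, 0.4], "acoustic": [0.1, 0.3], "textual": [0.7]}

# Three parallel LSTMs, one per modality, then fusion and sparsification.
fused = [run_lstm(seq, W) for seq in streams.values()]
sparse_code = soft_threshold(fused)
```

The fused vector concatenates one summary per modality; the soft threshold then zeroes out weak activations, mimicking the idea that a micro-video expresses only a few high-level concepts.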
Keywords
Micro-Video Understanding, Parallel LSTMs, Dictionary Learning, Convolutional Neural Network