Learning Video Moment Retrieval Without a Single Annotated Video

Junyu Gao, Changsheng Xu

IEEE Transactions on Circuits and Systems for Video Technology (2022)


Abstract

Video moment retrieval, which aims to find the moment in a video most relevant to a given natural language query, has progressed significantly over the past few years. Most existing methods are trained in a fully supervised or weakly supervised manner, which requires a time-consuming and expensive manual labeling process. In this work, we propose an alternative approach to video moment retrieval that requires no textual annotations of videos and instead leverages existing visual concept detectors and a pre-trained image-sentence embedding space. Specifically, we design a video-conditioned sentence generator that produces a suitable sentence representation from the visual concepts mined in videos. We then design a GNN-based relation-aware moment localizer that selects a portion of video clips under the guidance of the generated sentence. Finally, the pre-trained image-sentence embedding space is used to evaluate the matching scores between the generated sentence and moment representations, with knowledge transferred from the image domain. By maximizing these scores, the sentence generator and moment localizer can enhance and complement each other to accomplish the moment retrieval task. Experimental results on the Charades-STA and ActivityNet Captions datasets demonstrate the effectiveness of our proposed method.
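The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes clip features and a sentence representation have already been projected into one shared embedding space, and it replaces the learned GNN-based localizer with an exhaustive search over contiguous clip windows. The names `cosine`, `mean_pool`, and `best_moment`, and the toy vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(clips):
    """Average a list of clip embeddings into one moment embedding."""
    n = len(clips)
    return [sum(column) / n for column in zip(*clips)]


def best_moment(clip_embs, sent_emb, min_len=1, max_len=None):
    """Score every contiguous clip window against the sentence embedding.

    Returns (score, start, end) for the highest-scoring window, where the
    window covers clips start .. end-1. Exhaustive search stands in for the
    learned localizer (an assumption made for illustration only).
    """
    max_len = max_len or len(clip_embs)
    best = (-2.0, 0, 0)  # cosine is bounded below by -1
    for start in range(len(clip_embs)):
        last_end = min(start + max_len, len(clip_embs))
        for end in range(start + min_len, last_end + 1):
            score = cosine(mean_pool(clip_embs[start:end]), sent_emb)
            if score > best[0]:
                best = (score, start, end)
    return best


# Toy data: five clips in a 2-D embedding space (hypothetical values).
clips = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
sentence = [0.0, 1.0]
score, start, end = best_moment(clips, sentence)
print(f"best window: clips {start}..{end - 1}, score={score:.3f}")
```

In the approach the abstract describes, these matching scores serve as the training signal that the sentence generator and the moment localizer jointly maximize; the sketch only shows how a score ranks candidate moments once the embeddings are fixed.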


Key words

Visualization, Task analysis, Generators, Training, Graph neural networks, Semantics, Detectors, Video moment retrieval, graph neural network, unpaired learning


