Large-Scale Audio Event Discovery in One Million YouTube Videos

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
Internet videos provide a virtually boundless source of audio with a conspicuous lack of localized annotations, presenting an ideal setting for unsupervised methods. With this motivation, we perform an unprecedented exploration into the large-scale discovery of recurring audio events in a diverse corpus of one million YouTube videos (45K hours of audio). Our approach is to apply a streaming, non-parametric clustering algorithm to both spectral features and out-of-domain neural audio embeddings. We use a small portion of manually annotated audio events to quantitatively estimate the intrinsic clustering performance. In addition to providing a useful mechanism for unsupervised active learning, we demonstrate the effectiveness of the discovered audio event clusters in two downstream applications. The first is weakly-supervised learning, where we exploit the association of video-level metadata and cluster occurrences to temporally localize audio events. The second is informative activity detection, an unsupervised method for semantic saliency based on the corpus statistics of the discovered event clusters.
Keywords
Audio event discovery, unsupervised learning, weakly-supervised learning, streaming clustering algorithms
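
The abstract mentions a streaming, non-parametric clustering algorithm but does not specify it. As a rough illustration of that general idea only, and not the authors' method, the following minimal sketch processes feature vectors one at a time and opens a new cluster whenever a vector is farther than a fixed radius from every existing centroid, so the number of clusters is not set in advance. The function name `stream_cluster`, the `radius` parameter, and the Euclidean distance are all assumptions made for this example.

```python
import math


def stream_cluster(frames, radius):
    """Illustrative single-pass clustering (hypothetical, not the paper's algorithm).

    Each incoming vector joins the nearest centroid if it lies within
    `radius`; otherwise it starts a new cluster.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of vectors assigned to each cluster
    labels = []     # cluster index chosen for each input vector

    for x in frames:
        # Find the closest existing centroid, if any.
        best, best_d = -1, math.inf
        for k, c in enumerate(centroids):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = k, d

        if best_d > radius:
            # Nothing close enough: open a new cluster seeded at x.
            centroids.append(list(x))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean of the matched cluster.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [ci + (xi - ci) / n
                               for ci, xi in zip(centroids[best], x)]
            labels.append(best)

    return labels, centroids


if __name__ == "__main__":
    toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
    labels, centroids = stream_cluster(toy, radius=1.0)
    print(labels, len(centroids))
```

In a setting like the one described, `frames` would presumably be spectral features or neural audio embeddings computed over short audio segments; the toy two-dimensional points here only stand in for such vectors.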