Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning

Sang-Ho Lee, Jae-Young Chung, Yang Yu, Gun-Hee Kim, Thomas M. Breuel, Gal Chechik, Yang Song

arXiv (Cornell University), 2021

Abstract
Large-scale datasets are the cornerstone of representation learning. Existing self-supervised approaches extract learning signals by making certain assumptions about the data, e.g., spatio-temporal continuity and multimodal correspondence. However, finding large amounts of data that satisfy such assumptions is not straightforward, and this restricts the community to rely on datasets collected through laborious annotation and/or manual filtering processes. In this paper, we propose a subset optimization approach for automatic dataset curation. Focusing on audio-visual representation learning, we find a subset that provides the maximum mutual information between audio and visual channels in videos. We show that self-supervised models trained on our data, despite being automatically constructed, achieve competitive downstream performances compared to existing datasets that require annotation and/or manual filtering. The most significant benefit of our approach is scalability. We release a dataset of 100M videos with high audio-visual correspondence.
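The abstract describes selecting a subset of videos that maximizes the mutual information between audio and visual channels. The sketch below illustrates one way such a criterion could drive subset selection: clips are assumed to carry precomputed discrete audio and visual cluster assignments, and a simple greedy loop grows a subset whose audio-visual cluster contingency has maximal empirical mutual information. The clustering step, the greedy strategy, and all function names are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch: greedy subset selection that maximizes empirical mutual
# information between audio and visual cluster assignments of video clips.
# (Illustrative only; not the paper's exact optimization procedure.)
import numpy as np

def mutual_information(a, v):
    """Empirical MI (in nats) between two discrete label arrays."""
    joint = np.zeros((a.max() + 1, v.max() + 1))
    for ai, vi in zip(a, v):
        joint[ai, vi] += 1
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)   # marginal over audio clusters
    pv = joint.sum(axis=0, keepdims=True)   # marginal over visual clusters
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pv)[nz])).sum())

def greedy_subset(audio_labels, visual_labels, k, seed=0):
    """Greedily grow a subset of size k that maximizes audio-visual MI."""
    rng = np.random.default_rng(seed)
    n = len(audio_labels)
    selected = list(rng.choice(n, size=2, replace=False))  # small random seed set
    remaining = set(range(n)) - set(selected)
    while len(selected) < k:
        best_i, best_mi = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            mi = mutual_information(audio_labels[idx], visual_labels[idx])
            if mi > best_mi:
                best_i, best_mi = i, mi
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

# Toy usage: 1000 candidate clips, 16 audio clusters, 16 visual clusters.
rng = np.random.default_rng(0)
audio = rng.integers(0, 16, size=1000)
visual = rng.integers(0, 16, size=1000)
subset = greedy_subset(audio, visual, k=50)
print(len(subset), "clips selected")
```

In practice the paper targets a pool of hundreds of millions of clips, so an exhaustive greedy scan like the one above would not scale; it is only meant to make the mutual-information objective concrete.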
Keywords
datasets, automatic curation, large-scale, audio-visual