UAVM: Towards Unifying Audio and Visual Models

Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James Glass

arXiv (2023)

Abstract

Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-art audio-visual event classification accuracy of 65.8% on VGGSound. More interestingly, we also find a few intriguing properties of UAVM that the modality-independent counterparts do not have.
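To illustrate the contrast between independent branches and a unified model, here is a minimal toy sketch in pure Python. It rests on assumptions that the abstract does not state. First, "unifying" is taken to mean that each modality keeps only a small input projection, while a single shared hidden layer and a single shared classifier serve both audio and video. Second, an audio-visual prediction is formed by averaging the two per-modality outputs. The class `ToyUnifiedAVModel` and its methods are hypothetical and do not reproduce the authors' UAVM implementation. The weights are random, not trained.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyUnifiedAVModel:
    """Hypothetical toy model: modality-specific input projections feed
    ONE shared hidden layer and ONE shared classifier head."""

    def __init__(self, audio_dim, video_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Only these projections are modality-specific (assumption).
        self.proj = {
            "audio": rand_matrix(hidden_dim, audio_dim, rng),
            "video": rand_matrix(hidden_dim, video_dim, rng),
        }
        # Shared by both modalities; an independent-branch model would
        # keep a separate copy of these per modality.
        self.shared = rand_matrix(hidden_dim, hidden_dim, rng)
        self.head = rand_matrix(num_classes, hidden_dim, rng)

    def forward(self, features, modality):
        """Run one modality through its projection and the shared layers."""
        h = matvec(self.proj[modality], features)
        h = relu(matvec(self.shared, h))
        return softmax(matvec(self.head, h))

    def predict_audio_visual(self, audio_feat, video_feat):
        """Late-fusion assumption: average the per-modality class probabilities."""
        pa = self.forward(audio_feat, "audio")
        pv = self.forward(video_feat, "video")
        return [(a + v) / 2 for a, v in zip(pa, pv)]
```

For example, `ToyUnifiedAVModel(audio_dim=4, video_dim=6, hidden_dim=8, num_classes=3).predict_audio_visual([1.0] * 4, [0.5] * 6)` returns three class probabilities that sum to 1. In a conventional model with independent branches, `self.shared` and `self.head` would be duplicated for each modality instead of being reused.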

Keywords

unifying audio, visual models