Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling

Bruno Korbar, Jaesung Huh, Andrew Zisserman

CoRR (2024)

Abstract

The goal of this paper is automatic character-aware subtitle generation. Given a video and a minimal amount of metadata, we propose an audio-visual method that generates a full transcript of the dialogue, with precise speech timestamps and the speaking character identified. The key idea is to first use audio-visual cues to select a set of high-precision audio exemplars for each character, and then use these exemplars to classify all speech segments by speaker identity. Notably, the method does not require face detection or tracking. We evaluate the method over a variety of TV sitcoms, including Seinfeld, Frasier and Scrubs. We envision this system being useful for the automatic generation of subtitles to improve the accessibility of the vast number of videos available on modern streaming services. Project page:
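
The second stage described in the abstract, classifying speech segments against per-character audio exemplars, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each speech segment and each exemplar has already been turned into a fixed-length speaker embedding by some unspecified model, and that a segment is assigned to the character whose exemplar is most similar under cosine similarity. The function names, the `threshold` parameter, the character labels and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_segments(segments, exemplars, threshold=0.5):
    """Label each segment embedding with the character of its most similar exemplar.

    segments:  list of embedding vectors, one per speech segment
    exemplars: dict mapping character name -> list of exemplar embeddings
    threshold: assumed cut-off; below it the segment is labelled "unknown"
    """
    labels = []
    for emb in segments:
        best_name, best_score = None, -1.0
        for name, exemplar_embs in exemplars.items():
            for ex in exemplar_embs:
                score = cosine(emb, ex)
                if score > best_score:
                    best_name, best_score = name, score
        labels.append(best_name if best_score >= threshold else "unknown")
    return labels


# Toy 3-dimensional embeddings, purely illustrative.
exemplars = {
    "char_a": [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]],
    "char_b": [[0.0, 1.0, 0.1]],
}
segments = [[0.8, 0.2, 0.0], [0.1, 0.9, 0.2], [0.0, 0.0, 0.0]]
labels = classify_segments(segments, exemplars)
print(labels)
```

Nearest-exemplar matching is only one plausible way to use such exemplars. The abstract does not say which classifier or similarity measure the method uses.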

Keywords

character-aware subtitling, audio-visual speaker diarisation, speech recognition, video understanding
