Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling
CoRR (2024)
Abstract
The goal of this paper is automatic character-aware subtitle generation.
Given a video and a minimal amount of metadata, we propose an audio-visual
method that generates a full transcript of the dialogue, with precise speech
timestamps, and the character speaking identified. The key idea is to first use
audio-visual cues to select a set of high-precision audio exemplars for each
character, and then use these exemplars to classify all speech segments by
speaker identity. Notably, the method does not require face detection or
tracking. We evaluate the method over a variety of TV sitcoms, including
Seinfeld, Frasier and Scrubs. We envision this system being useful for the
automatic generation of subtitles to improve the accessibility of the vast
number of videos available on modern streaming services. Project page :
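
The second stage described above, classifying speech segments against per-character audio exemplars, can be illustrated with a minimal sketch. The abstract does not specify the embedding model or the classifier, so everything here is an assumption made for illustration: the nearest-exemplar rule, the cosine similarity measure, the `min_similarity` threshold, the function names, and the toy two-dimensional "embeddings".

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_segment(
    embedding: Vector,
    exemplars: Dict[str, List[Vector]],
    min_similarity: float = 0.5,  # hypothetical threshold, not from the paper
) -> Optional[str]:
    """Return the character whose exemplar is most similar to the segment.

    Returns None when no exemplar exceeds min_similarity, i.e. the
    segment is left unassigned rather than forced onto a character.
    """
    best_name: Optional[str] = None
    best_score = min_similarity
    for name, vectors in exemplars.items():
        for vector in vectors:
            score = cosine_similarity(embedding, vector)
            if score > best_score:
                best_name, best_score = name, score
    return best_name


# Toy data: hand-written 2-D vectors standing in for speaker embeddings.
exemplars = {
    "Jerry": [[1.0, 0.1], [0.9, 0.2]],
    "Elaine": [[0.1, 1.0]],
}
segments = [[0.95, 0.15], [0.2, 0.9], [-1.0, -1.0]]
labels = [classify_segment(segment, exemplars) for segment in segments]
print(labels)
```

In this sketch a segment that resembles no exemplar is left unlabelled, which is one possible way to handle speech from characters without exemplars; the source does not say how such segments are treated.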
Keywords
character-aware subtitling, audio-visual speaker diarisation, speech recognition, video understanding