The Manga Whisperer: Automatically Generating Transcriptions for Comics
CoRR(2024)
Abstract
In the past few decades, Japanese comics, commonly referred to as Manga, have
transcended both cultural and linguistic boundaries to become a true worldwide
sensation. Yet, the inherent reliance on visual cues and illustration within
manga renders it largely inaccessible to individuals with visual impairments.
In this work, we seek to address this substantial barrier, with the aim of
ensuring that manga can be appreciated and actively engaged with by everyone.
Specifically, we tackle the problem of diarisation, i.e., generating a
transcription of who said what and when, in a fully automatic way.
To this end, we make the following contributions: (1) we present a unified
model, Magi, that is able to (a) detect panels, text boxes and character boxes,
(b) cluster characters by identity (without knowing the number of clusters
a priori), and (c) associate dialogues to their speakers; (2) we propose a novel
approach that is able to sort the detected text boxes in their reading order
and generate a dialogue transcript; (3) we annotate an evaluation benchmark for
this task using publicly available [English] manga pages. The code, evaluation
datasets and the pre-trained model can be found at:
https://github.com/ragavsachdeva/magi.
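The abstract names three steps that follow detection: clustering characters without fixing the number of clusters in advance, sorting text boxes into reading order, and assembling a speaker-attributed transcript. The Python sketch below illustrates each step under simple assumptions. It is not Magi's implementation. It assumes the panel and text boxes are already detected, that pairwise character similarities come from some embedding model, and that a text-to-character speaker mapping has already been predicted. The names `Box`, `cluster_by_threshold`, `reading_order` and `make_transcript`, and the 0.5 threshold, are hypothetical. Clustering here uses thresholded connected components, one standard way to let the cluster count emerge from the data. Reading order uses a naive right-to-left, top-to-bottom heuristic that reflects the Japanese page convention.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned box in page pixels (y grows downward)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def cluster_by_threshold(sim: list[list[float]], threshold: float) -> list[int]:
    """Connected components of the graph linking pairs with sim >= threshold.

    The number of clusters is not given in advance; it falls out of the graph.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)  # union the two components

    # Relabel component roots as consecutive ids in order of first appearance.
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def manga_key(b: Box) -> tuple[float, float]:
    # Naive heuristic: higher on the page first, then rightmost first.
    return (b.y1, -b.x2)


def reading_order(panels: list[Box], texts: list[Box]) -> list[int]:
    """Return text indices sorted by (panel order, position inside the panel)."""
    ranked = sorted(range(len(panels)), key=lambda k: manga_key(panels[k]))
    panel_rank = {k: r for r, k in enumerate(ranked)}

    def panel_of(t: Box) -> int:
        cx, cy = t.center()
        for k, p in enumerate(panels):
            if p.x1 <= cx <= p.x2 and p.y1 <= cy <= p.y2:
                return panel_rank[k]
        return len(panels)  # text outside every panel is read last

    return sorted(range(len(texts)), key=lambda i: (panel_of(texts[i]), manga_key(texts[i])))


def make_transcript(order: list[int], strings: list[str],
                    speaker_of: dict[int, int], char_cluster: list[int]) -> list[str]:
    """Emit 'who: what' lines; speaker_of maps text index -> character box index."""
    out = []
    for i in order:
        c = speaker_of.get(i)  # None when no speaker was associated
        who = f"Character {char_cluster[c]}" if c is not None else "Unknown"
        out.append(f"{who}: {strings[i]}")
    return out


if __name__ == "__main__":
    # Two panels on the top row (right one read first), one wide panel below.
    panels = [Box(500, 0, 1000, 400), Box(0, 0, 500, 400), Box(0, 400, 1000, 800)]
    texts = [Box(100, 450, 200, 550), Box(600, 50, 700, 150),
             Box(100, 50, 200, 150), Box(800, 50, 900, 150)]
    order = reading_order(panels, texts)  # [3, 1, 2, 0]
    clusters = cluster_by_threshold([[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]], 0.5)
    strings = ["(narration)", "Who's there?", "It's me.", "Wait!"]
    for line in make_transcript(order, strings, {3: 0, 1: 2, 2: 1}, clusters):
        print(line)
```

A fixed sort key such as `(y1, -x2)` breaks on irregular layouts, for example when a tall panel spans several rows or two bubbles sit at slightly different heights, so it should be read only as an illustration of the ordering convention rather than a robust solution.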