Sign Language Production using Neural Machine Translation and Generative Adversarial Networks

BMVC 2018

Cited 68 | Viewed 57
Abstract
We present a novel approach to automatic Sign Language Production using state-of-the-art Neural Machine Translation (NMT) and Image Generation techniques. Our system is capable of producing sign videos from spoken language sentences. Contrary to current approaches that are dependent on heavily annotated data, our approach requires minimal gloss and skeletal level annotations for training. We achieve this by breaking down the task into dedicated sub-processes. We first translate spoken language sentences into sign gloss sequences using an encoder-decoder network. We then find a data-driven mapping between glosses and skeletal sequences. We use the resulting pose information to condition a generative model that produces sign language video sequences. We evaluate our approach on the recently released PHOENIX14T Sign Language Translation dataset. We set a baseline for text-to-gloss translation, reporting a BLEU-4 score of 16.34/15.26 on dev/test sets. We further demonstrate the video generation capabilities of our approach by sharing qualitative results of generated sign sequences given their skeletal correspondence.
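To make the first stage of the pipeline concrete, below is a minimal PyTorch sketch of an encoder-decoder network translating spoken-language token sequences into sign gloss sequences. This is an illustrative sketch only, not the authors' published architecture; all names and sizes here (VOCAB_SRC, VOCAB_GLOSS, EMB, HID) are hypothetical placeholders.

```python
# Hedged sketch of text-to-gloss translation with a GRU encoder-decoder.
# Vocabulary and layer sizes are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

VOCAB_SRC, VOCAB_GLOSS, EMB, HID = 3000, 1200, 256, 512  # hypothetical sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SRC, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)

    def forward(self, src):                # src: (batch, src_len) token ids
        _, h = self.rnn(self.emb(src))     # h: (1, batch, HID) final state
        return h

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_GLOSS, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB_GLOSS)

    def forward(self, tgt, h):             # tgt: (batch, tgt_len) gloss ids
        o, h = self.rnn(self.emb(tgt), h)
        return self.out(o), h              # logits over the gloss vocabulary

# One teacher-forced training step on a dummy batch.
enc, dec = Encoder(), Decoder()
src = torch.randint(0, VOCAB_SRC, (4, 12))   # spoken-language sentences
tgt = torch.randint(0, VOCAB_GLOSS, (4, 8))  # target gloss sequences
logits, _ = dec(tgt[:, :-1], enc(src))       # predict each next gloss token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_GLOSS), tgt[:, 1:].reshape(-1))
loss.backward()
```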
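The final stage conditions a generative model on the skeletal pose information. The sketch below shows only a pose-conditioned generator; the discriminator and adversarial loss of the GAN training setup are omitted, and the architecture, joint count, and image size are illustrative assumptions rather than the published design.

```python
# Hedged sketch of a pose-conditioned generator for sign video frames.
# Sizes and structure are illustrative guesses, not the authors' model.
import torch
import torch.nn as nn

N_JOINTS, Z_DIM, IMG = 50, 100, 64  # hypothetical: joints, noise dim, frame size

class PoseConditionedGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Project (noise + flattened 2D pose) to a small spatial feature map,
        # then upsample to an IMG x IMG RGB frame.
        self.fc = nn.Linear(Z_DIM + 2 * N_JOINTS, 256 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),     # 32 -> 64
        )

    def forward(self, z, pose):            # pose: (batch, N_JOINTS, 2)
        x = torch.cat([z, pose.flatten(1)], dim=1)
        x = self.fc(x).view(-1, 256, 8, 8)
        return self.up(x)                  # (batch, 3, IMG, IMG) frame

gen = PoseConditionedGenerator()
z = torch.randn(2, Z_DIM)                  # per-frame noise
pose = torch.rand(2, N_JOINTS, 2)          # normalised joint coordinates
frame = gen(z, pose)                       # one generated frame per pose
```

Generating one frame per skeletal pose in a sequence, as above, would yield a video conditioned on the gloss-to-skeleton mapping from the previous stage.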