The DeepMotion entry to the GENEA Challenge 2022

Multimodal Interfaces and Machine Learning for Multimodal Interaction (2022)

Abstract
This paper describes the method and evaluation results of our DeepMotion entry to the GENEA Challenge 2022. One difficulty in data-driven gesture synthesis is that there may be multiple viable gesture motions for the same speech utterance. Deterministic regression methods therefore cannot resolve these conflicting samples and may produce overly damped motions. We propose a two-stage model to address this uncertainty in gesture synthesis. Inspired by recent text-to-image synthesis methods, our gesture synthesis system first uses a VQ-VAE model to extract small gesture units as codebook vectors from the training data. An autoregressive model based on the GPT-2 transformer is then applied to model the probability distribution over the discrete latent space of the VQ-VAE. The user evaluation results show that the proposed method produces gesture motions with reasonable human-likeness and gesture appropriateness.
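The two-stage design described above can be illustrated with a minimal sketch: a VQ-VAE-style vector quantizer that maps motion latents to discrete codebook indices, and a causal transformer prior that predicts the next code conditioned on speech features. This is not the authors' implementation; all module names, layer sizes, and feature dimensions (e.g. the 80-dimensional speech features) are illustrative assumptions.

```python
# Hypothetical sketch of the two-stage pipeline: VQ-VAE codebook lookup
# (stage 1) plus a GPT-2-style causal prior over code indices (stage 2).
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with straight-through gradients."""

    def __init__(self, num_codes=512, code_dim=128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z):                      # z: (batch, time, code_dim)
        flat = z.reshape(-1, z.size(-1))
        dists = torch.cdist(flat, self.codebook.weight)    # L2 distances to codes
        idx = dists.argmin(dim=-1)                          # discrete code indices
        q = self.codebook(idx).view_as(z)
        q_st = z + (q - z).detach()                         # straight-through estimator
        commit_loss = F.mse_loss(q.detach(), z) + F.mse_loss(q, z.detach())
        return q_st, idx.view(z.shape[:-1]), commit_loss


class CodePrior(nn.Module):
    """Causal transformer over code indices, conditioned on speech features."""

    def __init__(self, num_codes=512, d_model=256, speech_dim=80):
        super().__init__()
        self.code_emb = nn.Embedding(num_codes, d_model)
        self.speech_proj = nn.Linear(speech_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, num_codes)

    def forward(self, codes, speech):          # codes: (B, T), speech: (B, T, speech_dim)
        x = self.code_emb(codes) + self.speech_proj(speech)
        T = codes.size(1)
        causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.transformer(x, mask=causal_mask)
        return self.head(h)                    # next-code logits, (B, T, num_codes)


if __name__ == "__main__":
    vq = VectorQuantizer()
    prior = CodePrior()
    motion_latents = torch.randn(2, 16, 128)   # stage-1 encoder output (assumed shape)
    _, codes, _ = vq(motion_latents)
    speech = torch.randn(2, 16, 80)            # e.g. mel-spectrogram frames (assumed)
    logits = prior(codes, speech)
    print(codes.shape, logits.shape)           # (2, 16) and (2, 16, 512)
```

At synthesis time, one would sample code indices autoregressively from the prior given the speech features and decode them back to motion with the VQ-VAE decoder; sampling from the learned distribution, rather than regressing a single output, is what lets the model avoid the averaging that damps deterministic predictions.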
Keywords
DeepMotion entry, GENEA Challenge