Streamable Speech Representation Disentanglement and Multi-Level Prosody Modeling for Live One-Shot Voice Conversion

Haoquan Yang, Liqun Deng, Yu Ting Yeung, Nianzu Zheng, Yong Xu

Conference of the International Speech Communication Association (INTERSPEECH) (2022)

Abstract
This paper tackles the challenge of "live" one-shot voice conversion (VC), which performs conversion across arbitrary speakers in a streaming manner while retaining high intelligibility and naturalness. We propose a hybrid unsupervised and supervised learning-based VC model trained with a two-stage strategy. Specifically, we first employ an unsupervised disentanglement framework to separate speech representations of different granularities. Experimental results demonstrate that the proposed method achieves speech naturalness, intelligibility, and speaker similarity comparable to those of offline VC solutions, with sufficient efficiency for practical real-time applications. Audio samples are available online for demonstration (1).
Keywords
voice, speech, conversion, multi-level, one-shot