SYNTACC : Synthesizing Multi-Accent Speech By Weight Factorization

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2023)

引用 0|浏览6
暂无评分
摘要
Conventional multi-speaker text-to-speech synthesis (TTS) is known to be capable of synthesizing speech for multiple voices, yet it cannot generate speech in different accents. This limitation has motivated us to develop SYNTACC (Synthesizing speech with accents) which adapts conventional multi-speaker TTS to produce multi-accent speech. Our method uses the YourTTS model and involves a novel multi-accent training mechanism. The method works by decomposing each weight matrix into a shared component and an accent-dependent component, with the former being initialized by the pretrained multi-speaker TTS model and the latter being factorized into vectors using rank-1 matrices to reduce the number of training parameters per accent. This weight factorization method proves to be effective in fine-tuning the SYNTACC on multi-accent data sets in a low-resource condition. Our SYNTACC model eventually allows speech synthesis in not only different voices but also in different accents.
更多
查看译文
关键词
Speech synthesis,accent adaptation,multi-speaker TTS,weight factorization,weight decomposition
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要