Frame labeling and mapping for non-parallel voice conversion

2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP) (2017)

Abstract
Voice conversion transforms one speaker's voice so that it sounds like another speaker's. Depending on whether the source and target speakers utter the same content, conversion is either parallel or non-parallel. In parallel voice conversion, the two speakers' speech data share the same content, so alignment methods can readily establish the correspondence between them. When methods designed for parallel voice conversion are applied to non-parallel data, mapping corresponding signal segments is not straightforward. We recently proposed using a DNN-HMM (hybrid Deep Neural Network - Hidden Markov Model) recognizer to label each frame of speech from both the source and target speakers, and to establish the mapping by clustering each frame's pseudo-likelihood vector. Experiments showed that the method produces results comparable to those of parallel voice conversion. In this work, we further study how the method behaves under different settings of the frame mapping process. Using an exemplar-based parallel voice conversion method for testing, we compare our method with the state-of-the-art method INCA (an Iterative combination of a Nearest neighbor search step and a Conversion step Alignment method). The experiments show that the proposed method produces results similar to those of INCA-based voice conversion.
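
The frame mapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the distance measure, or how frames are paired within a cluster. The sketch assumes plain k-means over the pooled vectors and a nearest-neighbor pairing inside each cluster. The names `kmeans`, `map_frames`, `src_post`, and `tgt_post` are hypothetical, and random Dirichlet samples stand in for the per-frame pseudo-likelihood vectors that a DNN-HMM recognizer would produce.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every row to every center.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels

def map_frames(src_post, tgt_post, k=8):
    """Pair each source frame with the index of one target frame.

    Assumption: source and target vectors are clustered jointly, and a
    source frame is paired with the closest target frame in its cluster.
    """
    labels = kmeans(np.vstack([src_post, tgt_post]), k)
    src_lab, tgt_lab = labels[:len(src_post)], labels[len(src_post):]
    all_tgt = np.arange(len(tgt_post))
    mapping = np.empty(len(src_post), dtype=int)
    for i, (vec, lab) in enumerate(zip(src_post, src_lab)):
        cand = all_tgt[tgt_lab == lab]
        if len(cand) == 0:  # fallback: no target frame in this cluster
            cand = all_tgt
        dist = ((tgt_post[cand] - vec) ** 2).sum(axis=1)
        mapping[i] = cand[dist.argmin()]
    return mapping

# Placeholder data: 200 source and 250 target frames, 10 classes each.
rng = np.random.default_rng(1)
src_post = rng.dirichlet(np.ones(10), size=200)
tgt_post = rng.dirichlet(np.ones(10), size=250)
mapping = map_frames(src_post, tgt_post)
```

Here `mapping[i]` is the index of the target frame paired with source frame `i`. Because the pairing relies only on recognizer outputs, the two speakers need not utter the same content, which is the property that makes this kind of mapping usable for non-parallel data.
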
Keywords
non-parallel voice conversion, frame mapping, DNN-HMM recognizer, clustering