Frame labeling and mapping for non-parallel voice conversion

Minghui Dong, Chenyu Yang, Jochen Walter Ehnes, Yanfeng Lu, Huaiping Ming, Dong-Yan Huang

2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP), 2017

Abstract

Voice conversion transforms one speaker's voice so that it sounds like another speaker's. Depending on whether the source and target speakers utter the same content, conversion is either parallel or non-parallel. In parallel voice conversion, the two speakers' speech data share the same content, so standard alignment methods can readily establish the correspondence between them. When methods designed for parallel conversion are applied to non-parallel data, mapping corresponding signal segments is no longer straightforward. Recently, we proposed using a DNN-HMM (hybrid Deep Neural Network - Hidden Markov Model) recognizer to label each frame of the speech data from both the source and target speakers, and establishing the mapping by clustering each frame's vector of pseudo-likelihoods. Experiments showed that the method produces results comparable to those of a parallel voice conversion method. In this work, we further study how the method behaves under different settings of the frame mapping process. Using an exemplar-based parallel conversion method for testing, we compare our method with the state-of-the-art INCA method (Iterative combination of a Nearest Neighbor search step and a Conversion step Alignment). The experiments show that the proposed method produces results similar to those of INCA-based voice conversion.
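
The frame-mapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-frame pseudo-likelihood vectors have already been produced by a recognizer (here they are hand-written toy values), it clusters source and target frames jointly with a plain k-means, and it pairs each source frame with the nearest target frame in the same cluster. The function names `kmeans` and `map_frames`, the joint clustering, and the nearest-neighbour pairing rule are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def map_frames(src_post, tgt_post, k):
    """Pair each source frame with a target frame from the same cluster.

    Assumption: source and target vectors are clustered jointly, and the
    nearest target frame within the shared cluster is chosen.
    """
    labels, _ = kmeans(src_post + tgt_post, k)
    src_labels = labels[:len(src_post)]
    tgt_labels = labels[len(src_post):]
    mapping = []
    for i, (vec, lab) in enumerate(zip(src_post, src_labels)):
        candidates = [j for j, t in enumerate(tgt_labels) if t == lab]
        if not candidates:
            # No target frame in this cluster: fall back to all target frames.
            candidates = list(range(len(tgt_post)))
        best = min(candidates, key=lambda j: sq_dist(vec, tgt_post[j]))
        mapping.append((i, best))
    return mapping


# Toy pseudo-likelihood vectors over three hypothetical recognizer states.
src_post = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
tgt_post = [[0.1, 0.1, 0.8], [0.8, 0.2, 0.0], [0.2, 0.7, 0.1]]
mapping = map_frames(src_post, tgt_post, k=3)
print(mapping)
```

Each pair in `mapping` is (source frame index, target frame index). In a real system the paired frames' spectral features, not the pseudo-likelihood vectors themselves, would then be passed to the conversion method.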

Keywords

non-parallel voice conversion, frame mapping, DNN-HMM recognizer, clustering
