Improving mispronunciation detection using speech reconstruction

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2024)

Abstract

Training related machine learning tasks simultaneously can lead to improved performance on both tasks. Text-to-speech (TTS) and mispronunciation detection and diagnosis (MDD) both operate on phonetic information, and we examine whether combining the two tasks can boost MDD performance. We propose a network that reconstructs speech from the phones produced by the MDD system and computes a speech reconstruction loss. We hypothesize that the phones produced by the MDD system will be closer to the ground truth if the reconstructed speech sounds closer to the original speech. To test this, we first extract wav2vec features from a pre-trained model and feed them to the MDD system along with the text input. The MDD system predicts the target annotated phones, and speech is then synthesized from the predicted phones. The system is therefore trained with both a speech reconstruction loss and an MDD loss. Comparing the proposed system against an identical system without speech reconstruction and against another state-of-the-art baseline, we found that the proposed system achieves higher MDD scores. The proposed system also achieves higher MDD scores on a set of sentences unseen during training, which suggests that reconstructing the speech signal from the predicted phones helps the system generalize to new test sentences. We also tested whether the system can generate accented speech when the input phones contain mispronunciations. Results from our perceptual experiments show that speech generated from phones containing mispronunciations sounds more accented and less intelligible than speech generated from phones without mispronunciations, which suggests that the system can identify differences in phones and generate the desired speech signal.
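The joint objective described above can be illustrated with a minimal sketch. The abstract does not state the exact form of either loss or how they are weighted, so everything here is an assumption for illustration: a per-phone cross-entropy stands in for the MDD loss, a mean absolute error over acoustic frames stands in for the speech reconstruction loss, and `recon_weight` is a hypothetical balancing parameter.

```python
import math


def mdd_cross_entropy(phone_probs, target_ids):
    """Stand-in MDD loss (assumption): mean negative log-likelihood
    of the annotated phone at each position."""
    total = 0.0
    for probs, target in zip(phone_probs, target_ids):
        total += -math.log(probs[target])
    return total / len(target_ids)


def l1_reconstruction(reconstructed_frames, original_frames):
    """Stand-in reconstruction loss (assumption): mean absolute error
    between reconstructed and original acoustic frames."""
    total, count = 0.0, 0
    for rec, orig in zip(reconstructed_frames, original_frames):
        for r, o in zip(rec, orig):
            total += abs(r - o)
            count += 1
    return total / count


def joint_loss(phone_probs, target_ids, reconstructed_frames,
               original_frames, recon_weight=1.0):
    """Multi-task objective: MDD loss plus weighted reconstruction loss.
    recon_weight is a hypothetical hyperparameter, not from the paper."""
    mdd = mdd_cross_entropy(phone_probs, target_ids)
    recon = l1_reconstruction(reconstructed_frames, original_frames)
    return mdd + recon_weight * recon
```

Under this sketch, setting `recon_weight=0.0` recovers an MDD-only objective, which loosely mirrors the comparison against the identical system without speech reconstruction.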

Keywords

Multi-task learning, mispronunciation detection, text-to-speech, speech reconstruction
