Disentanglement of Speaker Identity for Accented Speech Recognition
2023 8th International Conference on Communication, Image and Signal Processing (CCISP), 2023
Abstract
Automatic speech recognition (ASR) systems have achieved good performance on standard English speech. However, recognition accuracy degrades significantly for non-native speakers with accented speech. Existing multi-accent ASR models use accent embeddings as an extra input to handle diverse accents, but these embeddings are often entangled with other speech attributes, which impedes model performance. In this paper, we propose a speaker-identity-disentangled accent modeling approach, which extracts an accent embedding free of speaker-specific information and uses it to improve the performance of multi-accent speech recognition systems. Experiments conducted on the CommonVoice dataset demonstrate that our proposed method attains a 15% reduction in word error rate (WER) over baselines.
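Since the reported result is a WER reduction, a minimal sketch of how WER is commonly computed may be useful: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` are illustrative and do not come from the paper. The abstract does not say whether the 15% figure is relative or absolute, so the relative-reduction helper is shown only as an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction (assumed interpretation, not confirmed by the paper)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

For example, a hypothesis that drops one word from a six-word reference has a WER of 1/6, and moving from a hypothetical baseline WER of 0.20 to 0.17 would be a relative reduction of about 15%.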
Keywords
accented speech recognition, multi-accent, speaker identity, disentanglement