Categorizing Error Causes Related to Utterance Characteristics in Speech Recognition

Jennifer Santoso, Takeshi Yamada, Shoji Makino

semanticscholar (2019)

Abstract

Speech recognition systems are now widely used in daily life, but they sometimes fail to recognize utterances. In many cases the system simply asks the user to repeat the utterance, and the user does not know what caused the failure. When this happens repeatedly, users come to regard speech recognition systems as unfriendly. Usability can be improved by identifying the cause of an error and presenting it in a way that users can easily understand, so that they can adjust their next utterance. In this study, we aim to categorize causes of error related to utterance characteristics that occur in daily-use speech recognition systems and to present them to users as feedback. We focus on causes of error related to utterance speed, such as ‘fast’, ‘slow’, ‘filler’, and ‘stuttered’, since they occur frequently in natural speech and are easy for users to correct. We propose a categorization method that uses a bidirectional long short-term memory (BLSTM) network as the categorization model, and we compare two feature extraction methods: the Mel filter bank and the modulation spectrum. We perform an experiment in which the model decides whether a given cause of error is present in an utterance. The results indicate that, compared with the Mel filter bank, the modulation spectrum reduces the number of false detections of causes of error related to utterance speed.
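
The abstract does not say how the modulation spectrum is computed, so the following is only a minimal sketch of one common definition, not the authors' implementation: the magnitude spectrum of each filter bank band's trajectory over time, taken in sliding windows. The function name `modulation_spectrum`, the window and hop lengths, and the 100 frames-per-second rate are all assumptions made for illustration, and the input is a synthetic stand-in for real log Mel filter bank features.

```python
import numpy as np


def modulation_spectrum(log_mel, frame_rate_hz, win_frames=32, hop_frames=16):
    """Hypothetical helper: magnitude spectrum of each band's time trajectory.

    log_mel: array of shape (num_frames, num_bands) holding log Mel
             filter bank energies (assumed to be computed elsewhere).
    Returns (mod_spec, mod_freqs_hz), where mod_spec has shape
    (num_windows, num_bands, win_frames // 2 + 1).
    """
    log_mel = np.asarray(log_mel, dtype=float)
    num_frames, _ = log_mel.shape
    if num_frames < win_frames:
        raise ValueError("need at least win_frames frames")

    window = np.hanning(win_frames)
    segments = []
    for start in range(0, num_frames - win_frames + 1, hop_frames):
        seg = log_mel[start:start + win_frames]    # (win_frames, num_bands)
        seg = seg - seg.mean(axis=0)               # drop each band's mean level
        seg = seg * window[:, None]                # taper to limit leakage
        spec = np.abs(np.fft.rfft(seg, axis=0))    # FFT along the time axis
        segments.append(spec.T)                    # (num_bands, num_mod_bins)

    mod_spec = np.stack(segments)
    mod_freqs_hz = np.fft.rfftfreq(win_frames, d=1.0 / frame_rate_hz)
    return mod_spec, mod_freqs_hz


# Synthetic input: 5 bands whose energy fluctuates at 4 Hz, 100 frames/s.
frame_rate_hz = 100.0
t = np.arange(200) / frame_rate_hz
log_mel = np.tile(np.sin(2 * np.pi * 4.0 * t)[:, None], (1, 5))

mod_spec, mod_freqs = modulation_spectrum(
    log_mel, frame_rate_hz, win_frames=100, hop_frames=50
)
peak_hz = mod_freqs[np.argmax(mod_spec[0, 0])]
print(mod_spec.shape, peak_hz)
```

In this sketch a band whose energy rises and falls at 4 Hz produces a peak at the 4 Hz modulation bin, which illustrates why a representation of this kind is sensitive to how quickly the speech envelope changes. Each window's output could then be flattened into one feature vector per time step for a sequence model such as a BLSTM.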
