Categorizing Error Causes Related to Utterance Characteristics in Speech Recognition


Speech recognition systems are now widely used in daily life. However, they sometimes fail to recognize utterances. In many such cases the system simply asks the user to repeat the utterance, and the user does not know what caused the failure. When this happens repeatedly, users come to regard speech recognition systems as unfriendly. Usability can be improved by identifying the causes of error and presenting them in a form users can easily understand, so that they can adjust how they speak. In this study, we aim to categorize causes of error related to utterance characteristics that occur in everyday speech recognition systems and to present this feedback to users. We focus on causes of error related to utterance speed, namely ‘fast’, ‘slow’, ‘filler’, and ‘stuttered’, because they occur frequently in natural speech and are easy for users to correct. We propose a categorization method that uses a bidirectional long short-term memory (BLSTM) network as the categorization model, and we compare two feature extraction methods: the Mel filter bank and the modulation spectrum. In our experiment, the task is to decide whether a given cause of error is present in an utterance. The results indicate that, compared with the Mel filter bank, the modulation spectrum reduces the number of false detections of speed-related causes of error.
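As a rough illustration of why a modulation spectrum can suit speed-related causes: instead of describing each frame's spectrum, it analyzes how each band's energy fluctuates over time, so regular rhythms such as the syllable rate (a few Hz) appear as peaks. The sketch below is an assumption for illustration only, not the authors' feature pipeline. The function name `modulation_spectrum`, the input layout (a list of frames, each a list of per-band energies such as log Mel filter bank outputs), the per-band mean removal, and the use of a plain DFT without windowing are all hypothetical choices.

```python
import cmath
import math


def modulation_spectrum(band_energies, frame_rate_hz):
    """Hypothetical sketch: magnitude DFT of each band's energy trajectory.

    band_energies: list of frames; each frame is a list of per-band energies.
    frame_rate_hz: number of frames per second (e.g. 100 for a 10 ms hop).
    Returns (freqs, spectrum), where spectrum[b][k] is the modulation magnitude
    of band b at modulation frequency freqs[k] (in Hz).
    """
    n_frames = len(band_energies)
    n_bands = len(band_energies[0])
    n_bins = n_frames // 2 + 1  # non-negative modulation frequencies only
    freqs = [k * frame_rate_hz / n_frames for k in range(n_bins)]

    spectrum = []
    for b in range(n_bands):
        # Energy trajectory of band b over time.
        traj = [frame[b] for frame in band_energies]
        # Remove the mean so the overall level does not dominate bin 0.
        mean = sum(traj) / n_frames
        traj = [x - mean for x in traj]
        mags = []
        for k in range(n_bins):
            acc = sum(
                traj[t] * cmath.exp(-2j * math.pi * k * t / n_frames)
                for t in range(n_frames)
            )
            mags.append(abs(acc))
        spectrum.append(mags)
    return freqs, spectrum
```

For example, with 100 frames at 100 frames per second, a band whose energy oscillates sinusoidally at 4 Hz yields its largest magnitude at the 4 Hz bin, while a band with constant energy yields all-zero magnitudes. A faster or slower speaking rate would shift such peaks, which is the kind of cue a classifier such as the BLSTM could exploit.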