Korean Grapheme Unit-based Speech Recognition Using Attention-CTC Ensemble Network

Hosung Park, Soonshin Seo, Daniel Jun Rim, Changmin Kim, Hyunsoo Son, Jeong-Sik Park, Ji-Hwan Kim

2019 International Symposium on Multimedia and Communication Technology (ISMAC) (2019)

Abstract

This study proposes an end-to-end speech recognition method based on an Attention-CTC ensemble network that uses Korean graphemes as recognition units. End-to-end speech recognition replaces a pipeline of separate modules, including the DNN-HMM-based acoustic model, the N-gram-based language model, and the WFST-based decoding network, with a single deep neural network (DNN). The end-to-end model in this study predicts its outputs in grapheme units. Building the network on graphemes rather than whole syllables enables effective learning by reducing the number of output classes to be predicted from 11,172 to 49. To this end, the study designs an end-to-end model that combines connectionist temporal classification (CTC), a structure widely used in end-to-end learning, with an attention network model. In the experiment, the model achieved a syllable error rate of 10.5%.
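
To illustrate why grapheme units shrink the output layer, the sketch below splits precomposed Hangul syllables into their component jamo using standard Unicode arithmetic. The 11,172 figure is the size of the Unicode Hangul Syllables block (19 initials × 21 medials × 28 final slots). The function name `decompose` and the constants are illustrative assumptions, not taken from the paper, and the paper's exact 49-symbol inventory is not specified in the abstract; this sketch uses position-specific conjoining jamo, which gives 67 distinct symbols, so the authors' unit set is evidently defined differently.

```python
# Minimal sketch (assumption: not the authors' code) of syllable-to-grapheme
# decomposition via the Unicode Hangul composition formula.
S_BASE = 0xAC00                              # first precomposed syllable, "가"
N_INITIAL, N_MEDIAL, N_FINAL = 19, 21, 28    # 28 final slots include "no final"

INITIALS = [chr(0x1100 + i) for i in range(N_INITIAL)]
MEDIALS = [chr(0x1161 + i) for i in range(N_MEDIAL)]
FINALS = [""] + [chr(0x11A8 + i) for i in range(N_FINAL - 1)]


def decompose(text):
    """Return a list of jamo for Hangul syllables; other characters pass through."""
    units = []
    for ch in text:
        idx = ord(ch) - S_BASE
        if 0 <= idx < N_INITIAL * N_MEDIAL * N_FINAL:
            initial, rest = divmod(idx, N_MEDIAL * N_FINAL)
            medial, final = divmod(rest, N_FINAL)
            units.append(INITIALS[initial])
            units.append(MEDIALS[medial])
            if final:                        # index 0 means the syllable has no final
                units.append(FINALS[final])
        else:
            units.append(ch)
    return units


print(N_INITIAL * N_MEDIAL * N_FINAL)        # number of precomposed syllables
print(decompose("한글"))
```

A model trained on such sequences only needs an output class per grapheme (plus any special symbols such as a CTC blank), and syllables can be recomposed from the predicted graphemes before computing the syllable error rate.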

Keywords

Attention network, Connectionist temporal classification, Speech recognition, Deep neural network
