Improving Dnn Bluetooth Narrowband Acoustic Models By Cross-Bandwidth And Cross-Lingual Initialization

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION(2017)

引用 8|浏览40
暂无评分
摘要
The success of deep neural network (DNN) acoustic models is partly owed to large amounts of training data available for different applications. This work investigates ways to improve DNN acoustic models for Bluetooth narrowband mobile applications when relatively small amounts of in-domain training data are available. To address the challenge of limited in domain data, we use cross-bandwidth and cross-lingual transfer learning methods to leverage knowledge from other domains with more training data (different bandwidth and/or languages). Specifically, narrowband DNNs in a target language are initialized using the weights of DNNs trained on bandlimited wide band data in the same language or those trained on a different (resource-rich) language. We investigate multiple recipes involving such methods with different data resources. For all languages in our experiments, these recipes achieve up to 45% relative WER reduction, compared to training solely on the Blue tooth narrowband data in the target language. Furthermore, these recipes are very beneficial even when over two hundred hours of manually transcribed in-domain data is available, and we can achieve better accuracy than the baselines with as little as 20 hours of in-domain data.
更多
查看译文
关键词
speech recognition, deep neural network, cross lingual, cross-bandwidth, transfer learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要