Empirical Exploration Of Novel Architectures And Objectives For Language Models

18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction (2017)

Abstract
While recurrent neural network language models based on Long Short-Term Memory (LSTM) have shown good gains in many automatic speech recognition tasks, Convolutional Neural Network (CNN) language models are relatively new and have not been studied in depth. In this paper we present an empirical comparison of LSTM and CNN language models on English broadcast news and various conversational telephone speech transcription tasks. We also present a new type of CNN language model that leverages dilated causal convolution to efficiently exploit long-range history. We propose a novel criterion for training language models that combines word and class prediction in a multi-task learning framework. We apply this criterion to train word- and character-based LSTM language models and CNN language models and show that it improves performance. Our results also show that CNN and LSTM language models are complementary and can be combined to obtain further gains.
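The two ideas highlighted in the abstract, a dilated causal convolutional language model and a multi-task word-plus-class training criterion, can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration in PyTorch, not the authors' implementation: layer sizes, the number of layers, the class inventory, and the interpolation weight alpha are all hypothetical choices made here for clarity.

```python
# Minimal sketch (PyTorch assumed): a dilated causal convolutional LM with a
# multi-task head that predicts both the next word and its class.
# All hyperparameters and the class assignment scheme are illustrative only.
import torch
import torch.nn as nn


class DilatedCausalConvLM(nn.Module):
    def __init__(self, vocab_size, num_classes, emb_dim=256, channels=256,
                 kernel_size=3, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        layers = []
        in_ch = emb_dim
        for i in range(num_layers):
            dilation = 2 ** i  # exponentially growing dilation widens the receptive field
            # Left-pad so each position only sees current and past tokens (causal).
            layers.append(nn.ConstantPad1d(((kernel_size - 1) * dilation, 0), 0.0))
            layers.append(nn.Conv1d(in_ch, channels, kernel_size, dilation=dilation))
            layers.append(nn.ReLU())
            in_ch = channels
        self.conv = nn.Sequential(*layers)
        self.word_head = nn.Linear(channels, vocab_size)    # next-word prediction
        self.class_head = nn.Linear(channels, num_classes)  # next-word-class prediction

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        h = self.embed(tokens).transpose(1, 2)   # (batch, emb_dim, seq_len)
        h = self.conv(h).transpose(1, 2)         # (batch, seq_len, channels)
        return self.word_head(h), self.class_head(h)


def multitask_loss(word_logits, class_logits, word_targets, class_targets, alpha=0.5):
    """Interpolated word + class cross-entropy; alpha is an assumed weighting."""
    ce = nn.CrossEntropyLoss()
    word_loss = ce(word_logits.reshape(-1, word_logits.size(-1)), word_targets.reshape(-1))
    class_loss = ce(class_logits.reshape(-1, class_logits.size(-1)), class_targets.reshape(-1))
    return alpha * word_loss + (1.0 - alpha) * class_loss
```

The same multi-task loss could in principle be attached to an LSTM language model instead of the convolutional stack; only the backbone that produces the hidden states would change.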
Keywords
Language model, LSTM, CNN, dilated causal convolution, multi-task learning