Improved Hybrid Streaming ASR with Transformer Language Models.

Pau Baquero-Arnal, Javier Jorge, Adrià Giménez, Joan Albert Silvestre-Cerdà, Javier Iranzo-Sánchez, Albert Sanchís, Jorge Civera, Alfons Juan

INTERSPEECH (2020)

Abstract

Streaming ASR is gaining momentum due to its wide applicability, though it is still unclear how best to come close to the accuracy of state-of-the-art off-line ASR systems when the output must come within a short delay after the incoming audio stream. Following our previous work on streaming one-pass decoding with hybrid ASR systems and LSTM language models, in this work we report further improvements by replacing LSTMs with Transformer models. First, two key ideas are discussed so as to run these models fast during inference. Then, empirical results on LibriSpeech and TED-LIUM are provided showing that Transformer language models lead to improved recognition rates on both tasks. ASR systems obtained in this work can be seamlessly transferred to a streaming setup with minimal quality losses. Indeed, to the best of our knowledge, no better results have been reported on these tasks when assessed under a streaming setup.

Keywords

streaming, hybrid ASR, language models, Transformer
