Transducer-Based Streaming Deliberation for Cascaded Encoders

Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract

Previous research on applying deliberation networks to automatic speech recognition has achieved excellent results. The attention-decoder-based deliberation model often works as a rescorer to improve first-pass recognition results, and requires the full first-pass hypothesis for second-pass deliberation. In this work, we propose a transducer-based streaming deliberation model. The joint network of a transducer decoder typically receives inputs from the encoder and the prediction network. We propose to use attention to the first-pass text hypothesis as the third input to the joint network. The proposed transducer-based deliberation model naturally streams, making it more desirable for on-device applications. We also show that the model improves rare word recognition compared to cascaded encoders, with relative WER reductions ranging from 3.6% to 10.4% for a variety of test sets. Our model does not use any additional text data for training.
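
The three-input joint network described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The dot-product attention, the use of the prediction-network state as the attention query, the additive tanh fusion, the dimensions, and every name here (`attend`, `joint`, `W_att`, and so on) are assumptions made for illustration only. The hypothesis states passed in are whatever first-pass tokens are available so far, which is how this sketch avoids needing the full first-pass hypothesis.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, hyp_states):
    """Scaled dot-product attention of one query over hypothesis states."""
    d = len(query)
    scores = [
        sum(q * h for q, h in zip(query, state)) / math.sqrt(d)
        for state in hyp_states
    ]
    weights = softmax(scores)
    context = [
        sum(w * state[i] for w, state in zip(weights, hyp_states))
        for i in range(d)
    ]
    return context, weights


def joint(enc_t, pred_u, hyp_states, W_enc, W_pred, W_att, W_out):
    """Joint network with three inputs: encoder frame, prediction-network
    state, and an attention context over the first-pass text hypothesis."""
    # Assumption: the prediction-network state serves as the attention query.
    context, _ = attend(pred_u, hyp_states)
    # Assumption: the three projected inputs are fused by addition and tanh.
    hidden = [
        math.tanh(a + b + c)
        for a, b, c in zip(
            matvec(W_enc, enc_t), matvec(W_pred, pred_u), matvec(W_att, context)
        )
    ]
    return softmax(matvec(W_out, hidden))


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Small random matrix standing in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


D, H, V = 4, 8, 5  # hypothetical input, hidden, and vocabulary sizes
enc_t = [rng.uniform(-1, 1) for _ in range(D)]   # one encoder frame
pred_u = [rng.uniform(-1, 1) for _ in range(D)]  # one prediction-network state
hyp_states = rand_matrix(3, D)  # embeddings of 3 first-pass tokens seen so far

probs = joint(
    enc_t, pred_u, hyp_states,
    rand_matrix(H, D), rand_matrix(H, D), rand_matrix(H, D), rand_matrix(V, H),
)
print(len(probs))
```

Calling `joint` once per (frame, label) pair yields one distribution over the output vocabulary, as in a standard transducer; the only change in this sketch is the extra attention context added before the nonlinearity.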

Keywords

transducer decoder, encoder, prediction network, first-pass text hypothesis, joint network, transducer based deliberation model naturally streams, rare word recognition, cascaded encoders, deliberation networks, automatic speech recognition, attention decoder based deliberation model, first-pass hypothesis, second-pass deliberation, transducer-based streaming deliberation model
