Transducer-Based Streaming Deliberation for Cascaded Encoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2022)
Abstract
Previous research on applying deliberation networks to automatic speech recognition has achieved excellent results. The attention-decoder-based deliberation model often works as a rescorer to improve first-pass recognition results, and it requires the full first-pass hypothesis for second-pass deliberation. In this work, we propose a transducer-based streaming deliberation model. The joint network of a transducer decoder typically receives inputs from the encoder and the prediction network. We propose using attention over the first-pass text hypothesis as a third input to the joint network. The proposed transducer-based deliberation model naturally streams, which makes it more desirable for on-device applications. We also show that the model improves rare-word recognition compared to cascaded encoders, with relative WER reductions ranging from 3.6% to 10.4% across a variety of test sets. Our model does not use any additional text data for training.
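To make the "third input to the joint network" idea concrete, here is a minimal pure-Python sketch of a single joint-network step. It is not the authors' implementation. Several details are assumptions not stated in the abstract: the class name `DeliberationJoint` and all helper names are hypothetical; the prediction-network state serves as the attention query; the first-pass hypothesis embeddings act as both keys and values in scaled dot-product attention; and the three inputs are projected and summed before a `tanh`, following the common additive transducer joint. The weights are random, so only the shapes and the data flow are meaningful.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Scaled dot-product attention; the hypothesis embeddings serve as keys and values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]
    return context, weights


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class DeliberationJoint:
    """Hypothetical transducer joint with encoder, prediction, and deliberation inputs."""

    def __init__(self, dim, joint_dim, vocab_size, seed=0):
        rng = random.Random(seed)
        self.w_enc = random_matrix(joint_dim, dim, rng)    # projects the encoder frame
        self.w_pred = random_matrix(joint_dim, dim, rng)   # projects the prediction-network state
        self.w_delib = random_matrix(joint_dim, dim, rng)  # projects the first-pass attention context
        self.w_out = random_matrix(vocab_size, joint_dim, rng)

    def __call__(self, enc_frame, pred_state, hyp_embeddings):
        # Assumption: the prediction-network state queries the first-pass hypothesis.
        context, weights = attend(pred_state, hyp_embeddings)
        # Additive joint: sum the three projected inputs, then apply tanh.
        hidden = [
            math.tanh(a + b + c)
            for a, b, c in zip(
                matvec(self.w_enc, enc_frame),
                matvec(self.w_pred, pred_state),
                matvec(self.w_delib, context),
            )
        ]
        # Output distribution over the vocabulary (index 0 could act as the blank label).
        return softmax(matvec(self.w_out, hidden)), weights


if __name__ == "__main__":
    rng = random.Random(1)
    joint = DeliberationJoint(dim=4, joint_dim=8, vocab_size=5)
    enc_frame = [rng.uniform(-1, 1) for _ in range(4)]
    pred_state = [rng.uniform(-1, 1) for _ in range(4)]
    first_pass = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    probs, attn = joint(enc_frame, pred_state, first_pass)
    print(len(probs), round(sum(probs), 6))
```

A real model would run this step inside transducer decoding and learn all four projections. The sketch shows only the structural change the abstract describes: the joint network gains a third, attention-derived input alongside the encoder and prediction-network inputs.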
Keywords
transducer decoder, encoder, prediction network, first-pass text hypothesis, joint network, transducer based deliberation model naturally streams, rare word recognition, cascaded encoders, deliberation networks, automatic speech recognition, attention decoder based deliberation model, first-pass hypothesis, second-pass deliberation, transducer-based streaming deliberation model