Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023
Abstract
Much of the recent progress in automatic speech recognition (ASR) lies in developing an acoustic encoder, such as enlarging its capacity and designing a refined architecture for speech processing. With these highly optimized encoders, the decoder has become less influential in its role as a language model (LM). In this work, we explore an effective approach for employing the LM structure in an ASR model. The proposed Mask-Conformer augments a Conformer-based model with a mask-predict decoder, which learns output context via the masked LM objective. The mask-predict decoder is applied to stacks of encoder layers, where the decoder output explicitly conditions the subsequent layers using cross-attention. We also propose a fill-mask decoding algorithm that refines a sequence using the decoder’s linguistic information. Experimental results show that Mask-Conformer outperforms strong baselines on some tasks. In addition, our analyses validate the effectiveness of the proposed model design.
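The abstract does not spell out how fill-mask decoding works. As a rough illustration only, the sketch below follows a Mask-CTC-style recipe, which is a swapped-in technique and not necessarily the paper's algorithm. It masks the low-confidence tokens of an initial hypothesis (assumed here to come from some first-pass decoder) and then lets a masked-LM decoder fill the masks over a few iterations, most confident positions first. `fill_mask_decode`, `MASK`, `threshold`, `num_iters`, the `Predictor` interface, and the toy predictor are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = "<mask>"  # hypothetical mask symbol

# Hypothetical decoder interface: given a partially masked token sequence,
# return a (token, probability) prediction for every position.
Predictor = Callable[[Sequence[str]], List[Tuple[str, float]]]


def fill_mask_decode(
    tokens: Sequence[str],
    confidences: Sequence[float],
    predict: Predictor,
    threshold: float = 0.9,
    num_iters: int = 2,
) -> List[str]:
    """Refine an initial hypothesis by masking and re-predicting uncertain tokens."""
    # Step 1: replace tokens whose confidence falls below the threshold with MASK.
    seq = [MASK if c < threshold else tok for tok, c in zip(tokens, confidences)]
    for step in range(num_iters):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break  # nothing left to fill
        preds = predict(seq)
        # Step 2: fill an equal share of the remaining masks per iteration,
        # choosing the positions the decoder is most confident about.
        # On the last iteration this share covers every remaining mask.
        k = math.ceil(len(masked) / (num_iters - step))
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:k]:
            seq[i] = preds[i][0]
    return seq


# Toy usage with a stand-in predictor that "knows" the reference transcript.
reference = ["the", "cat", "sat", "down"]


def toy_predict(seq: Sequence[str]) -> List[Tuple[str, float]]:
    return [(reference[i], 0.5 + 0.1 * i) for i in range(len(seq))]


hypothesis = ["the", "bat", "sad", "down"]
scores = [0.99, 0.4, 0.6, 0.95]
print(fill_mask_decode(hypothesis, scores, toy_predict))
# -> ['the', 'cat', 'sat', 'down']
```

Filling only part of the masks per iteration lets later predictions condition on tokens already committed, which is the usual motivation for iterative mask-predict refinement over one-shot filling.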
Keywords
Conformer, decoder network, mask-predict, end-to-end speech recognition, deep learning