Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023

Abstract
Much of the recent progress in automatic speech recognition (ASR) lies in developing an acoustic encoder, such as enlarging its capacity and designing a refined architecture for speech processing. With these highly optimized encoders, the decoder has become less influential in its role as a language model (LM). In this work, we explore an effective approach for employing the LM structure in an ASR model. The proposed Mask-Conformer augments a Conformer-based model with a mask-predict decoder, which learns output context via the masked LM objective. The mask-predict decoder is applied to stacks of encoder layers, where the decoder output explicitly conditions the subsequent layers using cross-attention. We also propose a fill-mask decoding algorithm that refines a sequence using the decoder’s linguistic information. Experimental results show that Mask-Conformer outperforms strong baselines on some tasks. In addition, our analyses validate the effectiveness of the proposed model design.
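The abstract does not spell out the fill-mask decoding algorithm. As a rough illustration only, the sketch below shows the generic mask-predict refinement loop (Ghazvininejad et al., 2019) that mask-predict decoders are commonly paired with: start from an initial hypothesis with per-token confidences (assumed here to come from something like a greedy CTC pass over the encoder), repeatedly mask the least confident tokens, and let the decoder re-predict only those positions, with the number of masked tokens decaying linearly. The `MASK` id, the `decoder` callable interface, the linear decay schedule, and `toy_decoder` are all hypothetical stand-ins, not the paper's implementation; a real decoder would also attend to the acoustic encoder states.

```python
from typing import Callable, List, Sequence

MASK = -1  # hypothetical mask token id

# Hypothetical decoder interface: maps a token sequence (with MASK entries)
# to one probability distribution over the vocabulary per position.
Decoder = Callable[[Sequence[int]], List[List[float]]]


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def fill_mask_decode(
    init_tokens: Sequence[int],
    init_scores: Sequence[float],
    decoder: Decoder,
    num_iters: int = 3,
) -> List[int]:
    """Generic mask-predict refinement, used as a stand-in for fill-mask decoding."""
    tokens = list(init_tokens)
    scores = list(init_scores)
    n = len(tokens)
    for t in range(1, num_iters):
        # Linearly decay how many positions are re-predicted each iteration.
        num_mask = n * (num_iters - t) // num_iters
        if num_mask == 0:
            break
        # Mask the least confident positions (stable sort breaks ties by index).
        masked = sorted(range(n), key=lambda i: scores[i])[:num_mask]
        for i in masked:
            tokens[i] = MASK
        probs = decoder(tokens)
        # Re-predict only the masked positions; the rest stay fixed.
        for i in masked:
            best = argmax(probs[i])
            tokens[i] = best
            scores[i] = probs[i][best]
    return tokens


# Usage with a hypothetical toy decoder that always favors a fixed target.
TARGET = [5, 6, 7, 8]
VOCAB = 10


def toy_decoder(tokens: Sequence[int]) -> List[List[float]]:
    dists = []
    for i in range(len(tokens)):
        d = [0.1 / (VOCAB - 1)] * VOCAB
        d[TARGET[i]] = 0.9
        dists.append(d)
    return dists


print(fill_mask_decode([5, 0, 7, 0], [0.95, 0.1, 0.9, 0.2], toy_decoder))
# → [5, 6, 7, 8]
```

Re-predicting only low-confidence positions keeps tokens the initial pass is already sure of, which is the usual motivation for this kind of refinement; how the paper selects and fills masked positions may differ.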
Keywords
Conformer, decoder network, mask-predict, end-to-end speech recognition, deep learning