Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder.

Yosuke Higuchi, Andrew Rosenberg, Yuan Wang, Murali Karthick Baskar, Bhuvana Ramabhadran

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023

Abstract

Much of the recent progress in automatic speech recognition (ASR) lies in developing an acoustic encoder, such as enlarging its capacity and designing a refined architecture for speech processing. With these highly optimized encoders, the decoder has become less influential in its role as a language model (LM). In this work, we explore an effective approach for employing the LM structure in an ASR model. The proposed Mask-Conformer augments a Conformer-based model with a mask-predict decoder, which learns output context via the masked LM objective. The mask-predict decoder is applied to stacks of encoder layers, where the decoder output explicitly conditions the subsequent layers using cross-attention. We also propose a fill-mask decoding algorithm that refines a sequence using the decoder’s linguistic information. Experimental results show that Mask-Conformer outperforms strong baselines on some tasks. In addition, our analyses validate the effectiveness of the proposed model design.
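
The abstract does not spell out the fill-mask decoding procedure, so the following is only a minimal sketch of the general mask-predict idea it builds on: mask the low-confidence tokens of a hypothesis and let a decoder re-predict them from the surrounding context. The names `fill_mask_refine` and `toy_predict`, the confidence threshold, and the iteration limit are all illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"

# Hypothetical predictor interface: given a partially masked sequence, return
# one (token, confidence) pair per position. In a real system this would be a
# trained mask-predict decoder conditioned on the acoustic encoder.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def fill_mask_refine(
    tokens: List[str],
    confidences: List[float],
    predict: Predictor,
    threshold: float = 0.9,  # assumed value, chosen only for illustration
    max_iters: int = 3,      # assumed value, chosen only for illustration
) -> Tuple[List[str], List[float]]:
    """Re-predict low-confidence tokens until all pass the threshold."""
    tokens = list(tokens)
    confidences = list(confidences)
    for _ in range(max_iters):
        # Positions whose confidence is too low are masked out.
        low = {i for i, c in enumerate(confidences) if c < threshold}
        if not low:
            break
        masked = [MASK if i in low else t for i, t in enumerate(tokens)]
        predictions = predict(masked)
        # Only the masked positions are overwritten; the rest are kept.
        for i in low:
            tokens[i], confidences[i] = predictions[i]
    return tokens, confidences


def toy_predict(masked: List[str]) -> List[Tuple[str, float]]:
    """Toy stand-in for a decoder: knows that 'speech' is followed by 'recognition'."""
    out = []
    for i, tok in enumerate(masked):
        if tok != MASK:
            out.append((tok, 1.0))
        elif i > 0 and masked[i - 1] == "speech":
            out.append(("recognition", 0.95))
        else:
            out.append(("<unk>", 0.5))
    return out


hypothesis = ["automatic", "speech", "wreck"]
scores = [0.99, 0.98, 0.40]
refined, refined_scores = fill_mask_refine(hypothesis, scores, toy_predict)
print(refined)
```

In this toy run only the third token falls below the threshold, so it alone is masked and replaced using its left context, while the confident tokens are left untouched.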

Keywords

Conformer, decoder network, mask-predict, end-to-end speech recognition, deep learning
