Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
End-to-end (E2E) automatic speech recognition (ASR) systems have gained popularity thanks to their simplified architecture and promising results. However, text-only domain adaptation remains a major challenge for E2E systems. Text-to-speech (TTS) based approaches fine-tune the ASR model on speech synthesized by an auxiliary TTS model, which increases deployment costs. Language model (LM) fusion based approaches can achieve good performance but are sensitive to interpolation parameters. To factorize out the language component of the AED model, we propose the factorized attention-based encoder-decoder (Factorized AED) model, whose decoder takes as input the posterior probabilities of a jointly trained LM. In the context of domain adaptation, a domain-specific LM then serves as a plug-and-play component for a well-trained Factorized AED model. In-domain experiments on LibriSpeech and out-of-domain experiments adapting from LibriSpeech to a variety of GigaSpeech domains are conducted to validate the effectiveness of the proposed method. Results show 20% / 24% relative word error rate (WER) reduction on the LibriSpeech test sets and 8–34% relative WER reduction on 8 GigaSpeech target-domain test sets compared to the AED baseline.
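The abstract describes a decoder that is conditioned on the posterior probabilities of a jointly trained, swappable LM. The sketch below illustrates one way such wiring could look; it is a minimal illustration only, and the module names, fusion projection (`lm_post_proj`), and dimensions are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FactorizedAEDDecoderSketch(nn.Module):
    """Illustrative decoder step that consumes LM posterior probabilities.

    Hypothetical sketch: the fusion scheme (projecting LM posteriors and
    adding them to the token embedding) is an assumption for illustration.
    """

    def __init__(self, vocab_size: int, d_model: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Projects the LM's posterior distribution into the decoder space
        # so it can be combined with the token embedding (assumed design).
        self.lm_post_proj = nn.Linear(vocab_size, d_model)
        self.decoder_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=8, batch_first=True
        )
        self.output_proj = nn.Linear(d_model, vocab_size)

    def forward(self, prev_tokens, lm_posteriors, encoder_out):
        # prev_tokens:   (batch, tgt_len)           previously emitted tokens
        # lm_posteriors: (batch, tgt_len, vocab)    softmax output of the LM
        # encoder_out:   (batch, src_len, d_model)  acoustic encoder states
        x = self.embed(prev_tokens) + self.lm_post_proj(lm_posteriors)
        h = self.decoder_layer(tgt=x, memory=encoder_out)
        return self.output_proj(h)  # logits over the vocabulary


# Plug-and-play adaptation as claimed in the abstract: at inference, the LM
# producing `lm_posteriors` can be replaced by a target-domain LM without
# retraining the acoustic encoder or this decoder.
if __name__ == "__main__":
    dec = FactorizedAEDDecoderSketch(vocab_size=1000)
    prev = torch.randint(0, 1000, (2, 5))
    lm_post = torch.softmax(torch.randn(2, 5, 1000), dim=-1)
    enc = torch.randn(2, 40, 512)
    print(dec(prev, lm_post, enc).shape)  # torch.Size([2, 5, 1000])
```

Because the decoder sees only the LM's output distribution rather than its internal states, the language component stays decoupled from the acoustics, which is what makes swapping in a domain-specific LM possible without interpolation weights as in LM fusion.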
Keywords
text-only,domain adaptation,factorized AED,end-to-end speech recognition