Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System

2022 IEEE Spoken Language Technology Workshop (SLT), 2023

Abstract
This paper investigates an approach for adapting an RNN-Transducer (RNN-T) based automatic speech recognition (ASR) model to improve the recognition of words unseen during training. Prior work has shown that the ASR model can be incrementally fine-tuned to recognize multiple sets of new words. However, this creates a dependency between the updates, which is not ideal for the hot-fixing use case, where each update should be applicable independently of the others. We propose to train residual adapters on the RNN-T model and combine them on-the-fly through adapter-fusion. We investigate several approaches to combining the adapters so that they retain the ability to recognize new words with only minimal degradation on the usual user requests. Specifically, sum-fusion, which sums the outputs of adapters inserted in parallel, achieves over 90% recall on the new words with less than 1% relative WER degradation on the usual data compared to the original RNN-T model.
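The sum-fusion idea described above can be sketched in a few lines: each residual adapter is a small bottleneck network, and fusion simply adds all adapter outputs onto the backbone activation. The function and parameter names (`make_adapter`, `sum_fusion`, the bottleneck width) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_adapter(d_model, bottleneck, rng):
    # One residual adapter: down-projection, ReLU, up-projection.
    # Dimensions are hypothetical; the paper's actual sizes may differ.
    w_down = rng.normal(scale=0.02, size=(d_model, bottleneck))
    w_up = rng.normal(scale=0.02, size=(bottleneck, d_model))
    return w_down, w_up

def adapter_forward(h, params):
    w_down, w_up = params
    return np.maximum(h @ w_down, 0.0) @ w_up

def sum_fusion(h, adapters):
    # Sum-fusion: add the outputs of all parallel adapters to the
    # backbone activation h (residual connection). Each adapter was
    # trained independently, so updates stay independent of each other.
    return h + sum(adapter_forward(h, p) for p in adapters)

d_model, bottleneck = 256, 32
adapters = [make_adapter(d_model, bottleneck, rng) for _ in range(3)]
h = rng.normal(size=(1, d_model))  # one frame of encoder activations
fused = sum_fusion(h, adapters)
print(fused.shape)  # (1, 256)
```

With no adapters attached, `sum_fusion` reduces to the identity, which is why removing an update never requires retraining the backbone.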
Keywords
speech recognition,end-to-end models,RNN-T,incremental learning,residual adapters