Cross Modal Training for ASR Error Correction with Contrastive Learning

Jin Jiang, Xiaojun Wan, Wei Peng, Rongjun Li, Jingyuan Yang, Yanquan Zhou

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
ASR Error Correction (AEC) aims to post-process the output of ASR systems to further reduce the word error rate. In this paper, we propose a cross-modal training framework with contrastive learning for the AEC task. The framework enables a shared encoder-decoder model to learn text, pinyin (phoneme) and audio information simultaneously, and it is trained on three subtasks: text correction, pinyin-to-text conversion, and ASR. On this basis, we introduce a contrastive learning loss that reduces the distance between the representations of the three modalities and builds a unified representation. Experiments on four AEC datasets show that our method corrects a large number of ASR errors and achieves state-of-the-art performance.
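The abstract does not give the exact form of the contrastive loss, how each modality is pooled into a vector, or how it is weighted against the three subtask losses. The sketch below is therefore only an illustration under stated assumptions: each modality (text, pinyin, audio) is assumed to be summarized as one vector per example, a standard symmetric InfoNCE loss with in-batch negatives is assumed for every pair of modalities, and the total objective is assumed to be the sum of the three subtask losses plus a weighted contrastive term. All names (`info_nce`, `cross_modal_contrastive_loss`, `total_loss`, `temperature`, `lam`) are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; the epsilon guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss (assumed form): row i of `anchors` should match
    row i of `positives`; every other row in the batch acts as a negative."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)
        # Numerically stable log-sum-exp over all candidates in the batch.
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the matching pair
    return total / n


def cross_modal_contrastive_loss(reps: Dict[str, List[Vector]], temperature: float = 0.1) -> float:
    """Hypothetical: symmetric InfoNCE summed over every pair of modalities,
    e.g. (audio, pinyin), (audio, text), (pinyin, text)."""
    loss = 0.0
    for a, b in combinations(sorted(reps), 2):
        loss += 0.5 * (info_nce(reps[a], reps[b], temperature)
                       + info_nce(reps[b], reps[a], temperature))
    return loss


def total_loss(task_losses: Dict[str, float], reps: Dict[str, List[Vector]],
               lam: float = 0.1, temperature: float = 0.1) -> float:
    """Assumed objective: sum of the subtask losses (text correction,
    pinyin-to-text, ASR) plus a weighted cross-modal contrastive term."""
    return sum(task_losses.values()) + lam * cross_modal_contrastive_loss(reps, temperature)


if __name__ == "__main__":
    aligned = {"text": [[1.0, 0.0], [0.0, 1.0]],
               "pinyin": [[1.0, 0.0], [0.0, 1.0]],
               "audio": [[1.0, 0.0], [0.0, 1.0]]}
    # Swap the audio rows so audio no longer lines up with text and pinyin.
    misaligned = dict(aligned, audio=[[0.0, 1.0], [1.0, 0.0]])
    print(cross_modal_contrastive_loss(aligned))     # close to zero
    print(cross_modal_contrastive_loss(misaligned))  # much larger
```

In this toy usage, representations that agree across modalities give a loss near zero, while swapping the audio rows makes the loss large, which is the "shrink the distance between the three modalities" behavior described in the abstract. Using in-batch negatives is a common design choice because it needs no extra sampling; the paper's actual negative construction may differ.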
Keywords
ASR Correction, Cross Modal Training, Contrastive Learning