Cross Modal Training for ASR Error Correction with Contrastive Learning

Jin Jiang, Xiaojun Wan, Wei Peng, Rongjun Li, Jingyuan Yang, Yanquan Zhou

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract

ASR Error Correction (AEC) aims to post-process the output of ASR systems and further reduce the word error rate. In this paper, we propose a cross-modal training framework with contrastive learning for the AEC task. The framework lets a shared encoder-decoder model learn text, pinyin (phoneme), and audio information simultaneously by training on three subtasks: text correction, pinyin-to-text conversion, and ASR. On this basis, we introduce a contrastive learning loss to shrink the distance between the three modalities and construct a unified representation. Experiments on four AEC datasets show that our method corrects a large number of ASR errors and reaches state-of-the-art performance.
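The abstract does not give the form of the contrastive loss, so the following is only a minimal sketch of one common choice: an InfoNCE-style loss with in-batch negatives, averaged over the three modality pairs. The function names (`info_nce`, `cross_modal_contrastive_loss`), the temperature value, and the assumption that each modality has already been pooled into one fixed-size, non-zero vector per utterance are all hypothetical and are not taken from the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(a, b, temperature=0.1):
    """InfoNCE loss between two batches of embeddings.

    a[i] and b[i] are treated as a positive pair; every other b[j]
    in the batch serves as a negative for a[i].
    """
    a = [_normalize(v) for v in a]
    b = [_normalize(v) for v in b]
    total = 0.0
    for i, u in enumerate(a):
        # Cosine similarity of a[i] to every b[j], scaled by the temperature.
        logits = [sum(x * y for x, y in zip(u, v)) / temperature for v in b]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching item.
        total += log_denom - logits[i]
    return total / len(a)


def cross_modal_contrastive_loss(text, pinyin, audio, temperature=0.1):
    """Average InfoNCE over the three modality pairs (an assumed combination)."""
    pairs = [(text, pinyin), (text, audio), (pinyin, audio)]
    return sum(info_nce(x, y, temperature) for x, y in pairs) / len(pairs)
```

Under these assumptions, the loss is near zero when the three embeddings of each utterance coincide and grows when they are mismatched, which is the sense in which such a term pulls the modalities toward a shared representation. In a full training setup this term would presumably be added to the losses of the three subtasks, but the weighting is not stated in the abstract.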

Keywords

ASR Correction, Cross Modal Training, Contrastive Learning
