Improved Data2vec with Soft Supervised Hidden Unit for Mandarin Speech Recognition

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023

Abstract
Speech pre-training methods have shown great success in learning useful and general latent representations from large-scale unlabeled data. To further improve the performance of self-supervised learning for specific downstream tasks, we propose an improved approach based on the data2vec framework with soft supervised hidden units. To take full advantage of the labeled data from the downstream task, a supervised model is first trained to extract supervised hidden units. Then, on top of data2vec, an extra BERT-like prediction task with soft cluster distances is introduced to match the downstream task and avoid unnecessary information loss. The proposed method forms a virtuous cycle of reuse for the downstream labeled data. Experiments on the small open-source Mandarin speech corpus AISHELL-2 and the large private Mandarin speech corpus TRANS-L show that our method achieves relative character error rate reductions of 13.2% and 5.2%, respectively, when pre-trained on the AISHELL-2 and TRANS-L corpora, compared with the data2vec framework.
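The abstract describes adding an auxiliary soft-cluster prediction objective on top of data2vec's masked-feature regression. The following is a minimal, hypothetical PyTorch sketch of such a combined loss, not the authors' implementation; names such as soft_cluster_targets, centroids, the temperature, and the weight alpha are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): data2vec-style masked regression
# plus a BERT-like prediction of soft cluster assignments derived from a
# supervised model's hidden units. All names and weights are assumptions.
import torch
import torch.nn.functional as F


def soft_cluster_targets(supervised_hidden, centroids, temperature=1.0):
    """Turn distances to supervised hidden-unit centroids into soft
    (probabilistic) cluster targets instead of hard one-hot labels."""
    # supervised_hidden: (batch, time, dim); centroids: (num_clusters, dim)
    batch = supervised_hidden.size(0)
    dist = torch.cdist(
        supervised_hidden, centroids.unsqueeze(0).expand(batch, -1, -1)
    )  # (batch, time, num_clusters)
    return F.softmax(-dist / temperature, dim=-1)  # closer centroid => higher prob


def combined_loss(student_repr, teacher_repr, cluster_logits,
                  supervised_hidden, centroids, mask, alpha=0.5):
    """Sum of the data2vec regression loss on masked frames and an auxiliary
    cross-entropy against soft cluster targets (soft supervised hidden units)."""
    # 1) data2vec-style regression: predict teacher features at masked positions.
    reg = F.smooth_l1_loss(student_repr[mask], teacher_repr[mask])

    # 2) Auxiliary BERT-like task: predict the soft cluster distribution
    #    derived from the supervised model's hidden units.
    targets = soft_cluster_targets(supervised_hidden, centroids)
    log_probs = F.log_softmax(cluster_logits, dim=-1)
    aux = -(targets[mask] * log_probs[mask]).sum(-1).mean()

    return reg + alpha * aux
```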