A Comparison of Supervised and Unsupervised Pre-Training of End-to-End Models.

Interspeech (2021)

Abstract
In the absence of large-scale in-domain supervised training data, ASR models can achieve reasonable performance through pre-training on additional data that is unlabeled, mismatched or both. Given such data constraints, we compare pre-training end-to-end models on matched but unlabeled data (unsupervised) and on labeled but mismatched data (supervised), where the labeled data is mismatched in either domain or language. Across encoder architectures, pre-training methods and languages, our experiments indicate that both types of pre-training improve performance, with relative WER reductions of 15-30% in the domain mismatch case and up to 15% in the language mismatch condition. We further find that the advantage from unsupervised pre-training is most prominent when there is no matched and labeled fine-tuning data, provided that a sufficient amount of mismatched data is still available for supervised fine-tuning.
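To make the compared training regimes concrete, the following is a minimal PyTorch sketch, not the authors' code: it contrasts (1) unsupervised pre-training on matched but unlabeled audio followed by supervised fine-tuning, and (2) supervised pre-training on mismatched labeled data followed by the same fine-tuning. The toy encoder, the next-frame predictive objective, the CTC setup, and all sizes and names are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: assumed shapes, objectives, and hyperparameters.
import torch
import torch.nn as nn

FEAT_DIM, HIDDEN, VOCAB = 40, 128, 32  # toy sizes (assumed)

class Encoder(nn.Module):
    """Toy speech encoder standing in for the paper's end-to-end encoders."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(FEAT_DIM, HIDDEN, batch_first=True)

    def forward(self, x):            # x: (batch, time, FEAT_DIM)
        out, _ = self.rnn(x)
        return out                   # (batch, time, HIDDEN)

def unsupervised_pretrain(encoder, feats, steps=10):
    """Pre-train on matched but unlabeled audio: predict the next frame (assumed objective)."""
    head = nn.Linear(HIDDEN, FEAT_DIM)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(steps):
        pred = head(encoder(feats[:, :-1]))              # predict frame t+1 from frames up to t
        loss = nn.functional.mse_loss(pred, feats[:, 1:])
        opt.zero_grad(); loss.backward(); opt.step()

def supervised_train(encoder, feats, labels, label_lens, steps=10):
    """CTC training; used both for mismatched supervised pre-training and for fine-tuning."""
    head = nn.Linear(HIDDEN, VOCAB)
    ctc = nn.CTCLoss(blank=0)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    in_lens = torch.full((feats.size(0),), feats.size(1), dtype=torch.long)
    for _ in range(steps):
        log_probs = head(encoder(feats)).log_softmax(-1).transpose(0, 1)  # (T, B, V)
        loss = ctc(log_probs, labels, in_lens, label_lens)
        opt.zero_grad(); loss.backward(); opt.step()

if __name__ == "__main__":
    torch.manual_seed(0)
    unlabeled_matched = torch.randn(4, 50, FEAT_DIM)   # in-domain audio, no labels
    mismatched_feats = torch.randn(4, 50, FEAT_DIM)    # out-of-domain audio, labeled
    matched_feats = torch.randn(4, 50, FEAT_DIM)       # small in-domain labeled set
    labels = torch.randint(1, VOCAB, (4, 10))
    lens = torch.full((4,), 10, dtype=torch.long)

    # Regime 1: unsupervised pre-training on matched data, then supervised fine-tuning.
    enc1 = Encoder()
    unsupervised_pretrain(enc1, unlabeled_matched)
    supervised_train(enc1, matched_feats, labels, lens)

    # Regime 2: supervised pre-training on mismatched labeled data, then the same fine-tuning.
    enc2 = Encoder()
    supervised_train(enc2, mismatched_feats, labels, lens)
    supervised_train(enc2, matched_feats, labels, lens)
```

In the paper's no-matched-fine-tuning-data condition, the final fine-tuning step in regime 1 would instead use mismatched labeled data, which is where the abstract reports the unsupervised pre-training advantage is most prominent.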
Keywords
Computer science, Machine learning, Training, End-to-end principle, Artificial intelligence