Improving Phoneme Recognition with Augmented Autoregressive Predictive Coding

Asad Ullah, Alessandro Ragano, Andrew Hines

2023 34th Irish Signals and Systems Conference (ISSC), 2023

Abstract

Self-supervised learning (SSL) has improved performance on various downstream speech processing tasks. Pre-training speech representations with SSL models requires large datasets, which can be hard to collect in low-resource language scenarios. To address this problem, we propose using audio augmentation to pre-train the SSL model on a smaller but augmented dataset. We illustrate our technique with autoregressive predictive coding (APC), applying augmentation effects directly at the audio level to generate synthetic training data, with the aim of increasing the robustness of the trained SSL model. We apply two kinds of audio augmentation: speed perturbation, and speed perturbation with pitch change. We then evaluate our pre-trained APC model on a downstream phoneme recognition task. From our results and analysis, we conclude that applying augmentation in low-resource configurations can improve the robustness of the pre-trained model, resulting in improved phoneme classification accuracy on the downstream task.
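
As a rough illustration of augmenting at the audio level, the sketch below resamples a waveform by linear interpolation to produce slower and faster copies. It is a minimal example under stated assumptions, not the authors' pipeline: the abstract does not give the perturbation factors or the tooling used, so the function names `speed_perturb` and `augment` and the factors 0.9, 1.0, and 1.1 are hypothetical choices made here for illustration. Note that naive resampling of this kind changes duration and pitch together; how the paper separates its two augmentation variants is not specified in the abstract.

```python
import numpy as np


def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Resample a mono waveform so that it plays `factor` times faster.

    Hypothetical helper: simple linear-interpolation resampling, which
    shifts pitch along with duration.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(waveform) / factor))
    # Positions in the original signal from which each output sample is read.
    src_positions = np.arange(n_out) * factor
    # np.interp clamps positions beyond the last sample to the final value.
    return np.interp(src_positions, np.arange(len(waveform)), waveform)


def augment(waveform: np.ndarray, factors=(0.9, 1.0, 1.1)) -> list:
    """Return one perturbed copy per factor (the factors are assumed values)."""
    return [speed_perturb(waveform, f) for f in factors]


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate for the demo
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    for f, out in zip((0.9, 1.0, 1.1), augment(tone)):
        print(f, len(out) / sample_rate)  # factor and new duration in seconds
```

With three factors, each utterance yields three training copies, which is one way a small corpus could be expanded before pre-training.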

Keywords

self-supervised speech representation, audio augmentation, speed perturbation
