
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

Conference of the International Speech Communication Association (2023)

Abstract
One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance degrades when training and test utterance lengths are mismatched. In this paper, we propose an on-the-fly random utterance concatenation (RUC) based data augmentation method to alleviate the train-test utterance length mismatch for the short-video ASR task. Specifically, we are motivated by the observation that our human-transcribed training utterances for short-video spontaneous speech tend to be much shorter (about 3 seconds on average), while our test utterances, generated by a voice activity detection front-end, are much longer (about 10 seconds on average). Such a mismatch can lead to suboptimal performance. Empirically, the proposed RUC method significantly improves long-utterance recognition without a performance drop on short utterances. Overall, it achieves a 5.72% average word error rate reduction across multiple languages, along with improved robustness to varying utterance lengths.
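The idea of on-the-fly random utterance concatenation can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_utterance_concat`, the parameters `num_concat` and `prob`, and the representation of an utterance as a list of samples plus a transcript are all assumptions made for illustration. Joining transcripts with a space is also an assumption that suits space-delimited languages only.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical representation: (audio samples, transcript).
Utterance = Tuple[List[float], str]


def random_utterance_concat(
    utterances: List[Utterance],
    num_concat: int = 2,          # assumed: utterances joined per augmented sample
    prob: float = 0.5,            # assumed: chance of augmenting a given utterance
    rng: Optional[random.Random] = None,
) -> List[Utterance]:
    """Return a new batch in which some utterances are lengthened by
    appending randomly chosen utterances from the same batch."""
    rng = rng or random.Random()
    augmented: List[Utterance] = []
    for audio, text in utterances:
        new_audio = list(audio)
        new_text = text
        if num_concat > 1 and len(utterances) > 1 and rng.random() < prob:
            # Draw partners at random (with replacement) from the batch.
            for _ in range(num_concat - 1):
                partner_audio, partner_text = rng.choice(utterances)
                new_audio = new_audio + list(partner_audio)
                new_text = new_text + " " + partner_text
        augmented.append((new_audio, new_text))
    return augmented


if __name__ == "__main__":
    batch = [([0.1] * 3, "hello"), ([0.2] * 3, "world"), ([0.3] * 3, "again")]
    for samples, transcript in random_utterance_concat(batch, num_concat=3, prob=1.0):
        print(len(samples), transcript)
```

Because the concatenation happens each time a batch is drawn, the model sees different long utterances in every epoch, which is one plausible way to narrow the gap between roughly 3-second training segments and roughly 10-second test segments described in the abstract.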