Punctuation Prediction in Vietnamese ASRs Using Transformer-Based Models.

PRICAI (2021)

Abstract

Punctuation prediction is the task of predicting and inserting punctuation marks such as periods, commas, and exclamation marks at the appropriate positions in text transcribed by automatic speech recognition (ASR) systems. This improves readability for users and the performance of many downstream tasks. While most related studies have targeted widely used languages such as English and Chinese, very little work has been done on low-resource languages. To stimulate research on these languages, in this paper we aim to improve the quality of punctuation prediction for Vietnamese ASR systems. Specifically, we propose a method based on recent advances in general-purpose pre-trained language models (LMs) such as BERT and ELECTRA. The benefit of these models is that they can be fine-tuned effectively for punctuation prediction even when only a small amount of training data is available. To further enhance performance, we also propose a simple yet effective technique that provides more context when predicting punctuation marks for the leftmost and rightmost words of each segment. Experiments on public benchmark datasets show that the proposed architecture improves prediction performance by a large margin and yields new state-of-the-art results on these datasets, with F1 scores of 71.49% and 80.38% on the Novel and Newspaper datasets, respectively.

Keywords

Punctuation prediction, Vietnamese ASR, viBERT, vELECTRA