Evaluation of Automatic Speech Recognition Approaches.

Regis Pires Magalhães, Daniel Jean Rodrigues Vasconcelos, Guilherme Sales Fernandes, Lívia Almada Cruz, Matheus Xavier Sampaio, José Antônio Fernandes de Macêdo, Ticiana Linhares Coelho da Silva

Journal of Information and Data Management (2022)

Abstract
Automatic Speech Recognition (ASR) is essential for many applications, such as automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Given the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work evaluates the performance of commercial ASR solutions that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, Google Cloud Speech-to-Text, Wav2Vec, and AWS Transcribe. We performed the experiments on two real, public datasets: Mozilla Common Voice and Voxforge. The results show that the evaluated solutions differ only slightly, although Facebook Wit.ai outperforms the other analyzed approaches on the collected quality metrics: WER, BLEU, and METEOR. We also fine-tune the Jasper neural network for ASR on four different datasets that have no overlap with the ones on which we collect the quality metrics. We then study the performance of the Jasper model on the two public datasets and compare its results with those of the other pre-trained models.
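For readers unfamiliar with WER (word error rate), the first of the quality metrics named above, the following is a minimal sketch of how it is commonly computed: the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words. The function name `word_error_rate` and the lowercase-and-split-on-whitespace normalization are assumptions made for illustration; the abstract does not state how the authors normalized text or which tool they used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; a real evaluation may also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Lower values are better, and because insertions count as errors the value can exceed 1.0 when the hypothesis is much longer than the reference.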
Keywords
automatic speech recognition, speech recognition, evaluation