Bangla Speaker Accent Variation Classification from Audio Using Deep Neural Networks: A Distinct Approach.

Khorshed Alam, Mahbubul Haq Bhuiyan, Md Fahad Monir

TENCON 2023 - 2023 IEEE Region 10 Conference (TENCON) (2023)

Abstract
Accent variation classification is the task of detecting the accent or dialect of a speaker from patterns and features in their speech. It is useful for building speech recognition systems, language learning systems, dialect preservation systems, and voice assistants, for sociolinguistic studies, and for improving speech synthesis and voice-over systems. It can also support forensic analysis of audio data, helping to determine a speaker's regional origin or specific accent traits in criminal investigations and judicial proceedings. Deep Neural Networks (DNNs) are used for speech recognition tasks because they can learn complex characteristics of speech input such as patterns, intensity, rhythm, and temporal information. In this study, we propose a Bangla Speaker Accent Variation Classification model that combines feature extraction based on Zero Crossing Rate (ZCR), Mel Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and Mel-spectrograms with a DNN classifier to identify the speaker's accent variation from Bangla speech data. We train our model on 7,443 of 9,303 audio clips (about 80%) spanning nine classes (Formal, Dhaka, Khulna, Barisal, Rajshahi, Sylhet, Chittagong, Mymensingh, and Noakhali), and it achieves 94% accuracy on unseen data. We compare its accuracy and performance with other neural networks: LSTM, Stacked LSTM, and DCNN achieve accuracies of 67%, 71%, and 85%, respectively.
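To make two of the features named in the abstract concrete, the following is a minimal sketch of frame-level ZCR and RMS computed in plain Python. It is not the authors' pipeline: the function names, the 400-sample frame length, the 160-sample hop, the 16 kHz sample rate, and the synthetic test tone are all assumptions chosen for illustration. MFCC and Mel-spectrogram extraction are omitted because they are normally done with a signal-processing library.

```python
import math


def frame_signal(samples, frame_length, hop_length):
    """Split a sequence of samples into overlapping frames (no padding)."""
    last_start = len(samples) - frame_length
    return [
        samples[start:start + frame_length]
        for start in range(0, last_start + 1, hop_length)
    ]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def root_mean_square(frame):
    """Square root of the mean squared amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def frame_features(samples, frame_length=400, hop_length=160):
    """Return one (ZCR, RMS) pair per frame; defaults are assumed values."""
    return [
        (zero_crossing_rate(f), root_mean_square(f))
        for f in frame_signal(samples, frame_length, hop_length)
    ]


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
features = frame_features(tone)
first_zcr, first_rms = features[0]
```

In a real system, per-frame values like these would be stacked with the MFCC and Mel-spectrogram features before being passed to the classifier. How the paper combines them is not stated in the abstract.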
Keywords
Accent classification, Speech recognition, Filtering, Deep neural network, Human voice, Accent variation classification