Vocal Style Factorization for Effective Speaker Recognition in Affective Scenarios

2023 IEEE International Joint Conference on Biometrics (IJCB)(2023)

Abstract
The accuracy of automated speaker recognition is negatively affected by changes in emotion in a person's speech. In this paper, we hypothesize that speaker identity is composed of various vocal style factors that can be learned from unlabeled data and recombined by a neural network architecture to generate holistic speaker identity representations for affective scenarios. To this end, we propose the E-Vector architecture, composed of a 1-D CNN for learning speaker identity features and a vocal style factorization technique for determining vocal styles. Experiments on the MSP-Podcast dataset demonstrate that the proposed architecture improves speaker recognition accuracy in the affective domain over state-of-the-art baseline ECAPA-TDNN speaker recognition models. For instance, the true match rate (TMR) at a false match rate (FMR) of 1% improves from 27.6% to 46.2%.
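The headline result is reported as a true match rate at a fixed false match rate. As a minimal sketch of how such an operating point is commonly computed (not the authors' evaluation code), assume each verification trial yields a similarity score where higher means "same speaker" (e.g. cosine similarity between two speaker embeddings). The decision threshold is chosen from the impostor (different-speaker) scores so that at most 1% of them are accepted, and TMR is the fraction of genuine (same-speaker) scores above that threshold. The function name `tmr_at_fmr` and the strict "score > threshold" acceptance rule are illustrative assumptions; the paper's exact trial list and tie handling are not given here.

```python
import numpy as np


def tmr_at_fmr(genuine, impostor, target_fmr=0.01):
    """Hypothetical helper: true match rate at a target false match rate.

    Assumes higher scores mean "same speaker" and that a trial is accepted
    when its score is strictly greater than the threshold.
    Returns (tmr, threshold).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    if genuine.size == 0 or impostor.size == 0:
        raise ValueError("need at least one genuine and one impostor score")

    # Number of impostor trials we may accept; the small epsilon guards
    # against floating-point round-off in target_fmr * n (e.g. 0.01 * 100).
    k = int(np.floor(target_fmr * impostor.size + 1e-9))

    # Sort impostor scores from highest to lowest. Accepting only scores
    # strictly above sorted_imp[k] admits at most k impostors, so FMR <= k / n.
    sorted_imp = np.sort(impostor)[::-1]
    threshold = sorted_imp[k] if k < sorted_imp.size else -np.inf

    # TMR: fraction of same-speaker trials accepted at this threshold.
    tmr = float(np.mean(genuine > threshold))
    return tmr, threshold


# Usage with toy scores (illustrative only, not data from the paper).
impostor_scores = np.linspace(0.0, 0.99, 100)
genuine_scores = [0.995, 0.985, 0.5, 0.2]
tmr, thr = tmr_at_fmr(genuine_scores, impostor_scores, target_fmr=0.01)
print(f"TMR @ FMR=1%: {tmr:.2f} (threshold {thr:.3f})")
```

Choosing the threshold from the impostor distribution alone is what makes results comparable across systems: both the baseline and the proposed model are evaluated at the same 1% false-accept budget, so a higher TMR at that point reflects better separation of same-speaker and different-speaker scores.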
Keywords
Speaker Recognition, Vocal Style, Affective Domain, Personal Emotions, False Matches, Loss Function, Training Data, Recent Literature, Bimodal, Validation Set, Negative Samples, Weight Vector, Trainable Parameters, Sample ID, Pair Of Sets, Matching Score, Audio Data, Pre-trained Weights, Speech Units, Speech Samples, Human Speech, Speech Synthesis, Equal Error Rate, Email Requests, Handcrafted Features, Attention Mechanism, Learning Styles, Area Under Curve