Recognition Of Emotional Speech With Convolutional Neural Networks By Means Of Spectral Estimates

2017 Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 2017

Cited by 32
Abstract
Current developments in deep neural architectures have achieved remarkable results in the classification of emotions from speech. Recently, cross-modal approaches have also gained attention in the community. One such classification method is the Convolutional Neural Network (CNN). Although developed mainly for image analysis, it can also be applied to speech processing. In this paper, we present a CNN-based classification architecture that adopts spectrograms as representations of emotion-afflicted speech input. Using this approach, we applied our network architecture to three benchmark corpora, namely EmoDB, eNTERFACE, and SUSAS, and investigated its classification ability in a Leave-One-Speaker-Out setting. In particular, for SUSAS, a close-to-real-life corpus, remarkable results were obtained. In addition, we investigated the option of analysing the CNN's internal representations of the given input using Deep Dreaming. In doing so, we were able to identify the spectral parts that contribute most to the classification process.
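The pipeline described above rests on two ingredients: turning each utterance into a log-power spectrogram that a CNN can consume as an image-like input, and evaluating with Leave-One-Speaker-Out (LOSO) splits. The following minimal sketch illustrates both steps; the window size, hop length, and synthetic test signal are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def stft_logpower(x, nperseg=256, hop=128):
    """Log-power spectrogram of a 1-D speech signal.

    Window/hop sizes are illustrative assumptions, not the paper's settings.
    Returns an array of shape (frequency bins, time frames).
    """
    frames = np.lib.stride_tricks.sliding_window_view(x, nperseg)[::hop]
    window = np.hanning(nperseg)
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    return np.log(power.T + 1e-10)  # log compression for dynamic-range reduction

def loso_splits(speaker_ids):
    """Leave-One-Speaker-Out splits: each speaker is held out once as the test set."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test

# One second of synthetic noise standing in for an emotional utterance at 16 kHz.
rng = np.random.default_rng(0)
sig = rng.standard_normal(16000)
spec = stft_logpower(sig)
print(spec.shape)  # (frequency bins, time frames)

# LOSO over a toy corpus with three speakers.
for speaker, train_idx, test_idx in loso_splits(["a", "a", "b", "c"]):
    print(speaker, train_idx, test_idx)
```

Each spectrogram produced this way would then be fed to a 2-D CNN classifier; the network layout itself is not specified in the abstract, so it is omitted here.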
Keywords
cross-modal approaches, classification method, speech processing, emotion-afflicted speech input, network architecture, SUSAS, classification ability, classification process, convolutional neural networks, spectral estimates, deep neural architectures, convolutional neural network, emotional speech recognition, emotions classification, CNN-based classification architecture, EmoDB, eNTERFACE, CNN internal representations