Context-Sensitive Multimodal Emotion Recognition from Speech and Facial Expression Using Bidirectional LSTM Modeling
11th Annual Conference of the International Speech Communication Association 2010 (INTERSPEECH 2010), Vols 3 and 4 (2010)
Abstract
In this paper, we apply a context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues. We use bidirectional Long Short-Term Memory (BLSTM) networks which, unlike most other emotion recognition approaches, exploit long-range contextual information for modeling the evolution of emotion within a conversation. We focus on recognizing dimensional emotional labels, which enables us to classify both prototypical and non-prototypical emotional expressions contained in a large audiovisual database. Subject-independent experiments on various classification tasks reveal that the BLSTM network approach generally prevails over standard classification techniques such as Hidden Markov Models or Support Vector Machines, and achieves F1-measures of the order of 72%, 65%, and 55% for the discrimination of three clusters in emotional space and the distinction between three levels of valence and activation, respectively.
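The two ideas named in the abstract, feature-level fusion and bidirectional context, can be illustrated with a toy sketch. It is not the authors' model: no LSTM is trained here, a plain running mean stands in for the recurrent state, and every name and number below (`fuse_features`, `bidirectional_context`, the feature values) is a hypothetical chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_features(acoustic: Sequence[float], visual: Sequence[float]) -> Vector:
    """Feature-level (early) fusion: concatenate both modalities into one vector."""
    return list(acoustic) + list(visual)


def running_mean(sequence: Sequence[Vector]) -> List[Vector]:
    """Mean of all vectors seen so far, one output per step.

    Assumption: this is only a stand-in for a recurrent hidden state.
    """
    if not sequence:
        return []
    totals = [0.0] * len(sequence[0])
    out = []
    for step, vec in enumerate(sequence, start=1):
        totals = [t + v for t, v in zip(totals, vec)]
        out.append([t / step for t in totals])
    return out


def bidirectional_context(sequence: Sequence[Vector]) -> List[Vector]:
    """Pair each step with a summary of its past (forward) and its future (backward)."""
    forward = running_mean(sequence)
    backward = running_mean(sequence[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Hypothetical toy conversation: three utterances, 2 acoustic + 1 visual feature each.
acoustic_feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
visual_feats = [[0.0], [1.0], [2.0]]

fused = [fuse_features(a, v) for a, v in zip(acoustic_feats, visual_feats)]
context = bidirectional_context(fused)
```

Each entry of `context` holds a forward summary (the current and earlier utterances) followed by a backward summary (the current and later utterances), which is the sense in which a bidirectional model sees both sides of the conversation before labeling an utterance.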
Keywords
emotion recognition,multimodality,long short-term memory,hidden markov models,context modeling