Multi-modal Emotion Recognition via Hierarchical Knowledge Distillation

IEEE Transactions on Multimedia (2024)

Abstract
Due to its wide range of applications, multimodal emotion recognition has gained increasing research attention. Although existing methods have achieved compelling success with various multimodal fusion strategies, they overlook that the dominant modality (e.g., text) may create a shortcut and hence negatively affect the representation learning of the other modalities (e.g., image and audio). To alleviate this problem, we resort to knowledge distillation to narrow the gap between different modalities. In particular, we develop a new hierarchical knowledge distillation model for multi-modal emotion recognition (HKD-MER), consisting of three components: feature extraction, hierarchical knowledge distillation, and attentive multi-modal fusion. As the major contribution of our model, the hierarchical knowledge distillation transfers knowledge from the dominant modality to the others at both the feature and label levels. It boosts the performance of the non-dominant modalities by modeling the inter-modal relations between different modalities. We validate the effectiveness of our proposed model on two benchmark datasets.
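The abstract describes distillation from the dominant modality to the others at two levels: features and labels. A minimal sketch of how such two-level losses are commonly instantiated is given below, assuming (as is standard in the distillation literature, not stated in this abstract) an MSE loss between modality features and a temperature-softened KL divergence between modality logits; the function names and the choice of losses are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def feature_distill_loss(f_dominant, f_other):
    """Feature-level distillation (assumed MSE): pull the non-dominant
    modality's features toward the dominant modality's features."""
    f_t = np.asarray(f_dominant, dtype=float)
    f_s = np.asarray(f_other, dtype=float)
    return float(np.mean((f_t - f_s) ** 2))

def label_distill_loss(logits_dominant, logits_other, T=2.0):
    """Label-level distillation (assumed KL divergence between
    temperature-softened class distributions, scaled by T^2)."""
    p = softmax(logits_dominant, T)  # teacher (dominant modality)
    q = softmax(logits_other, T)     # student (non-dominant modality)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)))
    return float(kl * T * T)
```

In a full training loop the two terms would be weighted and added to the ordinary classification loss of each non-dominant modality, so that the text branch guides the image and audio branches without replacing their supervised signal.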
Keywords
Emotion Recognition, Multi-modal Representation Learning, Knowledge Distillation, Contrastive Learning