RED v2: Enhancing RED Dataset for Multi-Label Emotion Detection.

Alexandra Ciobotaru, Mihai Vlad Constantinescu, Liviu P. Dinu, Stefan Dumitrescu

International Conference on Language Resources and Evaluation (LREC), 2022

Abstract

RED (Romanian Emotion Dataset) is a machine learning-based resource developed for the automatic detection of emotions in Romanian texts. It contains tweets annotated with a single label, one of the following emotions: joy, fear, sadness, anger, and neutral. In this work, we propose REDv2, an open-source extension of RED that adds two more emotions, trust and surprise, and widens the annotation schema so that the resulting dataset is multi-label. We show the overall reliability of our dataset by computing inter-annotator agreement per tweet using a formula suited to our annotation setup, and we aggregate all annotators' opinions into two variants of ground truth, one suitable for multi-label classification and the other for text regression. We propose strong baselines with two transformer models, Romanian BERT and the multilingual XLM-RoBERTa model, in both categorical and regression settings.
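
To illustrate how annotators' opinions can be aggregated into the two ground-truth variants mentioned above, here is a minimal Python sketch. The abstract does not specify the aggregation rule, so everything here is an assumption for illustration: the function name `aggregate`, the `min_votes` majority threshold for the multi-label variant, and the use of the fraction of annotators who chose each emotion as the regression target.

```python
from typing import Dict, List

# The seven emotions named in the abstract (five from RED plus trust and surprise).
EMOTIONS = ["joy", "fear", "sadness", "anger", "neutral", "trust", "surprise"]


def aggregate(annotations: List[List[str]], min_votes: int = 2) -> Dict[str, Dict[str, float]]:
    """Aggregate per-annotator emotion lists for one tweet.

    Assumed rule (not taken from the paper):
      - multi-label target: 1 if at least `min_votes` annotators chose the emotion
      - regression target: fraction of annotators who chose the emotion
    """
    if not annotations:
        raise ValueError("at least one annotator is required")
    n = len(annotations)
    # Count each emotion at most once per annotator.
    counts = {e: sum(e in set(a) for a in annotations) for e in EMOTIONS}
    multi_label = {e: int(c >= min_votes) for e, c in counts.items()}
    regression = {e: c / n for e, c in counts.items()}
    return {"multi_label": multi_label, "regression": regression}


# Hypothetical example: three annotators label the same tweet.
result = aggregate([["joy", "trust"], ["joy"], ["surprise"]])
```

In this hypothetical example, only joy reaches the two-vote threshold in the multi-label variant, while the regression variant keeps graded scores for joy, trust, and surprise.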

Keywords

emotion detection, multi-label classification, text regression, Romanian tweets
