
Author Profiling in Code-Mixed WhatsApp Messages Using Stacked Convolution Networks and Contextualized Embedding Based Text Augmentation.

Neural Processing Letters (2022)

Social media is increasingly replacing traditional phone calls and SMS as a means of communication, and WhatsApp is among the popular messaging applications used in India. Author profiling is the task of identifying demographic features of authors from their writing. It is useful in applications such as forensics, security and marketing, and it helps to identify fake profiles on social media. This paper focuses on identifying socio-demographic traits of authors (gender, age group, marital status and education status) by analysing their WhatsApp messages written in code-mixed Tamil. Although many author profiling studies exist for English and other resource-rich languages, research on Indian languages is still nascent. This study is the first author profiling task for code-mixed Tamil on WhatsApp. As part of the study, we created a benchmark WhatsApp dataset in code-mixed Tamil for developing the author profiling system. We propose a stacked Convolutional Neural Network (CNN) combined with k-max pooling and Bidirectional Long Short-Term Memory (BiLSTM) to enhance the classification performance of the CNN. Multiple experiments were conducted to demonstrate the effectiveness of the proposed model, including comparisons against existing models under diverse parameter settings. We also incorporated focal loss and contextualized-embedding-based data augmentation to handle data imbalance. The proposed model outperforms state-of-the-art deep learning models.
Keywords: Author profiling, WhatsApp messages, Code-mixed Tamil, Stacked CNN, K-max pooling
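
Two of the building blocks named in the abstract, k-max pooling and focal loss, can be illustrated in a few lines. The following is a minimal NumPy sketch of their standard textbook definitions, not the authors' implementation: the function names `k_max_pooling` and `focal_loss`, the input shapes, and the default values of `gamma` and `alpha` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def k_max_pooling(x, k):
    """Keep the k largest activations per channel, in their original order.

    x: array of shape (seq_len, channels), e.g. a CNN feature map.
    Returns an array of shape (k, channels).
    (Hypothetical helper; shapes are an assumption for this sketch.)
    """
    # Indices of the k largest values in each channel (column).
    top_idx = np.argsort(x, axis=0)[-k:]
    # Sort the indices so the selected values keep their sequence order.
    top_idx = np.sort(top_idx, axis=0)
    return np.take_along_axis(x, top_idx, axis=0)


def focal_loss(probs, labels, gamma=2.0, alpha=1.0):
    """Mean multi-class focal loss: -alpha * (1 - p_t)^gamma * log(p_t).

    probs:  array of shape (n_samples, n_classes) with predicted probabilities.
    labels: integer class indices of shape (n_samples,).
    gamma and alpha defaults are illustrative assumptions, not values
    taken from the paper.
    """
    # Probability assigned to the true class of each sample.
    p_t = probs[np.arange(len(labels)), labels]
    # Guard against log(0).
    p_t = np.clip(p_t, 1e-12, 1.0)
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))


if __name__ == "__main__":
    feature_map = np.array([[1.0, 9.0], [5.0, 2.0], [3.0, 7.0], [4.0, 0.0]])
    print(k_max_pooling(feature_map, k=2))

    probs = np.array([[0.9, 0.1], [0.2, 0.8]])
    labels = np.array([0, 1])
    print(focal_loss(probs, labels))
```

With `gamma=0` and `alpha=1` the focal loss reduces to ordinary cross-entropy. Larger `gamma` values down-weight well-classified examples, which is why the loss is commonly used for imbalanced classes such as those the abstract describes.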