
Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification

Fourth Arabic Natural Language Processing Workshop (WANLP 2019)

Abstract
Arabic dialect identification is an inherently complex problem, as Arabic dialect taxonomy is convoluted and aims to dissect a continuous space rather than a discrete one. In this work, we present machine learning and deep learning approaches to predict 21 fine-grained dialects from a given set of tweets per user. We adopted numerous feature extraction methods, most of which improved the final model, such as word embeddings, TF-IDF, and other tweet features. Our results show that a simple LinearSVC can outperform complex deep learning models given a set of curated features. With a relatively complex user voting mechanism, we achieved a macro-averaged F1-score of 71.84% on MADAR shared subtask 2. Our best submitted model ranked second among all participating teams.
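To make the user-level prediction and the evaluation metric concrete, here is a minimal Python sketch. It is not the authors' method: the abstract describes their voting mechanism as relatively complex, whereas this sketch assumes a plain majority vote over per-tweet predictions. The function names, user ids, and dialect labels are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def vote_user_labels(tweet_predictions):
    """Aggregate per-tweet labels into one label per user by majority vote.

    tweet_predictions: iterable of (user_id, predicted_label) pairs.
    Assumption: a plain majority vote, used here only as an illustration.
    """
    labels_by_user = defaultdict(list)
    for user_id, label in tweet_predictions:
        labels_by_user[user_id].append(label)
    # Pick the most frequent label among each user's tweets.
    return {
        user_id: Counter(labels).most_common(1)[0][0]
        for user_id, labels in labels_by_user.items()
    }


def macro_f1(gold, predicted):
    """Macro-averaged F1: the unweighted mean of per-label F1 scores."""
    labels = set(gold) | set(predicted)
    pairs = list(zip(gold, predicted))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # A label with no true positives contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical per-tweet predictions for two users.
tweet_preds = [
    ("user_1", "Egypt"),
    ("user_1", "Egypt"),
    ("user_1", "Jordan"),
    ("user_2", "Iraq"),
]
user_labels = vote_user_labels(tweet_preds)
```

Because macro-averaging weights every label equally, a dialect with few users affects the score as much as a frequent one, which is why a per-user aggregation step that corrects noisy tweet-level predictions can matter for this metric.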
Key words
Language Modeling, Social Media Analysis, Part-of-Speech Tagging, Statistical Machine Translation, Machine Learning