Quantifying Gender Bias in Arabic Pre-Trained Language Models

IEEE Access (2024)

Abstract
The current renaissance in the development of Arabic Pre-trained Language Models (APLMs) has yielded significant advances across many fields. Nevertheless, no study has explored the dimensions of gender bias in these models. It is argued that such bias is influenced by the resources used during the models’ pre-training. In this study, we therefore conducted a comprehensive analysis to qualitatively assess the representation of different genders by tracing bias signals in the training corpus. Applying several Natural Language Processing (NLP) techniques, including Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and Dependency Parsing (DP), the results indicated a corpus imbalanced in terms of gendered nouns and revealed the verb patterns associated with each gender. The second phase of this study examined the impact of these corpus-level findings on recent APLMs. Leveraging the ability of Bidirectional Encoder Representations from Transformers (BERT) to predict masked tokens as a means of quantifying gender bias, we introduce the first template-based Arabic benchmark designed to measure gender bias across various disciplines. Using this benchmark, along with the lists of gender-specific nouns and personal names extracted from the corpus, we evaluated the gender skew in the context of scientific and liberal-arts disciplines across six APLMs: AraBERT, CAMeLBERT-CA, CAMeLBERT-MSA, GigaBERT, MARBERT, and ARBERT. The outcomes revealed a stronger bias skew for personal names, indicating that gender associations present in the training corpus reinforce gender bias in APLMs.
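To illustrate the first phase, the sketch below shows one way to trace gender signals in a corpus with POS tagging and dependency parsing: counting masculine versus feminine nouns and collecting the verbs whose grammatical subject carries each gender. It is a minimal sketch assuming the Stanza UD pipeline for Arabic; the toy sentence and tooling are illustrative, not the authors' actual corpus or pipeline.

```python
from collections import Counter
import stanza

# stanza.download("ar")  # one-time download of the Arabic UD models
nlp = stanza.Pipeline("ar", processors="tokenize,mwt,pos,lemma,depparse")

def gender_of(word):
    """Extract the Gender feature (Masc/Fem) from a UD feature string, if present."""
    for feat in (word.feats or "").split("|"):
        if feat.startswith("Gender="):
            return feat.split("=", 1)[1]
    return None

noun_counts = Counter()                               # Masc vs. Fem noun frequencies
verbs_by_gender = {"Masc": Counter(), "Fem": Counter()}

# Toy example: "The engineer went to the lab and the nurse went to the hospital."
doc = nlp("ذهب المهندس إلى المختبر وذهبت الممرضة إلى المستشفى.")
for sent in doc.sentences:
    for word in sent.words:
        g = gender_of(word)
        if g and word.upos == "NOUN":
            noun_counts[g] += 1
        # If this gendered (pro)noun is a subject, record the governing verb's lemma.
        if g and word.deprel and word.deprel.startswith("nsubj") and word.head > 0:
            head = sent.words[word.head - 1]
            if head.upos == "VERB":
                verbs_by_gender[g][head.lemma] += 1

print(noun_counts)
print(verbs_by_gender)
```

Aggregated over a full pre-training corpus, counts like these expose both the noun-level gender imbalance and the verb patterns the abstract describes.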
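For the second phase, the sketch below shows the template-based probing idea: comparing the probability a masked language model assigns to masculine versus feminine probe words in a discipline-specific template. It assumes the Hugging Face checkpoint "CAMeL-Lab/bert-base-arabic-camelbert-msa" (one of the CAMeLBERT releases); the template pair and probe pronouns are illustrative, not the paper's benchmark.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="CAMeL-Lab/bert-base-arabic-camelbert-msa")
mask = fill.tokenizer.mask_token

# Agreement-matched template pair for one discipline ("works as a physician");
# Arabic verbs and predicates inflect for gender, so each pronoun gets its own template.
probes = {
    "Masc": (f"{mask} يعمل طبيبا في المستشفى.", "هو"),   # "He works as a physician in the hospital."
    "Fem":  (f"{mask} تعمل طبيبة في المستشفى.", "هي"),   # "She works as a physician in the hospital."
}

scores = {}
for gender, (template, pronoun) in probes.items():
    # `targets` restricts the fill-mask scoring to the probe token.
    scores[gender] = fill(template, targets=[pronoun])[0]["score"]

print(scores, f"masc/fem ratio = {scores['Masc'] / scores['Fem']:.2f}")
```

A ratio far from 1 signals a gender skew for that template; repeating this over templates spanning scientific and liberal-arts disciplines, and over name lists instead of pronouns, yields model-level bias scores of the kind the study reports.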
Key words
Arabic Pre-trained Language Models (APLMs), BERT, gender bias, large models, quantifying bias