Sense Unveiled: Enhancing Urdu Corpus for Nuanced Word Sense Disambiguation

IEEE Access (2024)

Abstract
Ambiguity in word meanings presents a significant challenge in natural language processing, necessitating robust techniques for Word Sense Disambiguation (WSD). While research in WSD has predominantly focused on widely spoken languages like English and Spanish, less attention has been given to languages such as Urdu. This paper addresses this gap by conducting a thorough examination of existing corpora for WSD in Urdu and presenting the creation of an Enhanced Urdu (EU) corpus specifically tailored for WSD tasks. The analysis critically evaluates the limitations of the ULS-WSD-18 corpus and justifies the need for a more comprehensive resource. The EU corpus is meticulously curated, comprising 960 words categorized by their frequency in the corpus as most frequent, moderately frequent, or infrequent. These words serve as the foundation for constructing the sentences used in model training and testing. Various similarity coefficients are employed to assess the similarity between the EU corpus and the ULS-WSD-18 corpus, revealing notable patterns in word occurrences, sense structures, and sentence compositions. The findings underscore the potential of the EU corpus to advance WSD research for Urdu. By providing a comprehensive resource for model development and evaluation, this work contributes to the broader goal of improving language processing tools for Urdu and other underrepresented languages.
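The abstract does not name the similarity coefficients it uses. As a minimal sketch of how two corpora might be compared at the vocabulary level, the Python below assumes three common set-based measures (Jaccard, Dice, and overlap) and whitespace tokenization. The function names and the toy placeholder tokens are hypothetical and are not taken from the paper or from either corpus.

```python
# Illustrative only: the paper's actual coefficients and tokenization are not
# stated in the abstract. Jaccard, Dice, and overlap are assumed here.

def vocabulary(sentences):
    """Collect the set of whitespace-separated tokens from a list of sentences."""
    return {token for sentence in sentences for token in sentence.split()}

def jaccard(a, b):
    """|A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a, b):
    """2 * |A & B| / (|A| + |B|); 0.0 when both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def overlap(a, b):
    """|A & B| / min(|A|, |B|); 0.0 when either set is empty."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

# Toy placeholder sentences standing in for two corpora (not real data).
corpus_a = ["w1 w2 w3", "w2 w4"]
corpus_b = ["w2 w3 w5"]

vocab_a = vocabulary(corpus_a)
vocab_b = vocabulary(corpus_b)

scores = {
    "jaccard": jaccard(vocab_a, vocab_b),
    "dice": dice(vocab_a, vocab_b),
    "overlap": overlap(vocab_a, vocab_b),
}
print(scores)
```

The same functions could be applied to sets of sense labels or of whole sentences rather than word types, which is one plausible way to compare the "sense structures" and "sentence compositions" the abstract mentions.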
Keywords
Word sense disambiguation, natural language processing, machine learning, sense-tagged Urdu corpus