Generating a Standardized Dataset: Gurmukhi Offline Handwritten Collection of Tehsil and Sub-Tehsil Names from Punjab

2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) (2024)

Abstract
Recognition of handwritten documents in Indian languages has received considerable research attention in recent years. Handwriting recognition is the ability of a computer to convert human handwriting into machine-readable text. One significant challenge is the lack of a standard corpus of handwritten text in Indian languages, which is essential for assessing document recognition algorithms and comparing them with one another. In particular, because little Gurmukhi script data is publicly available, a structured evaluation of techniques for recognizing Gurmukhi tehsil and sub-tehsil names has not been feasible. To address this gap, this paper presents the construction of an unconstrained Gurmukhi handwritten words database (GHWD). The GHWD comprises 62,000 handwritten words produced by 40 distinct writers, covering 77 tehsil and 78 sub-tehsil names (155 names in total). During data generation, each writer wrote every Gurmukhi word ten times. Writers were selected from diverse backgrounds and age groups to ensure a varied and representative dataset.
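The counts reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes that the 77 tehsil names and 78 sub-tehsil names are all distinct word classes and that every contributor wrote every name the stated ten times.

```python
# Counts taken from the abstract; all names below are hypothetical.
NUM_CONTRIBUTORS = 40   # people who supplied handwriting samples
NUM_TEHSILS = 77        # tehsil names
NUM_SUB_TEHSILS = 78    # sub-tehsil names
REPETITIONS = 10        # times each contributor wrote each word


def expected_sample_count(contributors: int, classes: int, repetitions: int) -> int:
    """Return the number of word images for a fully balanced collection."""
    return contributors * classes * repetitions


# Assumption: no name appears in both the tehsil and sub-tehsil lists.
num_classes = NUM_TEHSILS + NUM_SUB_TEHSILS
samples_per_class = NUM_CONTRIBUTORS * REPETITIONS
total_samples = expected_sample_count(NUM_CONTRIBUTORS, num_classes, REPETITIONS)

print(f"word classes:      {num_classes}")
print(f"samples per class: {samples_per_class}")
print(f"total samples:     {total_samples}")
```

Under these assumptions the total matches the 62,000 words stated in the abstract, with each word class represented equally.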
Keywords
Handwriting Recognition, Tehsil Classification, Sub-Tehsil Classification, Gurmukhi Dataset, Gurmukhi Word, Benchmarking, Transfer Learning