A Statistical Approach to Classify Nationality of Name

msra(2007)

Abstract
Named entities (NEs), especially personal names, are important components in interpreting certain kinds of text documents, e.g. news. To extract personal names efficiently, statistical language models are required to capture the characteristics of personal names. Among these characteristics, the nationality of a name is a useful cue for interpreting the text document. Automatically inferring nationality from a name also directly helps a user gain more information from the name. In this paper, we therefore propose a statistical approach to identify the nationality of names written in Thai. Features are extracted from decomposed personal names, and their probabilistic bigram and trigram models are used with naive Bayesian classification to assign the most appropriate class to a name. To evaluate the proposed approach, a number of experiments are conducted on real-world data. The experimental results show that our approach works effectively, with about 94% accuracy.
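The general idea of combining n-gram features with naive Bayesian classification can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how names are decomposed, so the sketch assumes character bigrams and trigrams with add-one smoothing. The class name `NaiveBayesNameClassifier`, the helper functions, the label strings, and the tiny romanized training list are all hypothetical and chosen only for readability (the paper itself works on names written in Thai).

```python
import math
from collections import Counter, defaultdict


def ngrams(name, n):
    """Return character n-grams of a name, padded with boundary markers."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def features(name):
    # Assumption: character bigrams and trigrams stand in for the
    # "decomposed name" features, which the abstract does not specify.
    return ngrams(name, 2) + ngrams(name, 3)


class NaiveBayesNameClassifier:
    """Multinomial naive Bayes over n-gram counts (hypothetical sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # additive smoothing constant
        self.class_counts = Counter()             # names seen per class
        self.feature_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for f in features(name):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def log_scores(self, name):
        """Return log P(class) + sum of log P(n-gram | class) per class."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(count / total)
            for f in features(name):
                score += math.log((counts[f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Illustrative toy data only; not taken from the paper's experiments.
TRAIN = [
    ("somchai", "thai"), ("somsak", "thai"), ("siriporn", "thai"),
    ("tanaka", "japanese"), ("yamada", "japanese"), ("nakamura", "japanese"),
]
clf = NaiveBayesNameClassifier().fit(*zip(*TRAIN))
```

Calling `clf.predict(name)` returns the class with the highest log score. Working in log space avoids numeric underflow when many n-gram probabilities are multiplied, and the smoothing constant keeps unseen n-grams from zeroing out a class.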