Improving Named Entity Recognition Of English And Vietnamese Languages Using Bilingual Constraints

Thinh Truong, An Dao, Long Nguyen, Dien Dinh

Proceedings of the 2018 2nd International Conference on Natural Language Processing and Information Retrieval (NLPIR 2018) (2018)

Abstract

Named entity recognition plays a crucial role in many Natural Language Processing tasks because semantic information is carried by entities. Recent efforts aim to reduce annotation labor, because state-of-the-art Named Entity Recognition systems still rely on supervised machine learning algorithms that require huge amounts of training data. Such training data are difficult and expensive to produce manually. In particular, Vietnamese is a resource-limited language that lacks high-quality named-entity-annotated corpora, which leads to low performance in Vietnamese Named Entity Recognition. In this paper, we therefore propose an approach that uses an existing unannotated English-Vietnamese bilingual corpus to improve Named Entity Recognition systems for both English and Vietnamese. Experimental results show improvements in both English and Vietnamese Named Entity Recognition over the strong StanfordNER baseline. In particular, Vietnamese Named Entity Recognition improves significantly, by 18.45% in terms of F1-score. On the English side, the F1-score improves from 92.44% to 95.05%. The proposed method can also be generalized to other resource-limited languages.

Keywords

Named entity recognition, Bilingual text, Word alignment
