Noisy Text Normalization Using an Enhanced Language Model

International Conference on Artificial Intelligence (2014)

Abstract
User-generated text on social network sites contains an enormous number and a vast variety of out-of-vocabulary words, formed both deliberately and by mistake by end users. Normalizing this noisy text is therefore essential before applying NLP tasks. This paper describes an unsupervised normalization system with two phases: candidate generation and candidate selection. We generate candidates via six different methods: 1) lexical generation at edit distance one, 2) phonemic generation, 3) blending the previous methods, 4) lexical generation at edit distance two, 5) dictionary translation, and 6) heuristic rules. For candidate selection, although we use a trigram language model, we present a new method that selects candidates with respect to all other words in the sentence. Our experiments on a large dataset show promising results.
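The two phases can be illustrated with a minimal Python sketch. It covers only the first generation method (lexical candidates at edit distance one) and a deliberately simplified selection step that scores each candidate by the count of the trigram formed with its immediate neighbours. It is not the paper's selection method, which takes all other words in the sentence into account. The function names, the toy lexicon, and the trigram counts are all hypothetical.

```python
import string


def one_edit_candidates(token, lexicon):
    """Return the lexicon words within one edit of `token`.

    Edits considered: delete, transpose, replace, insert (lowercase ASCII only).
    """
    letters = string.ascii_lowercase
    splits = [(token[:i], token[i:]) for i in range(len(token) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    # Keep only strings that are real words in the (assumed) lexicon.
    return (deletes | transposes | replaces | inserts) & lexicon


def select_candidate(left, candidates, right, trigram_counts):
    """Pick the candidate whose (left, candidate, right) trigram count is highest.

    Simplification: only the adjacent words are used as context.
    Candidates are sorted first so that ties are broken alphabetically.
    """
    return max(sorted(candidates),
               key=lambda w: trigram_counts.get((left, w, right), 0))


# Toy data, invented for illustration only.
lexicon = {"see", "sea", "she", "you"}
trigram_counts = {("i", "see", "you"): 5, ("i", "sea", "you"): 1}

candidates = one_edit_candidates("se", lexicon)
best = select_candidate("i", candidates, "you", trigram_counts)
```

With this toy data, the noisy token "se" in "i se you" yields the candidates "see", "sea", and "she", and the trigram counts favour "see". A real system would replace the raw counts with smoothed language model probabilities and merge candidates from the other five generation methods.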