Noisy Text Normalization Using an Enhanced Language Model

International Conference on Artificial Intelligence (2014)


Abstract

User-generated text on social network sites contains an enormous number and a vast variety of out-of-vocabulary words, formed both deliberately and mistakenly by end users. It is essential to normalize this noisy text before applying NLP tasks to it. This paper describes an unsupervised normalization system that comprises two phases: candidate generation and candidate selection. We generate candidates via six different methods: 1) lexical generation at one edit distance, 2) phonemic generation, 3) blending of the previous methods, 4) lexical generation at two edit distances, 5) dictionary translation, and 6) heuristic rules. Although candidate selection uses a trigram language model, we present a new method that selects candidates with respect to all other words in the sentence. Our experiments on a large dataset show promising results.
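The two phases can be illustrated with a minimal Python sketch. It is not the paper's implementation: it covers only the first generation method (lexical candidates at one edit distance) and replaces the paper's sentence-level selection with a much simpler rule that looks only at the immediate left and right neighbours. The function names, the toy vocabulary, and the trigram counts are all hypothetical and chosen for illustration.

```python
import string

ALPHABET = string.ascii_lowercase


def one_edit_candidates(word, vocabulary):
    """Return the in-vocabulary words that are exactly one edit away from `word`.

    An edit is a single deletion, adjacent transposition, replacement,
    or insertion of a lowercase ASCII letter.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return (deletes | transposes | replaces | inserts) & set(vocabulary)


def select_candidate(left, candidates, right, trigram_counts):
    """Pick the candidate whose trigram (left, candidate, right) has the highest count.

    Simplification: only the two neighbouring words are consulted, whereas the
    paper selects with respect to all other words in the sentence.
    Candidates are sorted first so that ties are broken alphabetically.
    """
    return max(sorted(candidates),
               key=lambda c: trigram_counts.get((left, c, right), 0))


# Toy data, invented for this example.
VOCABULARY = {"to", "two", "too", "tomorrow", "the", "going", "see", "you"}
TRIGRAM_COUNTS = {("going", "to", "the"): 5, ("going", "two", "the"): 0}

candidates = one_edit_candidates("tw", VOCABULARY)
best = select_candidate("going", candidates, "the", TRIGRAM_COUNTS)
```

Here the noisy token "tw" in "going tw the" yields two in-vocabulary candidates, and the trigram counts decide between them. A real system would use smoothed trigram probabilities estimated from a corpus rather than raw counts from a hand-written table.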
