Text Augmentation Techniques for Document Vector Generation from Russian News Articles.

Communications in Computer and Information Science (2018)

Abstract
This paper improves a document classification system by constructing a text augmentation technique, testing various Part-of-Speech filters and word vector weighting methods with nine different document representation models. It introduces subject/object tagging as a new form of text augmentation, along with a novel classification system grounded in a word weighting method based on the distribution of words among document classes. Applying an augmentation that combines subject/object tagging, a nouns+adjectives filter, and Inverse Document Frequency word weighting yielded an average increase in classification accuracy of 4.1 percentage points.
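To make the pipeline in the abstract concrete, the following is a minimal sketch of a nouns+adjectives filter followed by an IDF-weighted average of word vectors. It is not the authors' implementation: the toy corpus, the POS tags, the two-dimensional vectors, the function names, and the plain log(N / df) form of IDF are all assumptions made for illustration. Subject/object tagging is left out because the abstract does not specify how it is encoded.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of (token, POS tag) pairs.
# A real pipeline would obtain the tags from a Russian POS tagger.
docs = [
    [("новый", "ADJ"), ("закон", "NOUN"), ("принят", "VERB")],
    [("новый", "ADJ"), ("матч", "NOUN"), ("сыгран", "VERB")],
    [("закон", "NOUN"), ("и", "CONJ"), ("суд", "NOUN")],
]

# Hypothetical 2-dimensional word vectors (stand-ins for trained embeddings).
vectors = {
    "новый": [1.0, 0.0],
    "закон": [0.0, 1.0],
    "матч": [1.0, 1.0],
    "суд": [0.0, 2.0],
}

KEEP_TAGS = {"NOUN", "ADJ"}  # the nouns+adjectives filter


def pos_filter(doc):
    """Keep only tokens whose POS tag is in KEEP_TAGS."""
    return [tok for tok, tag in doc if tag in KEEP_TAGS]


def idf_table(filtered_docs):
    """Assumed IDF form: log(N / df), with df counted per document."""
    n_docs = len(filtered_docs)
    df = Counter(tok for doc in filtered_docs for tok in set(doc))
    return {tok: math.log(n_docs / count) for tok, count in df.items()}


def document_vector(tokens, idf, vectors):
    """IDF-weighted average of the word vectors of the given tokens."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in vectors:
            continue  # skip out-of-vocabulary tokens
        weight = idf.get(tok, 0.0)
        for i, value in enumerate(vectors[tok]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # zero vector when nothing usable remains
    return [x / weight_sum for x in total]


filtered = [pos_filter(doc) for doc in docs]
idf = idf_table(filtered)
doc_vecs = [document_vector(tokens, idf, vectors) for tokens in filtered]
```

In this sketch, words that occur in fewer documents receive a larger weight, so they pull the document vector further toward their own embedding. The resulting `doc_vecs` could then be passed to any classifier.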
Keywords
Natural language processing, Classification algorithms, Data preprocessing, Text processing