
Gender Classification with Data Independent Features in Multiple Languages.

2017 European Intelligence and Security Informatics Conference (EISIC)(2017)

Abstract
Gender classification is a well-researched problem, and state-of-the-art implementations achieve an accuracy of over 85%. However, most previous work has focused on texts written in English, and in many cases the results cannot be transferred to other datasets because the features used to train the machine learning models depend on the data. In this work, we investigate the possibility of classifying an author's gender in five languages: English, Swedish, French, Spanish, and Russian. We use the features of the word-counting program Linguistic Inquiry and Word Count (LIWC), which have the benefit of being independent of the dataset. Our results show that machine learning with LIWC features achieves accuracies between 73% and 79%, depending on the language. We also show some interesting differences in how the genders use certain categories across languages.
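To illustrate why LIWC-style features are dataset independent, here is a minimal sketch. Each text is mapped to a fixed vector: for every dictionary category, the percentage of the text's words that fall into it. The category names and word lists in `TOY_LEXICON` are hypothetical placeholders, because the real LIWC dictionary is licensed and not reproduced here. The "*" prefix wildcard only imitates LIWC's dictionary style. The abstract does not specify which classifier the authors used. The nearest-centroid classifier below (`train_centroids`, `predict`) is a deliberately simple stand-in and not the paper's method. The class labels are also arbitrary placeholders.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for the licensed LIWC dictionary.
# Entries ending in "*" match any word starting with that prefix.
TOY_LEXICON = {
    "pronoun_i": ["i", "me", "my", "mine"],
    "social": ["friend*", "family", "talk*"],
    "negemo": ["sad*", "angry", "hate*"],
}


def tokenize(text):
    # Lowercase and split on Unicode word characters (works beyond English).
    return re.findall(r"\w+", text.lower())


def matches(word, entry):
    # Prefix match for wildcard entries, exact match otherwise.
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_features(text, lexicon=TOY_LEXICON):
    """Percentage of tokens in each category, in sorted category order."""
    words = tokenize(text)
    total = len(words) or 1  # avoid division by zero on empty text
    counts = Counter()
    for w in words:
        for cat, entries in lexicon.items():
            if any(matches(w, e) for e in entries):
                counts[cat] += 1
    return [100.0 * counts[cat] / total for cat in sorted(lexicon)]


def train_centroids(texts, labels):
    """Average feature vector per label (nearest-centroid training)."""
    sums, n = {}, Counter()
    for text, label in zip(texts, labels):
        feats = category_features(text)
        prev = sums.get(label, [0.0] * len(feats))
        sums[label] = [s + f for s, f in zip(prev, feats)]
        n[label] += 1
    return {lab: [s / n[lab] for s in vec] for lab, vec in sums.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    feats = category_features(text)

    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda lab: dist(centroids[lab]))
```

For example, `category_features("I love my friends")` gives `[0.0, 50.0, 25.0]` for the categories `negemo`, `pronoun_i` and `social`. Because the vector length is fixed by the dictionary and not by the training corpus, a model trained on one dataset can be applied to another. For another language, you would swap in that language's dictionary.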
Keywords
gender, classification, domain independence