Creating Corpora for Seq2Seq Tone Rephrasing Using Social Media Posts

Paulo Cavalin, Marisa Vasconcelos, Marcelo Grave, Claudio Pinhanez

2020 International Joint Conference on Neural Networks (IJCNN) (2020)

Abstract

We present a methodology for using Twitter posts to create a parallel corpus for training Seq2Seq neural networks on a tone rephrasing task. Because people tend to post texts expressing opinions or emotions of varying intensity about real-world events, the main idea is to create a corpus containing pairs of posts with opposite tone about the same topic. Doing so overcomes the main limitation of current tone rephrasing methods: the lack of appropriate parallel training corpora. We explore different methods for creating the datasets, including some that require a degree of manual labelling. The results show that fully automatic generation from Twitter data yields training datasets that are better than those built with manual intervention, and good enough for Seq2Seq models to outperform non-Seq2Seq models trained on similar data.
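
The pairing idea can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how topic or tone are determined, so the sketch assumes that shared hashtags stand in for "same topic", that a toy word list stands in for a tone classifier, and that pairs run from negative to positive tone. All names (`tone_score`, `topic_key`, `build_pairs`) are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Toy tone lexicon: an assumption for illustration, not from the paper.
POSITIVE = {"great", "love", "amazing", "happy"}
NEGATIVE = {"terrible", "hate", "awful", "sad"}


def tone_score(text):
    """Number of positive words minus number of negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def topic_key(text):
    """Hypothetical topic proxy: the sorted set of hashtags in the post."""
    tags = {w.strip(".,!?").lower() for w in text.split() if w.startswith("#")}
    return tuple(sorted(tags))


def build_pairs(posts):
    """Return (source, target) pairs of opposite-tone posts on the same topic.

    Direction (negative -> positive) is an assumption of this sketch.
    """
    by_topic = defaultdict(lambda: {"pos": [], "neg": []})
    for post in posts:
        key = topic_key(post)
        if not key:
            continue  # no hashtag, so no topic under this proxy
        score = tone_score(post)
        if score > 0:
            by_topic[key]["pos"].append(post)
        elif score < 0:
            by_topic[key]["neg"].append(post)
        # neutral posts (score == 0) are skipped

    pairs = []
    for groups in by_topic.values():
        pairs.extend(product(groups["neg"], groups["pos"]))
    return pairs


posts = [
    "I love the #finale, amazing!",
    "The #finale was terrible.",
    "Nice weather today",
    "#finale aired tonight",
]
pairs = build_pairs(posts)
```

Each resulting pair could then serve as one source/target training example for a Seq2Seq model; a real system would likely replace both the hashtag proxy and the word list with stronger components.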

Keywords

Twitter, Task analysis, Tools, Semantics, Training, Buildings
