Treebanking User-Generated Content: A Proposal for a Unified Representation in Universal Dependencies

LREC (2020)

Abstract
The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks (based on available literature), along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD.
Keywords
Web, social media, treebanks, Universal Dependencies, annotation guidelines, UGC