The Danish Gigaword Project

Strømberg-Derczynski Leon, Baglini Rebekah, Christiansen Morten H., Ciosici Manuel R., Dalsgaard Jacob Aarup, Fusaroli Riccardo, Henrichsen Peter Juel, Hvingelby Rasmus, Kirkedal Andreas, Kjeldsen Alex Speed, Ladefoged Claus, Nielsen Finn Årup, Petersen Malte Lau, Rystrøm Jonathan Hvithamar, Varab Daniel

arXiv (2020)

Abstract
Danish is a North Germanic/Scandinavian language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. However, from a technological perspective, the Danish language has received relatively little attention and, as a result, Danish language technology is hard to develop, in part due to a lack of large or broad-coverage Danish corpora. This paper describes the Danish Gigaword project, which aims to construct a freely available one-billion-word corpus of Danish text that represents the breadth of the written language.
Keywords
Danish Gigaword project