Extraction of Parallel Sentences

Synthesis Lectures on Human Language Technologies (2023)

Abstract
As explained in Chap. 1 and later developed in Chap. 6, Machine Translation (MT) engines need to be trained on large numbers of parallel sentences or segments. However, the quantity and diversity of existing parallel text are limited. This motivates the search for parallel sentences in comparable corpora. By exploring a larger share of the levels of comparability introduced in Sect. 1.2, a much larger source of multilingual data can be obtained. Strongly comparable corpora such as Wikipedia entries [1, 62] or news text [2] are rife with parallel sentences and have been among the first to be explored.
Keywords
extraction