An Expectation Maximization Algorithm for Textual Unit Alignment.

BUCC '11: Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web (2011)

Abstract
The paper presents an Expectation Maximization (EM) algorithm for the automatic generation of parallel and quasi-parallel data from comparable corpora of any degree of comparability, ranging from parallel to weakly comparable. Specifically, we address the problem of extracting related textual units (documents, paragraphs or sentences), relying on the hypothesis that, in a given corpus, certain pairs of translation equivalents are better indicators of a correct textual unit correspondence than other pairs of translation equivalents. We evaluate our method on mixed types of bilingual comparable corpora in six language pairs, obtaining state-of-the-art accuracy figures.
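The hypothesis in the abstract can be illustrated with a toy EM loop. The sketch below is not the paper's algorithm; it assumes a simple mixture model in which each source unit is generated from one hidden target unit through IBM-Model-1-style word translation probabilities. The names `em_align`, `seed_lexicon` and `seed_boost` are hypothetical, and every textual unit is assumed to be a non-empty list of tokens. A small seed lexicon is needed because a fully uniform start gives EM no signal for preferring one pairing over another.

```python
import math
from collections import defaultdict


def em_align(src_units, tgt_units, seed_lexicon, iterations=5, seed_boost=10.0):
    """Toy EM alignment of source units to target units (illustrative only)."""
    src_vocab = {w for s in src_units for w in s}
    tgt_vocab = {v for t in tgt_units for v in t}

    # theta[(w, v)] approximates P(source word w | target word v).
    # Seed pairs start with extra mass so the first E-step is not uniform.
    theta = {}
    for v in tgt_vocab:
        raw = {w: (seed_boost if (w, v) in seed_lexicon else 1.0) for w in src_vocab}
        total = sum(raw.values())
        for w, x in raw.items():
            theta[(w, v)] = x / total

    posteriors = []
    for _ in range(iterations):
        counts = defaultdict(float)
        posteriors = []
        for s in src_units:
            # E-step, unit level: posterior over which target unit produced s.
            log_lik = [
                sum(math.log(sum(theta[(w, v)] for v in t) / len(t)) for w in s)
                for t in tgt_units
            ]
            top = max(log_lik)
            unnorm = [math.exp(ll - top) for ll in log_lik]
            z = sum(unnorm)
            post = [u / z for u in unnorm]
            posteriors.append(post)
            # E-step, word level: expected counts for each word pair,
            # weighted by how likely the two units correspond.
            for t, p in zip(tgt_units, post):
                for w in s:
                    denom = sum(theta[(w, v)] for v in t)
                    for v in t:
                        counts[(w, v)] += p * theta[(w, v)] / denom
        # M-step: renormalise expected counts per target word.
        totals = defaultdict(float)
        for (w, v), c in counts.items():
            totals[v] += c
        theta = {(w, v): c / totals[v] for (w, v), c in counts.items()}

    # Pick the most probable target unit for each source unit.
    alignment = [max(range(len(tgt_units)), key=post.__getitem__) for post in posteriors]
    return alignment, theta


src = [["the", "cat"], ["a", "dog"]]
tgt = [["le", "chat"], ["un", "chien"]]
alignment, theta = em_align(src, tgt, {("cat", "chat"), ("dog", "chien")})
```

In this sketch, word pairs that keep co-occurring in confidently matched units gain probability mass across iterations, which is one way to read the claim that some translation equivalents are better indicators of a correct correspondence than others.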
Keywords
translation equivalent, bilingual comparable corpus, comparable corpus, correct textual unit correspondence, textual unit, Expectation Maximization, state-of-the-art accuracy figures, automatic generation, better indicator, certain pair, Expectation Maximization algorithm, textual unit alignment