How Much Data is Sufficient for Neural Transliteration?

2022 International Conference on Asian Language Processing (IALP)

Abstract
Data size is vital for neural transliteration, but collecting large numbers of transliteration pairs is difficult, especially for low-resource languages. How much data, then, is sufficient for a widely used neural transliteration model to perform well? First, the Tensor2Tensor (T2T) neural Transformer transliteration model is selected for its strong performance on transliteration. Next, all freely available English-to-other-language datasets with more than 40k transliteration pairs are selected from the website (six datasets in total). Neural transliteration experiments are then conducted on these six datasets. According to the experimental results, with 20k training pairs the accuracy on all six tasks reaches more than 90% of the best accuracy obtained with all training data, and exceeds 0.45; with 15k training pairs, the accuracy on all six tasks reaches more than 85% of the best accuracy obtained with all training data. Therefore, 20k training pairs are likely sufficient for the Tensor2Tensor neural Transformer transliteration model.
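To illustrate the data-size ablation the abstract describes, the following is a minimal sketch that subsamples a transliteration-pair file to the 15k and 20k sizes probed in the experiments, alongside the full set as the reference point. The file names, tab-separated format, and sampling seed are assumptions for illustration, not details from the paper; each resulting file would then feed a separate Tensor2Tensor Transformer training run whose top-1 accuracy is compared across sizes.

```python
import random

# Hypothetical file of tab-separated transliteration pairs,
# e.g. "smith<TAB>sumisu" per line; path and format are assumptions.
PAIRS_FILE = "en_target_pairs.tsv"

# Training-set sizes probed in the ablation, plus the full set (None).
SIZES = [15_000, 20_000, None]

def load_pairs(path):
    """Read (source, target) transliteration pairs, one pair per line."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t", 1))
                for line in f if "\t" in line]

def subsample(pairs, size, seed=0):
    """Return a random subset of `size` pairs (all pairs if size is None)."""
    if size is None or size >= len(pairs):
        return list(pairs)
    return random.Random(seed).sample(pairs, size)

def write_pairs(pairs, path):
    with open(path, "w", encoding="utf-8") as f:
        for src, tgt in pairs:
            f.write(f"{src}\t{tgt}\n")

if __name__ == "__main__":
    pairs = load_pairs(PAIRS_FILE)
    for size in SIZES:
        subset = subsample(pairs, size)
        tag = "full" if size is None else f"{size // 1000}k"
        out = f"train_{tag}.tsv"
        write_pairs(subset, out)
        print(f"wrote {len(subset)} pairs to {out}")
```

Fixing the sampling seed keeps the 15k subset nested inside the 20k subset comparable across runs, so accuracy differences reflect training-set size rather than sampling noise.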
Keywords
Machine transliteration, neural model, data size