How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?

Anushka Singh, Ananya B. Sai, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M. Khapra

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (2024)

Abstract

While machine translation evaluation has been studied primarily for high-resource languages, there has been recent interest in evaluation for low-resource languages due to the increasing availability of data and models. In this paper, we study a zero-shot evaluation setting for low-resource Indian languages, namely Assamese, Kannada, Maithili, and Punjabi. We collect sufficient Multi-Dimensional Quality Metrics (MQM) and Direct Assessment (DA) annotations to create test sets and meta-evaluate a plethora of automatic evaluation metrics. We observe that even for learned metrics, which are known to exhibit zero-shot performance, the Kendall Tau and Pearson correlations with human annotations are only as high as 0.32 and 0.45, respectively. Synthetic data approaches show mixed results and overall do not help close the gap by much for these languages. This indicates that there is still a long way to go for low-resource evaluation.
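
To illustrate what meta-evaluating a metric against human annotations involves, the following is a minimal sketch of the two correlation measures named in the abstract. It is not the authors' code: the function names and the toy scores are made up for illustration, and the tau-a variant of Kendall's Tau (no tie correction) is an assumption, since the abstract does not say which variant is used.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs.

    Assumption: ties are counted as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: human scores and metric scores for five segments.
human_scores = [1, 2, 3, 4, 5]
metric_scores = [0.2, 0.1, 0.4, 0.3, 0.9]

print("Pearson:", round(pearson(human_scores, metric_scores), 3))
print("Kendall tau-a:", round(kendall_tau_a(human_scores, metric_scores), 3))
```

In this toy example, two of the ten segment pairs are ranked in the opposite order by the metric and by the human scores, so tau-a is (8 - 2) / 10 = 0.6. A metric whose segment-level ranking agreed less often with the annotators would score lower on the same scale.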
