
Benchmarking Current State-of-the-Art Transformer Models on Token Level Language Identification and Language Pair Identification

2023 International Conference on Computational Science and Computational Intelligence (CSCI), 2023

Abstract
With the rise of internet usage, code-switching, in which multiple languages or dialects intermingle within a single text, has surged. Traditional linguistic analysis tools struggle with such mixed text, as they are typically designed for monolingual input. This paper examines two core tasks for analyzing code-switched data: Token Level Language Identification (LID) and our newly proposed Language Pair Identification (LPI). We benchmarked and compared current state-of-the-art transformer models on both tasks to gauge their applicability. Our results show that multilingual transformer models achieve state-of-the-art performance on both tasks. The strong performance on LPI suggests it can serve as a first step toward using Language Pair Identification to assist various facets of work on code-switched corpora and to improve classification performance.
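A minimal sketch of the two task formats described above, assuming a conventional sequence-labeling setup: token-level LID assigns a language tag to every token, and an LPI label for the utterance can be read off from the set of token tags. The `language_pair` helper, the tag names, and the example sentence are illustrative assumptions, not the paper's models or data.

```python
# Illustrative sketch of the task formats only, not the paper's
# transformer models. Token-level LID tags each token with a language
# code; LPI labels the whole utterance with the language pair in use.

def language_pair(token_tags):
    """Derive an utterance-level LPI label from per-token language tags
    (hypothetical helper; tag names are assumptions)."""
    langs = sorted(set(tag for tag in token_tags if tag != "other"))
    if len(langs) > 1:
        return "-".join(langs)          # code-switched pair, e.g. "en-hi"
    return langs[0] if langs else "other"  # monolingual or no content tokens

# A code-switched English-Hindi example with gold token-level tags:
tokens = ["I", "am", "eating", "khana", "abhi"]
tags   = ["en", "en", "en", "hi", "hi"]

assert language_pair(tags) == "en-hi"
```

In this framing, a token-level LID model's predictions could feed directly into the LPI label, which is one plausible reading of why strong LID performance carries over to LPI.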
Key words
Language Identification, Token Level Analysis, Language Pair Recognition, BERT, Transformer