Improving Cross-lingual Representation for Semantic Retrieval with Code-switching
arXiv (2024)
Abstract
Semantic Retrieval (SR) has become an indispensable part of the FAQ system in
the task-oriented question-answering (QA) dialogue scenario. Demand for
cross-lingual smart customer-service systems, for e-commerce platforms and
other specific business settings, has been increasing recently. Most previous
studies exploit cross-lingual pre-trained models (PTMs) for multi-lingual
knowledge retrieval directly, while others add continual pre-training before
fine-tuning the PTMs on downstream tasks. However, whichever scheme is used,
previous work fails to inform the PTMs of features of the downstream task,
i.e., it trains the PTMs without providing any signals related to SR. To this
end, we propose an Alternative
Cross-lingual PTM for SR via code-switching. We are the first to utilize the
code-switching approach for cross-lingual SR. In addition, we introduce a novel
code-switched continual pre-training step instead of directly using the PTMs on
the SR tasks. The experimental results show that our proposed approach consistently
outperforms the previous SOTA methods on SR and semantic textual similarity
(STS) tasks with three business corpora and four open datasets in 20+
languages.
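
To make the term concrete, the following is a minimal sketch of code-switching as it is commonly understood: some tokens of a sentence are replaced with translations from a bilingual lexicon, producing mixed-language text. It is not the authors' procedure, which the abstract does not specify. The function name `code_switch`, the `lexicon` format, the `ratio` parameter, and the toy English-to-Spanish entries are all assumptions made for illustration.

```python
import random


def code_switch(tokens, lexicon, ratio=0.5, seed=0):
    """Replace some tokens with a translation from a bilingual lexicon.

    Hypothetical helper for illustration only.
    tokens  -- list of source-language tokens
    lexicon -- dict mapping a lowercased token to a list of translations
    ratio   -- probability of switching a token that has a translation
    seed    -- seed for the local random generator, for reproducibility
    """
    rng = random.Random(seed)
    switched = []
    for tok in tokens:
        translations = lexicon.get(tok.lower())
        # Switch only tokens the lexicon covers, each with probability `ratio`.
        if translations and rng.random() < ratio:
            switched.append(rng.choice(translations))
        else:
            switched.append(tok)
    return switched


if __name__ == "__main__":
    # Toy lexicon; the entries are illustrative assumptions.
    toy_lexicon = {"refund": ["reembolso"], "order": ["pedido"]}
    query = "how do I get a refund for my order".split()
    print(" ".join(code_switch(query, toy_lexicon, ratio=1.0)))
```

With `ratio=1.0` every token covered by the lexicon is switched, and with `ratio=0.0` the sentence is returned unchanged. Intermediate values yield the mixed-language text that a code-switched pre-training corpus would contain.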