The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
CoRR (2024)
Abstract
The evaluation of English text embeddings has transitioned from a
handful of datasets to broad coverage across many tasks through benchmarks such
as MTEB. However, this is not the case for multilingual text embeddings due to
a lack of available benchmarks. To address this problem, we introduce the
Scandinavian Embedding Benchmark (SEB). SEB is a comprehensive framework that
enables text embedding evaluation for Scandinavian languages across 24 tasks,
10 subtasks, and 4 task categories. Building on SEB, we evaluate more than 26
models, uncovering significant performance disparities between public and
commercial solutions not previously captured by MTEB. We open-source SEB and
integrate it with MTEB, thus bridging the text embedding evaluation gap for
Scandinavian languages.