HLTCOE at TREC 2023 NeuCLIR Track
arXiv (2024)
Abstract
The HLTCOE team applied PLAID, an mT5 reranker, and document translation to
the TREC 2023 NeuCLIR track. For PLAID we included a variety of models and
training techniques: the English model released with ColBERT v2,
translate-train (TT), Translate-Distill (TD), and multilingual
translate-train (MTT). TT trains a ColBERT model with English queries and
MS-MARCO v1 passages automatically translated into the document language.
This yields three cross-language models for the track, one
per language. MTT creates a single model for all three document languages by
combining the translations of MS-MARCO passages in all three languages into
mixed-language batches. Thus the model learns about matching queries to
passages in all three languages simultaneously. TD uses scores that the mT5
model assigns to queries paired with translated, non-English passages as the
training signal for how to score query-document pairs. The team submitted runs
to all NeuCLIR tasks: the CLIR and MLIR news tasks as well as the technical
documents task.
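
The mixed-language batching that MTT relies on can be illustrated with a minimal sketch. It is not the team's training code: the function name `mixed_language_batches`, the triple layout, the toy data, and the language codes are all assumptions made for illustration. The only idea taken from the abstract is that translated MS-MARCO passages from all three document languages are pooled so that a single batch can contain several languages.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout: (English query, translated positive, translated negative)
Triple = Tuple[str, str, str]


def mixed_language_batches(
    triples_by_lang: Dict[str, List[Triple]],
    batch_size: int,
    seed: int = 0,
) -> List[List[Tuple[str, Triple]]]:
    """Pool triples from every language, shuffle them, and cut into batches."""
    rng = random.Random(seed)
    # Tag each triple with its language so the mix in a batch can be inspected.
    pool = [
        (lang, triple)
        for lang, triples in triples_by_lang.items()
        for triple in triples
    ]
    rng.shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]


# Toy placeholder data; real training would use translated MS-MARCO passages.
toy = {
    lang: [
        (f"query {i}", f"{lang} positive {i}", f"{lang} negative {i}")
        for i in range(4)
    ]
    for lang in ("zho", "fas", "rus")  # illustrative language codes
}

batches = mixed_language_batches(toy, batch_size=4)
for batch in batches:
    # Show which languages ended up together in each batch.
    print([lang for lang, _ in batch])
```

Shuffling a single pooled list is only one way to mix languages. A sampler that enforces a fixed share per language in every batch would be an equally plausible reading of the abstract.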