Metadata Might Make Language Models Better

arxiv（2022）

引用 0|浏览9

暂无评分

摘要

This paper discusses the benefits of including metadata when training language models on historical collections. Using 19th-century newspapers as a case study, we extend the time-masking approach proposed by Rosin et al., 2022 and compare different strategies for inserting temporal, political and geographical information into a Masked Language Model. After fine-tuning several DistilBERT on enhanced input data, we provide a systematic evaluation of these models on a set of evaluation tasks: pseudo-perplexity, metadata mask-filling and supervised classification. We find that showing relevant metadata to a language model has a beneficial impact and may even produce more robust and fairer models.

查看译文

关键词

language models

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要