Do Llamas Work in English? On the Latent Language of Multilingual Transformers
CoRR (2024)
Abstract
We ask whether multilingual language models trained on unbalanced,
English-dominated corpora use English as an internal pivot language – a
question of key importance for understanding how language models function and
the origins of linguistic bias. Focusing on the Llama-2 family of transformer
models, our study uses carefully constructed non-English prompts with a unique
correct single-token continuation. From layer to layer, transformers gradually
map an input embedding of the final prompt token to an output embedding from
which next-token probabilities are computed. Tracking intermediate embeddings
through their high-dimensional space reveals three distinct phases, whereby
intermediate embeddings (1) start far away from output token embeddings; (2)
already allow for decoding a semantically correct next token in the middle
layers, but give higher probability to its version in English than in the input
language; (3) finally move into an input-language-specific region of the
embedding space. We cast these results into a conceptual model where the three
phases operate in "input space", "concept space", and "output space",
respectively. Crucially, our evidence suggests that the abstract "concept
space" lies closer to English than to other languages, which may have important
consequences regarding the biases held by multilingual language models.
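The layer-by-layer analysis described above can be probed with a logit-lens-style decoding of intermediate hidden states: applying the model's output head to each layer's embedding of the final prompt token and comparing the probability assigned to the correct continuation in the input language versus its English translation. The sketch below is illustrative only, assuming a Hugging Face Llama-2 checkpoint; the prompt and candidate words are placeholders, not the paper's actual prompt set.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch of decoding intermediate layers with the output head
# (logit-lens style). Model name and prompt are illustrative assumptions.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Example translation prompt with an intended single-token continuation.
prompt = 'Français: "fleur" - Deutsch: "'
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq, hidden]
hidden_states = out.hidden_states
unembed = model.lm_head.weight   # [vocab_size, hidden]
final_norm = model.model.norm    # RMSNorm applied before the LM head

# First sub-token of each candidate word (assumes they begin with a clear token).
de_id = tokenizer.encode("Blume", add_special_tokens=False)[0]
en_id = tokenizer.encode("flower", add_special_tokens=False)[0]

for layer, h in enumerate(hidden_states):
    last = final_norm(h[0, -1])                       # final prompt token at this layer
    probs = torch.softmax(last @ unembed.T, dim=-1)   # decode with the output head
    print(f"layer {layer:2d}  p(target)={probs[de_id]:.3f}  p(English)={probs[en_id]:.3f}")
```

Under the abstract's account, such a trace would show near-zero probability for both candidates in early layers, a middle-layer phase where the English variant dominates, and a final phase where probability shifts to the input-language token.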