Exploring Internal Numeracy in Language Models: A Case Study on ALBERT
arXiv (2024)
Abstract
Transformer-based language models have been shown to perform basic
quantitative reasoning. In this paper, we propose a method for studying how
these models internally represent numerical data, and use it to analyze the
ALBERT family of language models. Specifically, we extract the learned
embeddings these models use to represent tokens that correspond to numbers and
ordinals, and subject these embeddings to Principal Component Analysis (PCA).
PCA results reveal that ALBERT models of different sizes, trained and
initialized separately, consistently learn to use the axes of greatest
variation to represent the approximate ordering of various numerical concepts.
Numerals and their textual counterparts (e.g., "3" and "three") are
represented in separate clusters, but increase along the same direction in the
2D PCA projection. Our findings illustrate that language models, trained
purely to model text, can intuit basic mathematical concepts, opening avenues
for NLP applications that intersect with quantitative reasoning.
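The analysis described above (take the embedding vectors of number tokens, then project them with PCA) can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' code: it assumes the embedding rows have already been extracted from a model's input embedding matrix, and it substitutes tiny synthetic vectors for real ALBERT embeddings. PCA is computed with power iteration plus deflation in pure Python so the example needs no third-party libraries; `pca`, the helper functions, and the toy data are all illustrative names and values.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _center(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows]


def _covariance(x: List[Vector]) -> List[Vector]:
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)]
            for i in range(d)]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def _normalize(v: Vector) -> Vector:
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("zero vector: not enough directions of variance")
    return [a / norm for a in v]


def pca(rows: Sequence[Sequence[float]], k: int = 2,
        iters: int = 1000) -> Tuple[List[Vector], List[Vector]]:
    """Top-k principal components via power iteration with deflation.

    Returns (components, projections): components[c] is a unit vector and
    projections[i][c] is row i's coordinate along component c.
    """
    x = _center(rows)
    cov = _covariance(x)
    d = len(cov)
    components: List[Vector] = []
    for _ in range(k):
        # Deterministic start vector; it only needs a nonzero overlap with
        # the leading eigenvector of the (deflated) covariance matrix.
        v = _normalize([1.0 + 0.1 * j for j in range(d)])
        for _step in range(iters):
            v = _normalize([_dot(row, v) for row in cov])
        eigval = _dot(v, [_dot(row, v) for row in cov])  # Rayleigh quotient
        # Deflation: subtract this component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
        components.append(v)
    projections = [[_dot(r, comp) for comp in components] for r in x]
    return components, projections


if __name__ == "__main__":
    # Hypothetical toy "embeddings": numerals ("1".."9") and number words
    # ("one".."nine") share an ordering direction in dims 0-1 but sit in
    # separate clusters along dim 2. Real ALBERT embeddings are far larger.
    numerals = [[float(n), 0.5 * n, 1.5] for n in range(1, 10)]
    words = [[float(n), 0.5 * n, -1.5] for n in range(1, 10)]
    _, proj = pca(numerals + words, k=2)
    for n, (p1, p2) in zip(range(1, 10), proj[:9]):
        print(f"numeral {n}: PC1={p1:+.2f}  PC2={p2:+.2f}")
    for n, (p1, p2) in zip(range(1, 10), proj[9:]):
        print(f"word    {n}: PC1={p1:+.2f}  PC2={p2:+.2f}")
```

On this idealized toy data, the first component orders both groups by value while the second separates numerals from words, mirroring the qualitative pattern the abstract reports. With real embeddings one would more likely use a library implementation such as scikit-learn's `PCA`; the hand-rolled version here only keeps the sketch dependency-free.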