Common 7B Language Models Already Possess Strong Math Capabilities
arXiv (2024)
Abstract
Mathematical capabilities were previously believed to emerge in common
language models only at a very large scale or require extensive math-related
pre-training. This paper shows that the LLaMA-2 7B model with common
pre-training already exhibits strong mathematical abilities, as evidenced by
its impressive accuracy of 97.7% and 72.0% on the GSM8K and MATH benchmarks,
respectively, when selecting the best response from 256 random generations. The
primary issue with the current base model is the difficulty in consistently
eliciting its inherent mathematical capabilities. Notably, the accuracy for the
first answer drops to 49.5% and 7.9% on the GSM8K and MATH benchmarks,
respectively. We find that simply scaling up the SFT data can significantly
enhance the reliability of generating correct answers. However, the potential
for extensive scaling is constrained by the scarcity of publicly available math
questions. To overcome this limitation, we employ synthetic data, which proves
to be nearly as effective as real data and shows no clear saturation when
scaled up to approximately one million samples. This straightforward approach
achieves an accuracy of 82.6% on GSM8K and 40.6% on MATH using LLaMA-2 7B
models, surpassing previous models by 14.2% and 20.8%, respectively. We also
provide insights into scaling behaviors across different reasoning complexities
and error types.
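
The gap between best-of-256 and first-answer accuracy is essentially the difference between pass@256 and pass@1. A minimal sketch of such an evaluation, assuming a hypothetical generate(question) sampler and an is_correct(answer, gold) checker (neither is specified by the paper), could look like this:

def evaluate_pass_at_n(problems, generate, is_correct, n=256):
    """Compare first-answer accuracy (pass@1) with best-of-n accuracy (pass@n).

    problems   : iterable of (question, gold_answer) pairs
    generate   : hypothetical sampler, generate(question) -> one model answer
    is_correct : hypothetical checker, is_correct(answer, gold_answer) -> bool
    """
    first_hits, best_hits, total = 0, 0, 0
    for question, gold in problems:
        samples = [generate(question) for _ in range(n)]
        first_hits += is_correct(samples[0], gold)               # pass@1: first sample only
        best_hits += any(is_correct(s, gold) for s in samples)   # pass@n: any of n samples
        total += 1
    return first_hits / total, best_hits / total

For GSM8K and MATH, the checker would typically extract the final numeric or boxed answer from the generation and compare it with the reference answer.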