On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

Satoki Ishikawa, Ryo Karakida

CoRR (2023)

Abstract
Second-order optimization has been developed to accelerate the training of deep neural networks, and it is being applied to increasingly large-scale models. In this study, toward training at even larger scales, we identify a specific parameterization for second-order optimization that promotes feature learning in a stable manner even as the network width increases significantly. Inspired by the maximal update parameterization, we consider a one-step update of the gradient and reveal the appropriate scales of the hyperparameters, including the random initialization, learning rates, and damping terms. Our approach covers two major second-order optimization algorithms, K-FAC and Shampoo, and we demonstrate that our parameterization achieves higher generalization performance in feature learning. In particular, it enables us to transfer hyperparameters across models with different widths.
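The mechanism behind hyperparameter transfer, expressing each hyperparameter as a base value times a width-dependent scale so that values tuned at a small width remain valid at a large width, can be sketched as follows. This is a minimal, hypothetical PyTorch illustration: the exponents and the `base_width`/`base_lr`/`base_damping` choices are placeholders for exposition, not the scalings derived in the paper, which treats K-FAC and Shampoo specifically.

```python
# Minimal sketch of width-scaled hyperparameters in the spirit of maximal
# update parameterization (muP). All exponents below are hypothetical
# placeholders; the paper derives the correct scales for K-FAC and Shampoo,
# including the damping term.
import math

import torch.nn as nn


def make_parameterized_mlp(width: int, base_width: int = 64,
                           base_lr: float = 0.1, base_damping: float = 1e-3):
    """Build a 2-layer MLP whose init scales, per-layer learning rates, and
    damping are tied to width / base_width, so hyperparameters tuned at
    base_width can be reused at larger widths (hyperparameter transfer)."""
    mult = width / base_width
    model = nn.Sequential(
        nn.Linear(784, width),
        nn.ReLU(),
        nn.Linear(width, 10),
    )
    # Hypothetical initialization scales: variance ~ 1/fan_in for the hidden
    # layer, plus an extra width factor shrinking the output layer.
    nn.init.normal_(model[0].weight, std=1.0 / math.sqrt(784))
    nn.init.normal_(model[2].weight, std=1.0 / width)            # placeholder
    # Hypothetical per-layer learning rates and damping: these param groups
    # and the damping value would be handed to a K-FAC or Shampoo optimizer.
    param_groups = [
        {"params": model[0].parameters(), "lr": base_lr},
        {"params": model[2].parameters(), "lr": base_lr / mult},  # placeholder
    ]
    damping = base_damping * mult                                 # placeholder
    return model, param_groups, damping


# With the scales fixed this way, one would tune (base_lr, base_damping) at
# base_width and reuse them unchanged at, e.g., width=1024.
model, groups, damping = make_parameterized_mlp(width=1024)
```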
Keywords
Deep learning, Second-order optimization, K-FAC, Feature learning, Infinite width, Maximal update parameterization