A second order regression model shows edge of stability behavior

ICLR 2023 (2023)

Abstract
Recent studies of learning algorithms have shown that there is a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value that still allows convergence (edge of stability). We consider a class of predictive models that are quadratic in the parameters, which we call second-order regression models. This is in contrast with the neural tangent kernel regime, where the predictive function is linear in the parameters. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits both progressive sharpening and edge of stability behavior. We then show that in higher dimensions, the model exhibits this behavior generically, without the structure of a neural network, due to a non-linearity induced in the learning dynamics. Finally, we show that edge of stability behavior in neural networks is correlated with the behavior in quadratic regression models.
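
As a rough illustration of the setting described in the abstract, the sketch below trains a scalar predictor that is quadratic in its parameters with full-batch gradient descent and tracks the largest eigenvalue of the loss Hessian (the "sharpness"). The specific predictor f(θ) = aᵀθ + ½ θᵀBθ, the random matrices, the target y, and the step size η are illustrative assumptions, not the paper's construction, and the printout is only meant to show how sharpness can be monitored against the 2/η stability threshold during training.

```python
# Minimal sketch (assumed setup, not the paper's exact model): a predictor quadratic in its
# parameters, trained with full-batch gradient descent on squared loss, while tracking the
# largest eigenvalue of the loss Hessian and comparing it to the 2/eta stability threshold.
import numpy as np

rng = np.random.default_rng(0)
d = 20                        # parameter dimension (illustrative)
a = rng.normal(size=d)        # linear part of the predictor
B = rng.normal(size=(d, d))
B = (B + B.T) / np.sqrt(d)    # symmetric quadratic part
y = 5.0                       # scalar regression target (illustrative)
eta = 0.02                    # fixed gradient-descent step size

def predict(theta):
    # Predictor quadratic in the parameters: f(theta) = a.theta + 0.5 * theta.B.theta
    return a @ theta + 0.5 * theta @ B @ theta

def loss_grad_hess(theta):
    # Squared loss L = 0.5 * (f(theta) - y)^2 with its gradient and Hessian in theta.
    r = predict(theta) - y               # residual
    g_f = a + B @ theta                  # gradient of the predictor
    grad = r * g_f
    hess = np.outer(g_f, g_f) + r * B    # Gauss-Newton term + residual-weighted curvature
    return 0.5 * r**2, grad, hess

theta = 0.01 * rng.normal(size=d)
for step in range(300):
    loss, grad, hess = loss_grad_hess(theta)
    sharpness = np.max(np.linalg.eigvalsh(hess))   # largest Hessian eigenvalue
    if step % 30 == 0:
        print(f"step {step:3d}  loss {loss:.4f}  sharpness {sharpness:.3f}  2/eta {2/eta:.1f}")
    theta = theta - eta * grad
```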
Keywords
deep learning theory, non-linear dynamics, optimization