Directional Smoothness and Gradient Methods: Convergence and Adaptivity
arXiv (2024)
Abstract
We develop new sub-optimality bounds for gradient descent (GD) that depend on
the conditioning of the objective along the path of optimization, rather than
on global, worst-case constants. Key to our proofs is directional smoothness, a
measure of gradient variation that we use to develop upper bounds on the
objective. Minimizing these upper bounds requires solving implicit equations to
obtain a sequence of strongly adapted step-sizes; we show that these equations
are straightforward to solve for convex quadratics and lead to new guarantees
for two classical step-sizes. For general functions, we prove that the Polyak
step-size and normalized GD obtain fast, path-dependent rates despite using no
knowledge of the directional smoothness. Experiments on logistic regression
show our convergence guarantees are tighter than the classical theory based on
L-smoothness.
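
To make the abstract's quantities concrete, here is a minimal sketch (not the paper's code) of gradient descent on a synthetic logistic regression problem with the classical Polyak step-size and normalized GD, the two rules the abstract analyzes. Along the path it tracks the ratio 2‖∇f(x_{k+1}) − ∇f(x_k)‖ / ‖x_{k+1} − x_k‖, one natural measure of gradient variation between consecutive iterates. The data, the constant gamma, the substitution f* = 0, and the iteration count are all illustrative assumptions, not choices from the paper.

```python
import numpy as np

# Synthetic logistic regression instance (hypothetical data, illustration only).
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))  # labels in {-1, +1}

def loss(w):
    # f(w) = (1/n) * sum_i log(1 + exp(-y_i * x_i^T w))
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    m = y * (X @ w)
    return -(X.T @ (y / (1.0 + np.exp(m)))) / n

def run(step_size_fn, w0, iters=100):
    """GD with a per-iterate step-size rule; also records the path-wise
    gradient-variation ratio 2*||g_next - g|| / ||w_next - w||."""
    w, history = w0.copy(), []
    for _ in range(iters):
        g = grad(w)
        w_next = w - step_size_fn(w, g) * g
        g_next = grad(w_next)
        denom = np.linalg.norm(w_next - w)
        ratio = 2.0 * np.linalg.norm(g_next - g) / denom if denom > 0 else 0.0
        history.append((loss(w), ratio))
        w = w_next
    return w, history

# Polyak step-size: eta_k = (f(w_k) - f*) / ||grad f(w_k)||^2. We substitute
# the lower bound f* = 0 (valid for the logistic loss) in place of the true
# optimal value, a common surrogate and an assumption on our part.
polyak = lambda w, g: loss(w) / (np.linalg.norm(g) ** 2)

# Normalized GD: eta_k = gamma / ||grad f(w_k)||, with gamma an arbitrary
# tuning constant chosen here for illustration.
gamma = 0.1
normalized = lambda w, g: gamma / np.linalg.norm(g)

w0 = np.zeros(d)
for name, rule in [("Polyak", polyak), ("Normalized GD", normalized)]:
    _, hist = run(rule, w0)
    print(f"{name}: final loss {hist[-1][0]:.4f}, "
          f"max gradient-variation ratio along path {max(r for _, r in hist):.2f}")
```

Note that neither rule consults a global smoothness constant: the Polyak step-size uses only function and gradient values at the current iterate, and normalized GD only the gradient norm, which matches the abstract's point that both obtain path-dependent rates with no knowledge of the directional smoothness.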