Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent.

Conference on Learning Theory (2018)

Abstract
Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods," provably achieves a faster convergence rate than gradient descent (GD) in the convex setting. While these methods are widely used in modern nonconvex applications, including training of deep neural networks, whether they are provably superior to GD in the nonconvex setting remains open. This paper studies a simple variant of Nesterov's AGD, and shows that it escapes saddle points and finds a second-order stationary point in $\tilde{O}(1/\varepsilon^{7/4})$ iterations, matching the best known convergence rate, which is faster than the $\tilde{O}(1/\varepsilon^{2})$ iterations required by GD. To the best of our knowledge, this is the first direct acceleration (single-loop) algorithm that is provably faster than GD in the general nonconvex setting---all previous nonconvex accelerated algorithms rely on more complex mechanisms such as nested loops and proximal terms. Our analysis is based on two key ideas: (1) the use of a simple Hamiltonian function, inspired by a continuous-time perspective, which AGD monotonically decreases on each step even for nonconvex functions, and (2) a novel framework called improve or localize, which is useful for tracking the long-term behavior of gradient-based optimization algorithms. We believe that these techniques may deepen our understanding of both acceleration algorithms and nonconvex optimization.