Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

ICLR 2017 (arXiv:1611.01838).


Abstract:

This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage upon …
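The abstract describes good extrema through the Hessian's eigenvalue spectrum. The Python sketch below illustrates that quantity; it is not code from the paper and not part of Entropy-SGD itself. It estimates a Hessian by central finite differences and reports what fraction of its eigenvalues are near zero, positive, or negative. The names `numerical_hessian` and `flatness_profile`, the threshold `tol=1e-3`, and the toy loss are hypothetical choices made only for this example.

```python
import numpy as np


def numerical_hessian(f, x, h=1e-4):
    """Central-difference estimate of the Hessian of a scalar function f at x."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_i[i] = h
            e_j = np.zeros(n)
            e_j[j] = h
            # Mixed second difference approximates d^2 f / (dx_i dx_j).
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4.0 * h * h)
    # Symmetrize to remove round-off asymmetry before the eigen-decomposition.
    return 0.5 * (H + H.T)


def flatness_profile(H, tol=1e-3):
    """Fractions of Hessian eigenvalues that are near zero, positive, or negative.

    `tol` is an assumed cut-off for "almost zero"; it is not taken from the paper.
    """
    eig = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix
    return {
        "near_zero": float(np.mean(np.abs(eig) < tol)),
        "positive": float(np.mean(eig >= tol)),
        "negative": float(np.mean(eig <= -tol)),
    }


if __name__ == "__main__":
    def toy_loss(w):
        # Hypothetical loss that depends on only 2 of 5 coordinates,
        # so the other 3 directions are exactly flat.
        return (w[0] - 1.0) ** 2 - 0.5 * w[1] ** 2

    w_star = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # a critical point (saddle) of toy_loss
    print(flatness_profile(numerical_hessian(toy_loss, w_star)))
    # → {'near_zero': 0.6, 'positive': 0.2, 'negative': 0.2}
```

A dense finite-difference Hessian needs O(n²) loss evaluations, so it only suits toy problems like this one; for an actual network the spectrum would have to be approximated, for example with Hessian-vector products and an iterative eigensolver.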
