SKFAC: Training Neural Networks with Faster Kronecker-Factored Approximate Curvature

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 (2021)

Abstract
Computational cost limits the widespread use of second-order optimization algorithms for training deep neural networks. In this paper, we present a computationally efficient approximation of natural gradient descent, named Swift Kronecker-Factored Approximate Curvature (SKFAC), which combines Kronecker factorization with a fast low-rank matrix inversion technique. Our method targets both fully connected and convolutional layers. For fully connected layers, by exploiting the low-rank property of the Kronecker factors of the Fisher information matrix, our method only requires inverting a small matrix to approximate the curvature with desirable accuracy. For convolutional layers, we propose two effective dimension reduction methods, Spatial Subsampling and Reduce Sum, which save computation by reducing across the spatial dimension or the receptive fields of feature maps without affecting empirical performance. Experimental results from training several deep neural networks on the CIFAR-10 and ImageNet-1k datasets demonstrate that SKFAC can capture the main curvature and yield performance comparable to K-FAC. The proposed method bridges the wall-clock time gap between first-order and second-order algorithms.
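To make the "inverting a small matrix" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it uses the standard Woodbury identity as a stand-in for the low-rank inversion the abstract mentions, and it assumes a Tikhonov-damped factor of the form `acts.T @ acts / n + damping * I`. The function name `damped_factor_inverse_lowrank` and the parameters `acts` and `damping` are hypothetical, chosen only for illustration.

```python
import numpy as np


def damped_factor_inverse_lowrank(acts, damping):
    """Invert (acts.T @ acts / n + damping * I) by solving an n x n system.

    acts:    (n, d) array of layer inputs; n is the batch size, d the width.
    damping: positive scalar added to the diagonal (an assumption here).

    The factor acts.T @ acts / n has rank at most n, so when n < d the
    Woodbury identity replaces a d x d inversion with an n x n solve:
        (lam*I + U U^T)^{-1} = (1/lam) * (I - U (lam*I + U^T U)^{-1} U^T)
    """
    n, d = acts.shape
    u = acts.T / np.sqrt(n)                  # (d, n), so the factor is u @ u.T
    small = damping * np.eye(n) + u.T @ u    # (n, n) system, cheap when n < d
    return (np.eye(d) - u @ np.linalg.solve(small, u.T)) / damping


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.standard_normal((8, 64))      # batch of 8, feature width 64
    fast = damped_factor_inverse_lowrank(acts, damping=0.1)
    direct = np.linalg.inv(acts.T @ acts / 8 + 0.1 * np.eye(64))
    print("max abs difference:", np.abs(fast - direct).max())
```

The saving comes from the shape of the linear system: the direct route inverts a `d x d` matrix, while the sketch solves one of size `n x n`, which is smaller whenever the batch size is below the layer width.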
Keywords
computationally efficient approximation, natural gradient descent, SKFAC, Kronecker factorization, low-rank matrix inversion technique, convolutional layers, fully connected layers, low-rank property, Kronecker factors, Fisher information matrix, effective dimension reduction methods, deep neural networks, main curvature, 1st order algorithms, 2nd order algorithms, neural network training, faster Kronecker-Factored Approximate Curvature, 2nd order optimization algorithms, Swift Kronecker-factored approximate curvature, CIFAR-10 dataset, ImageNet-1k dataset, spatial subsampling, reduce sum