Scalable and Practical Natural Gradient for Large-Scale Deep Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)

Abstract
Large-scale distributed training of deep neural networks degrades generalization performance because of the increase in the effective mini-batch size. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or by ad hoc modifications of batch normalization. We propose scalable and practical natural gradient...
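For context, natural gradient descent preconditions the loss gradient with the inverse Fisher information matrix, θ ← θ − η F⁻¹ ∇L(θ). The sketch below is a generic, illustrative implementation of that update using an empirical Fisher estimate with damping; it is not the paper's SP-NGD method, and the dimensions, damping value, and function names are assumptions for demonstration only.

```python
# Minimal sketch of a natural-gradient update (illustrative, not the paper's method).
import numpy as np

def natural_gradient_step(theta, per_example_grads, lr=0.1, damping=1e-3):
    """One natural-gradient step: theta <- theta - lr * F^{-1} g."""
    g = per_example_grads.mean(axis=0)                       # mini-batch gradient
    # Empirical Fisher: mean outer product of per-example gradients.
    fisher = per_example_grads.T @ per_example_grads / len(per_example_grads)
    fisher += damping * np.eye(len(theta))                   # damping keeps F invertible
    return theta - lr * np.linalg.solve(fisher, g)

# Toy usage: 32 per-example gradients for a 5-parameter model.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
grads = rng.normal(size=(32, 5))
theta = natural_gradient_step(theta, grads)
```

In practice, forming and inverting the full Fisher matrix is infeasible at scale, which is why scalable natural-gradient methods rely on structured (e.g., layer-wise or Kronecker-factored) approximations.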
Keywords
Training, Computational modeling, Deep learning, Neural networks, Data models, Stochastic processes, Servers