Scalable and Practical Natural Gradient for Large-Scale Deep Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)

Abstract
Large-scale distributed training of deep neural networks degrades generalization performance because of the increase in the effective mini-batch size. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or by ad hoc modifications of batch normalization. We propose scalable and practical natural gradient...
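For context, natural gradient descent preconditions the loss gradient with the inverse Fisher information matrix, θ ← θ − η F⁻¹ ∇L(θ). The sketch below is a generic, illustrative implementation of that update using an empirical Fisher estimate with damping; it is not the paper's SP-NGD method, and the dimensions, damping value, and function names are assumptions for demonstration only.

```python
# Minimal sketch of a natural-gradient update (illustrative, not the paper's method).
import numpy as np

def natural_gradient_step(theta, per_example_grads, lr=0.1, damping=1e-3):
    """One natural-gradient step: theta <- theta - lr * F^{-1} g."""
    g = per_example_grads.mean(axis=0)                       # mini-batch gradient
    # Empirical Fisher: mean outer product of per-example gradients.
    fisher = per_example_grads.T @ per_example_grads / len(per_example_grads)
    fisher += damping * np.eye(len(theta))                   # damping keeps F invertible
    return theta - lr * np.linalg.solve(fisher, g)

# Toy usage: 32 per-example gradients for a 5-parameter model.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
grads = rng.normal(size=(32, 5))
theta = natural_gradient_step(theta, grads)
```

In practice, forming and inverting the full Fisher matrix is infeasible at scale, which is why scalable natural-gradient methods rely on structured (e.g., layer-wise or Kronecker-factored) approximations.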
Keywords
Training, Computational modeling, Deep learning, Neural networks, Data models, Stochastic processes, Servers