Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method

International Conference on Parallel Processing (2019)

Abstract
Faster training of deep neural networks is desired to speed up the research and development cycle in deep learning. Distributed deep learning and second-order optimization methods are two different techniques for accelerating the training of deep neural networks. In previous work, researchers showed that an approximated second-order optimization method, called K-FAC, allows the two techniques to mitigate each other's drawbacks. However, there has been no detailed discussion of its performance, which is critical for practical use. In this work, we propose several performance optimization techniques to reduce the overheads of K-FAC and to accelerate the overall training. With all performance optimizations applied, we speed up training by 1.64 times per iteration compared to a baseline. In addition to the performance optimizations, we construct a simple performance model that predicts training performance, helping users determine whether distributed K-FAC is appropriate for their training in terms of wall-clock time.
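For context, K-FAC (Kronecker-Factored Approximate Curvature) approximates each layer's block of the Fisher information matrix as a Kronecker product of two small factors, so preconditioning the gradient requires only two small matrix inverses per layer. The following is a standard sketch of that approximation; the notation is the generic K-FAC formulation and is not taken from this paper:

    F_\ell \approx A_{\ell-1} \otimes G_\ell, \qquad A_{\ell-1} = \mathbb{E}\!\left[a_{\ell-1} a_{\ell-1}^{\top}\right], \qquad G_\ell = \mathbb{E}\!\left[g_\ell g_\ell^{\top}\right]

    F_\ell^{-1} \,\mathrm{vec}(\nabla W_\ell) \approx \mathrm{vec}\!\left(G_\ell^{-1} \,\nabla W_\ell\, A_{\ell-1}^{-1}\right)

Here a_{\ell-1} denotes the layer's input activations and g_\ell the gradient of the loss with respect to the layer's pre-activation outputs; this Kronecker structure is what keeps the per-layer inversion cheap enough to be practical in a distributed setting.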
Keywords
deep learning, neural networks, second-order optimization