Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling.

arXiv: Distributed, Parallel, and Cluster Computing (2018)

Abstract
Deep learning systems have become vital tools across many fields, but increasing model sizes mean that training must be accelerated for these systems to remain useful. Current systems such as TensorFlow and MXNet focus on a single parallelization strategy, data parallelism, which requires large training batch sizes in order to scale. We cast the problem of finding the best parallelization strategy as the problem of finding the tiling that partitions tensors with the least overall communication, and we propose an algorithm that finds the optimal tiling. The resulting parallelization is a hybrid of data parallelism and model parallelism. We implement this automatic tiling in SoyBean, a new system that performs automatic parallelization and can act as a backend for TensorFlow, MXNet, and others. SoyBean transforms a serial dataflow graph captured by an existing deep learning system frontend into a parallel dataflow graph based on the optimal tiling it has found. Our evaluations show that SoyBean is 1.5x-4x faster than pure data parallelism for AlexNet and VGG.
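To make the idea of "the tiling with the least communication" concrete, here is a toy cost model for a single fully connected layer Y = X @ W, with X of shape (B, N) and W of shape (N, M). It is a sketch under stated assumptions, not SoyBean's algorithm: it assumes exactly 2 workers, assumes each operand is already laid out the way the chosen tiling needs, and ignores the cost of converting between the tilings of consecutive layers, which a whole-graph search would also have to count. Splitting one dimension leaves exactly one tensor holding partial sums that must be all-reduced (dW for a batch split, dX for a split of M, Y for a split of N). The names `MatMulLayer`, `comm_cost` and `best_tiling` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatMulLayer:
    """Hypothetical fully connected layer Y = X @ W, X: (B, N), W: (N, M)."""
    batch: int    # B
    in_dim: int   # N
    out_dim: int  # M


def comm_cost(layer: MatMulLayer, split_dim: str) -> int:
    """Elements exchanged between 2 workers in one forward + backward pass.

    Toy model (assumption): operands are already partitioned/replicated as
    the tiling requires, so the only traffic is one all-reduce of the tensor
    that ends up holding partial sums. For 2 workers, all-reducing S elements
    moves 2 * S elements in total.
    """
    B, N, M = layer.batch, layer.in_dim, layer.out_dim
    reduced = {
        "B": N * M,  # data parallelism: each worker has a partial dW (N x M)
        "M": B * N,  # split W by columns: each worker has a partial dX (B x N)
        "N": B * M,  # split W by rows: each worker has a partial Y (B x M)
    }[split_dim]
    return 2 * reduced


def best_tiling(layer: MatMulLayer) -> str:
    """Pick the dimension to split that minimizes communication (toy model)."""
    return min(("B", "N", "M"), key=lambda d: comm_cost(layer, d))


if __name__ == "__main__":
    large_weights = MatMulLayer(batch=256, in_dim=9216, out_dim=4096)
    small_weights = MatMulLayer(batch=256, in_dim=64, out_dim=64)
    print(best_tiling(large_weights), best_tiling(small_weights))
```

In this toy model, data parallelism (splitting B) wins only when the weight matrix is small relative to the activations; for a layer with a large weight matrix, splitting a weight dimension moves far less data. Since different layers prefer different splits, the cheapest plan for a whole network can mix both, which is the hybrid of data and model parallelism the abstract describes.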
Keywords
tensor tiling,hybrid parallelism,deep learning,unifying data