CrossBow: Scaling Deep Learning on Multi-GPU Servers

(2018)

Abstract
With the widespread availability of servers with 4 or more GPUs, scalability in terms of the number of GPUs in a server when training deep learning models becomes a paramount concern. Systems such as TensorFlow and MXNet train using synchronous stochastic gradient descent—an input batch is partitioned across the GPUs, each computing a partial gradient. The gradients are then combined to update the model parameters before proceeding to the next batch. For many deep learning models, this introduces a scalability challenge: to keep multiple GPUs fully utilised, the batch size must be sufficiently large, but a large batch size slows down model convergence due to the less frequent model updates, thus prolonging the time to reach a desired level of accuracy. This paper introduces CrossBow, a new single-server multi-GPU deep learning system that avoids the above trade-off. CrossBow trains multiple model replicas concurrently on each GPU, thereby avoiding under-utilisation of GPUs even when the preferred batch size is small. For this, CrossBow must (i) decide on an appropriate number of model replicas per GPU and (ii) employ an efficient and scalable synchronisation scheme within and across GPUs. CrossBow automatically tunes the number of replicas per GPU at runtime to maximise training throughput for a given batch size. We designed a novel synchronisation scheme that eliminates dependencies among model replicas, enabling high throughput and scalability. Our experiments show that CrossBow outperforms TensorFlow on a 4-GPU server by 2.5× with ResNet-32.
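The synchronous SGD scheme the abstract describes—partition a batch across workers, compute partial gradients, combine them into one model update—can be sketched as follows. This is a minimal illustrative sketch using a toy linear-regression model in NumPy, not CrossBow's or TensorFlow's actual API; the function names and parameters are hypothetical.

```python
import numpy as np

def partial_gradient(w, x, y):
    """Mean-squared-error gradient over one worker's partition of the batch."""
    return x.T @ (x @ w - y) / len(y)

def sync_sgd_step(w, batch_x, batch_y, n_workers, lr=0.1):
    # Partition the input batch across workers (GPUs). Note that each
    # worker's share shrinks as n_workers grows, which is why a small
    # batch size leaves many GPUs under-utilised.
    xs = np.array_split(batch_x, n_workers)
    ys = np.array_split(batch_y, n_workers)
    grads = [partial_gradient(w, x, y) for x, y in zip(xs, ys)]
    # Combine the partial gradients (an all-reduce in a real system)
    # and apply a single synchronous model update.
    return w - lr * np.mean(grads, axis=0)

# Toy training loop: recover true_w from noiseless linear data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(64, 2))
y = X @ true_w
w = np.zeros(2)
for _ in range(200):
    w = sync_sgd_step(w, X, y, n_workers=4)
```

Because the update is applied once per whole batch, increasing the batch size to feed more workers reduces the number of model updates per epoch, which is the convergence trade-off CrossBow's per-GPU model replicas are designed to avoid.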
Keywords
Server, Scalability, CrossBow, Throughput, Deep learning, Parallel computing, Synchronisation scheme, Computer science, Scaling, Artificial intelligence, Multi-GPU