Varuna: scalable, low-cost training of massive deep learning models

Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, Nipun Kwatra

European Conference on Computer Systems (2022)



Abstract

Systems for training massive deep learning models (billions of parameters) today assume and require specialized "hyperclusters": hundreds or thousands of GPUs wired with specialized high-bandwidth interconnects such as NVLink and InfiniBand. Besides being expensive, this dependence on hyperclusters and custom high-speed interconnects limits the size of such clusters, creating (a) scalability limits on job parallelism and (b) resource fragmentation across hyperclusters. In this paper, we present Varuna, a new system that enables training massive deep learning models on commodity networking. Varuna makes thrifty use of networking resources and automatically configures the user's training job to efficiently use any given set of resources. As a result, Varuna can leverage "low-priority" VMs that are about 5x cheaper than dedicated GPUs, significantly reducing the cost of training massive models. We demonstrate the efficacy of Varuna by training massive models, including a 200-billion-parameter model, on 5x cheaper "spot VMs" while maintaining high training throughput. Varuna improves end-to-end training time for language models like BERT and GPT-2 by up to 18x compared to other model-parallel approaches and by up to 26% compared to other pipeline-parallel approaches on commodity VMs. The code for Varuna is available at https://github.com/microsoft/varuna.
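
The abstract states that Varuna automatically configures a training job for whatever resources it is given. The sketch below illustrates the general shape of such a search and is not Varuna's actual algorithm: it enumerates ways to split N GPUs into P pipeline stages times D data-parallel replicas and scores each split with a deliberately crude cost model. The functions `estimate_time` and `best_config`, the GPipe-style pipeline formula, the "backward is about 2x forward" factor, and the 4x per-parameter memory factor are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    stages: int      # pipeline depth P
    replicas: int    # data-parallel width D
    est_time: float  # toy estimate of seconds per mini-batch


def estimate_time(stages, replicas, *, num_layers, layer_time, global_batch,
                  micro_batch, param_bytes, bandwidth):
    """Hypothetical cost model: pipeline compute plus gradient all-reduce."""
    micro_batches = global_batch // (replicas * micro_batch)
    # Forward time of one micro-batch on one stage (layers split evenly).
    stage_time = layer_time * num_layers / stages
    # GPipe-style schedule: (m + P - 1) slots; backward assumed ~2x forward.
    pipeline = 3 * (micro_batches + stages - 1) * stage_time
    # Ring all-reduce of one stage's gradients across D replicas.
    stage_bytes = param_bytes / stages
    allreduce = 2 * (replicas - 1) / replicas * stage_bytes / bandwidth
    return pipeline + allreduce


def best_config(num_gpus, gpu_mem_bytes, **model):
    """Enumerate P x D = num_gpus splits and return the cheapest feasible one."""
    candidates = []
    for stages in range(1, num_gpus + 1):
        if num_gpus % stages or model["num_layers"] % stages:
            continue  # need an even split of GPUs and layers
        replicas = num_gpus // stages
        if model["global_batch"] % (replicas * model["micro_batch"]):
            continue  # mini-batch must split into whole micro-batches
        # Assumed 4x factor: weights, gradients, two optimizer moments.
        if 4 * model["param_bytes"] / stages > gpu_mem_bytes:
            continue  # stage would not fit in GPU memory
        t = estimate_time(stages, replicas, **model)
        candidates.append(Config(stages, replicas, t))
    if not candidates:
        raise ValueError("no feasible configuration")
    return min(candidates, key=lambda c: c.est_time)


# Hypothetical numbers: 8 GPUs with 8 GB each, ~10 Gbps (1.25e9 B/s) network.
best = best_config(8, 8e9, num_layers=24, layer_time=0.01, global_batch=64,
                   micro_batch=4, param_bytes=4e9, bandwidth=1.25e9)
print(best)  # picks stages=8, replicas=1 under this toy model
```

With a slow network the all-reduce term dominates, so this toy search favors deeper pipelines with fewer replicas; a more realistic planner would also account for activation memory and inter-stage communication.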


Keywords

massive deep learning models, deep learning, scalable, low-cost
