Decentralized Learning Made Practical with Client Sampling
arXiv (2023)
Abstract
Decentralized learning (DL) leverages edge devices for collaborative model
training while avoiding coordination by a central server. Due to privacy
concerns, DL has become an attractive alternative to centralized learning
schemes since training data never leaves the device. In a round of DL, all
nodes participate in model training and exchange their model with some other
nodes. Performing DL in large-scale heterogeneous networks results in high
communication costs and prolonged round durations due to slow nodes,
effectively inflating the total training time. Furthermore, current DL
algorithms assume that all nodes are available for training and aggregation at
all times, diminishing the practicality of DL. This paper presents Plexus, an
efficient, scalable, and practical DL system. Plexus (1) avoids network-wide
participation by introducing a decentralized peer sampler that selects small
subsets of available nodes to train the model each round, and (2) aggregates
the trained models produced by these nodes every round. Plexus is designed to handle
joining and leaving nodes (churn). We extensively evaluate Plexus by
incorporating realistic traces for compute speed, pairwise latency, network
capacity, and availability of edge devices in our experiments. Our experiments
on four common learning tasks empirically show that Plexus reduces
time-to-accuracy by 1.2-8.3x, communication volume by 2.4-15.3x, and training
resources needed for convergence by 6.4-370x compared to baseline DL
algorithms.
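
The abstract outlines a per-round structure: sample a small subset of currently available nodes, have them train locally, then aggregate their models. The Python sketch below illustrates only that round logic under simplifying assumptions; the names (run_round, local_train, average_models) and the node interface are hypothetical, and the loop is written as a centralized toy for brevity, whereas Plexus's actual peer sampler operates in a decentralized manner.

    import random

    def average_models(models):
        # Aggregate by parameter-wise averaging; models are dicts
        # mapping parameter names to float values (a simplification).
        keys = models[0].keys()
        return {k: sum(m[k] for m in models) / len(models) for k in keys}

    def run_round(global_model, nodes, sample_size, local_train):
        # Only nodes currently online participate; filtering the pool
        # each round is one simple way to tolerate churn.
        available = [n for n in nodes if n.online]
        participants = random.sample(available, min(sample_size, len(available)))
        # Each sampled node trains a copy of the current model on its
        # private data; local_train is a user-supplied callable here.
        local_models = [local_train(n, dict(global_model)) for n in participants]
        # Aggregate the trained models produced this round.
        return average_models(local_models)

Sampling a small subset per round, rather than requiring network-wide participation, is what limits communication volume and prevents slow nodes from stretching every round's duration.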