Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates
arXiv (2024)
Abstract
Synchronous federated learning (FL) is a popular paradigm for collaborative
edge learning. It typically involves a set of heterogeneous devices locally
training neural network (NN) models in parallel with periodic centralized
aggregations. As some of the devices may have limited computational resources
and varying availability, FL latency is highly sensitive to stragglers.
Conventional approaches discard incomplete intra-model updates done by
stragglers, alter the amount of local workload and architecture, or resort to
asynchronous settings, all of which affect the trained model performance under
tight training latency constraints. In this work, we propose straggler-aware
layer-wise federated learning (SALF) that leverages the optimization procedure
of NNs via backpropagation to update the global model in a layer-wise fashion.
SALF allows stragglers to synchronously convey partial gradients, so that each
layer of the global model is updated independently by a different set of
contributing users. We provide a theoretical analysis, establishing
convergence guarantees for the global model under mild assumptions on the
distribution of the participating devices, revealing that SALF converges at the
same asymptotic rate as FL with no timing limitations. This insight is matched
with empirical observations, demonstrating the performance gains of SALF
compared to alternative mechanisms mitigating the device heterogeneity gap in
FL.
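
The layer-wise update described above can be illustrated with a minimal sketch. Backpropagation computes gradients from the output layer backwards, so a straggler that runs out of time holds gradients only for the last few layers. The abstract does not state the exact aggregation rule, so everything below is an assumption for illustration: the function name `layerwise_aggregate`, the `depths` bookkeeping, plain averaging over the contributing users, and leaving a layer unchanged when no user reached it.

```python
import numpy as np

def layerwise_aggregate(global_layers, user_grads, depths, lr=0.1):
    """Update each layer using only the users whose backward pass reached it.

    global_layers: list of arrays, index 0 is the input-side layer.
    user_grads:    user_grads[u][l] is user u's gradient for layer l
                   (entries for layers the user never reached are ignored).
    depths:        depths[u] is the number of layers user u finished,
                   counted from the output layer backwards.
    """
    num_layers = len(global_layers)
    new_layers = []
    for l, weights in enumerate(global_layers):
        # Layer l is reached only once the last (num_layers - l) layers are done.
        contributors = [u for u, d in enumerate(depths) if d >= num_layers - l]
        if not contributors:
            # Assumption: a layer with no contributors keeps its current value.
            new_layers.append(weights.copy())
            continue
        # Assumption: plain average over the users that reached this layer.
        mean_grad = np.mean([user_grads[u][l] for u in contributors], axis=0)
        new_layers.append(weights - lr * mean_grad)
    return new_layers

# Three layers, two users: user 0 finishes all layers, user 1 only the last one.
layers = [np.zeros(2) for _ in range(3)]
grads = [[np.ones(2)] * 3, [3 * np.ones(2)] * 3]
updated = layerwise_aggregate(layers, grads, depths=[3, 1])
```

In this toy run the output layer is averaged over both users, while the two earlier layers are updated from user 0 alone, so each layer ends up with a different contributing set.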