NetShield: An in-network architecture against byzantine failures in distributed deep learning

Qingqing Ren, Shuyong Zhu, Lu Lu, Zhiqiang Li, Guangyu Zhao, Yujun Zhang

Computer Networks (2023)

Abstract
There is a growing trend of training deep learning networks on distributed clusters. Unfortunately, distributed deep learning (DDL) is prone to Byzantine failures, in which some nodes corrupt training by sending malicious gradients to the parameter server (PS). Existing works address this problem by implementing Byzantine defenses on the PS. However, these defenses incur large computational overhead, seriously degrading DDL training performance. Moreover, malicious gradients are not identified until they reach the endpoint (the PS), which wastes network resources and reduces communication efficiency.
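NetShield's in-network design is not detailed in this excerpt. To illustrate the kind of PS-side Byzantine defense the abstract refers to, below is a minimal sketch (not the paper's method) of coordinate-wise median aggregation, a standard Byzantine-robust rule; the worker count and gradient dimensions are hypothetical.

```python
import numpy as np

def robust_aggregate(gradients: list) -> np.ndarray:
    """Coordinate-wise median over worker gradients.

    A conventional PS-side Byzantine defense: even if a minority of
    workers send arbitrary (malicious) gradients, the per-coordinate
    median stays close to the honest gradients.
    """
    stacked = np.stack(gradients)       # shape: (num_workers, num_params)
    return np.median(stacked, axis=0)   # robust aggregate per coordinate

# Hypothetical example: 4 honest workers and 1 Byzantine worker.
honest = [np.random.normal(0.0, 0.1, size=8) for _ in range(4)]
byzantine = [np.full(8, 1e6)]           # arbitrarily corrupted gradient
agg = robust_aggregate(honest + byzantine)
print(agg)                              # remains near the honest gradients
```

Note that this per-coordinate sort/median over all worker gradients is exactly the kind of computation that becomes costly on the PS at scale, which is the overhead the abstract points to.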
Keywords
Byzantine failures, deep learning, in-network