Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
International Conference on Learning Representations (ICLR), 2018.
Large-scale distributed training requires significant communication bandwidth for gradient exchange, which limits the scalability of multi-node training and requires expensive high-bandwidth network infrastructure. The situation is even worse for distributed training on mobile devices (federated learning), which suffers from higher latency.
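The bandwidth problem the abstract describes is typically attacked by gradient sparsification: each worker transmits only the largest-magnitude gradient entries and accumulates the rest locally as a residual for later rounds. The sketch below illustrates this general idea only, not the paper's full method (which additionally uses momentum correction and related techniques); the function name and the compression ratio are illustrative assumptions.

```python
import numpy as np

def topk_sparsify(grad, ratio=0.01):
    """Illustrative top-k gradient sparsification (not the paper's exact
    algorithm). Returns the indices/values to transmit and the residual
    that is kept locally and added to the next iteration's gradient."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    # indices of the k largest-magnitude entries
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    values = flat[idx]
    residual = flat.copy()
    residual[idx] = 0.0  # the transmitted entries leave no residual
    return idx, values, residual.reshape(grad.shape)

# Residuals carry over between iterations, so small gradients are
# eventually transmitted rather than lost.
grad = np.random.randn(1000)
carry = np.zeros_like(grad)
idx, vals, carry = topk_sparsify(grad + carry, ratio=0.01)
```

With a 1% ratio, each worker ships roughly 1/100th of the gradient entries per step; the local residual accumulation is what keeps this lossy exchange from stalling convergence.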