We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention this problem has received, current compression schemes either do not scale well or fail to achieve the target test accuracy.
We propose a new low-rank gradient compressor based on power iteration that can
- compress gradients rapidly,
- efficiently aggregate the compressed gradients using all-reduce, and
- achieve test performance on par with SGD.
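The compressor described in the list above can be illustrated with a single rank-r power-iteration step in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. All-reduce is simulated by averaging over a Python list of per-worker gradients. Each gradient is assumed to be already reshaped into an n × m matrix. The names `power_compress_step` and `orthonormalize` are hypothetical. Parts of the full method, such as error feedback, are omitted.

```python
import numpy as np


def orthonormalize(p: np.ndarray) -> np.ndarray:
    """Return a matrix with orthonormal columns spanning the columns of p."""
    q, _ = np.linalg.qr(p)  # reduced QR: q has the same shape as p when n >= r
    return q


def power_compress_step(grads: list[np.ndarray], q: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One rank-r power-iteration step over simulated workers (sketch).

    grads: per-worker (n, m) gradient matrices (hypothetical setup).
    q:     shared (m, r) matrix, e.g. reused from the previous step.
    Returns (p, q_new) such that p @ q_new.T approximates the mean gradient.
    """
    # Each worker computes its (n, r) factor; this is linear in its gradient.
    p_local = [g @ q for g in grads]
    # Simulated all-reduce: averaging the small factors across workers.
    p = orthonormalize(np.mean(p_local, axis=0))
    # Second half-step: (m, n) @ (n, r) -> (m, r), again linear in each gradient.
    q_local = [g.T @ p for g in grads]
    q_new = np.mean(q_local, axis=0)  # second simulated all-reduce
    return p, q_new


# Toy usage with random "gradients" from 3 simulated workers.
rng = np.random.default_rng(0)
n, m, r, workers = 64, 32, 4, 3
grads = [rng.standard_normal((n, m)) for _ in range(workers)]
q0 = rng.standard_normal((m, r))
p, q = power_compress_step(grads, q0)
approx = p @ q.T  # rank-r approximation of the averaged gradient
```

For a fixed `q`, `p` depends linearly on each worker's gradient, and for a fixed `p`, so does `q_new`. Averaging the small factors therefore gives the same result as compressing the averaged gradient, which is why all-reduce applies. Each worker communicates r(n + m) numbers instead of nm. With the toy sizes above, that is 384 numbers instead of 2048.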
|PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization|
|Vogels, Thijs; Karimireddy, Sai Praneeth; Jaggi, Martin|
|2019-01-01|Advances in Neural Information Processing Systems 32 (NeurIPS 2019)|