Keywords
Scalability; Stochastic gradient descent; Computer science; Convergence (economics); Rate of convergence; Mathematical proof; Scheme (mathematics); Communication complexity; Robustness; Algorithm; Overhead (engineering); Theoretical computer science; Mathematical optimization; Mathematics; Artificial intelligence; Artificial neural network; Computer network; Database; Operating system; Channel (broadcasting); Mathematical analysis; Economics; Economic growth; Programming language; Geometry
Authors
Shaohuai Shi,Kaiyong Zhao,Qiang Wang,Zhenheng Tang,Xiaowen Chu
Identifier
DOI: 10.24963/ijcai.2019/473
Abstract
Gradient sparsification is a promising technique to significantly reduce the communication overhead in decentralized synchronous stochastic gradient descent (S-SGD) algorithms. Yet, many existing gradient sparsification schemes (e.g., Top-k sparsification) have a communication complexity of O(kP), where k is the number of gradients selected by each worker and P is the number of workers. Recently, the gTop-k sparsification scheme has been proposed to reduce the communication complexity from O(kP) to O(k log P), which significantly boosts system scalability. However, it remains unclear whether the gTop-k sparsification scheme converges in theory. In this paper, we first provide theoretical proofs of the convergence of the gTop-k scheme for non-convex objective functions under certain analytic assumptions. We then derive the convergence rate of gTop-k S-SGD, which is of the same order as that of vanilla mini-batch SGD. Finally, we conduct extensive experiments on different machine learning models and data sets to verify the soundness of the assumptions and theoretical results, and discuss the impact of the compression ratio on the convergence performance.
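To make the complexity claim concrete, below is a minimal single-process sketch, in Python/NumPy, of a gTop-k style reduction: each worker first keeps its local Top-k gradient entries, and workers then merge pairwise over log2(P) rounds, retaining only the k largest-magnitude entries after every merge, so each round exchanges O(k) index/value pairs rather than O(kP) in total. This is not the authors' implementation; the function names and the toy parameters (P = 4, d = 1000, k = 10) are illustrative assumptions.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a dense gradient as (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def gtopk_merge(a, b, k):
    """Merge two sparse (indices, values) sets, summing values that share an index,
    then keep only the k largest-magnitude entries of the merged result."""
    idx = np.concatenate([a[0], b[0]])
    val = np.concatenate([a[1], b[1]])
    uniq, inverse = np.unique(idx, return_inverse=True)
    summed = np.zeros(len(uniq))
    np.add.at(summed, inverse, val)          # accumulate duplicate indices across workers
    if len(uniq) <= k:
        return uniq, summed
    keep = np.argpartition(np.abs(summed), -k)[-k:]
    return uniq[keep], summed[keep]

def gtopk_allreduce(local_grads, k):
    """Tree-style gTop-k reduction: log2(P) pairwise-merge rounds,
    each communicating only O(k) index/value pairs per worker."""
    sparse = [topk_sparsify(g, k) for g in local_grads]
    step = 1
    while step < len(sparse):
        for i in range(0, len(sparse), 2 * step):
            if i + step < len(sparse):
                sparse[i] = gtopk_merge(sparse[i], sparse[i + step], k)
        step *= 2
    return sparse[0]   # approximate global Top-k of the summed gradient

# Toy usage with hypothetical sizes: P = 4 workers, d = 1000 parameters, k = 10.
rng = np.random.default_rng(0)
grads = [rng.standard_normal(1000) for _ in range(4)]
indices, values = gtopk_allreduce(grads, k=10)
print(indices, values)
```

Because every merge discards all but k entries, the final selection only approximates the true Top-k of the summed gradient, which is precisely why the convergence of gTop-k requires the separate analysis given in the paper.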