Algorithm
Computer science
Rate of convergence
Convergence (economics)
Proximal gradient method
Gradient descent
Stochastic gradient descent
Mathematical optimization
Mathematics
Artificial intelligence
Artificial neural network
Computer network
Economic growth
Channel (broadcasting)
Economics
Authors
Sarit Khirirat, Xiaoyu Wang, Sindri Magnússon, Mikael Johansson
Identifiers
DOI: 10.1109/tsp.2023.3237392
Abstract
Noisy gradient algorithms have emerged as one of the most popular algorithm families for distributed optimization with massive data. Choosing a proper step-size schedule is an important part of tuning these algorithms for good performance. For the algorithms to attain fast convergence and high accuracy, it is intuitive to use large step-sizes in the initial iterations, when the gradient noise is typically small compared to the algorithm steps, and to reduce the step-sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient descent. However, similar results are lacking for other methods that use approximate gradients. This paper shows that diminishing step-size strategies can indeed be applied to a broad class of noisy gradient algorithms. Our analysis framework is based on two classes of systems that characterize the impact of the step-sizes on the convergence performance of many algorithms. Our results show that such step-size schedules enable these algorithms to attain the optimal rate. We exemplify our results on stochastic compression algorithms. Our experiments validate fast convergence of these algorithms with the step decay schedules.
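To illustrate the kind of step decay schedule the abstract describes, the minimal sketch below runs stochastic gradient descent on a toy quadratic with a stagewise schedule that halves the step-size at regular intervals. This is an illustrative assumption, not the paper's actual algorithm or analysis: the objective, noise model, and the parameter names (`eta0`, `decay_every`, `decay_factor`) are all hypothetical.

```python
import random

def noisy_grad(x, noise_scale=0.1):
    # True gradient of f(x) = 0.5 * x**2 is x; zero-mean Gaussian noise
    # mimics the approximate/stochastic gradients discussed in the abstract.
    return x + random.gauss(0.0, noise_scale)

def sgd_step_decay(x0, eta0=0.5, decay_every=100, decay_factor=0.5,
                   iters=1000, seed=0):
    """SGD with a stagewise step-decay schedule (hypothetical parameters):
    the step-size starts at eta0 and is multiplied by decay_factor
    every decay_every iterations."""
    random.seed(seed)
    x = x0
    for t in range(iters):
        eta = eta0 * (decay_factor ** (t // decay_every))
        x -= eta * noisy_grad(x)
    return x

x_final = sgd_step_decay(5.0)
print(abs(x_final))  # should be close to the optimum x* = 0
```

Large early step-sizes drive the iterate quickly toward the optimum while the gradient signal dominates the noise; the later, smaller step-sizes shrink the noise-induced fluctuations around it, which is the intuition the paper formalizes for a broader algorithm class.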