Computer science
Scalability
Artificial intelligence
Stochastic gradient descent
Overhead (engineering)
Bottleneck
Artificial neural network
Deep learning
Machine learning
Distributed computing
Process (computing)
Exploit
Computer engineering
Operating system
Embedded system
Database
Computer security
Identifier
DOI:10.1016/j.bdr.2021.100272
Abstract
With an escalating race to adopt machine learning (ML) in diverse application domains, there is an urgent need to efficiently support distributed ML algorithms. Since Stochastic Gradient Descent (SGD) is widely used to train ML models, the performance bottleneck of distributed ML is the communication cost of transmitting gradients over the network. Many existing studies compress the gradient to reduce this communication overhead, but they ignore the model structure during compression. As a result, although they reduce communication time, they introduce serious computation discontinuity for deep neural networks, which lowers prediction accuracy. In this paper, we propose LSDDL, a scalable and lightweight method to accelerate the training of deep learning models in a shared-nothing environment. The cornerstone of LSDDL is the observation that different layers of a neural network have different importance in the process of decompression. To exploit this insight, we devise a sparsification strategy for compressing the gradients of deep neural networks that preserves the structural information of the model. In addition, we provide a series of compression techniques to further reduce the communication overhead and optimize overall performance. We implement the LSDDL framework in PyTorch and encapsulate it as a user-friendly API. We validate the proposed techniques by training several real models on a large cluster. Experimental results show that the communication time of LSDDL is up to 5.43 times lower than that of the original SGD without losing much accuracy.
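To make the idea of structure-aware gradient compression concrete, the following is a minimal sketch of layer-wise top-k gradient sparsification in PyTorch. It is not the LSDDL implementation described in the paper; the per-layer keep ratios, the function names, and the toy model are illustrative assumptions only, and the actual system additionally handles network transmission and the further compression steps mentioned in the abstract.

```python
# Hypothetical sketch: layer-wise top-k gradient sparsification for distributed SGD.
# Layers deemed more important (per an assumed policy) keep a larger fraction of
# their gradient entries, so the model's structural information is not discarded
# uniformly. This is NOT the authors' LSDDL code.
import torch
import torch.nn as nn


def topk_sparsify(grad: torch.Tensor, keep_ratio: float):
    """Keep only the largest-magnitude entries of a gradient tensor."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * keep_ratio))
    _, idx = torch.topk(flat.abs(), k)
    return idx, flat[idx], grad.shape


def desparsify(idx, values, shape, numel):
    """Rebuild a dense gradient from the transmitted (index, value) pairs."""
    dense = torch.zeros(numel, device=values.device, dtype=values.dtype)
    dense[idx] = values
    return dense.view(shape)


def compress_model_gradients(model: nn.Module, keep_ratios: dict, default_ratio=0.01):
    """Sparsify each parameter's gradient with a layer-specific keep ratio."""
    compressed = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        ratio = keep_ratios.get(name, default_ratio)
        compressed[name] = topk_sparsify(p.grad, ratio)
    return compressed


if __name__ == "__main__":
    # Toy model: give the small output layer a larger keep ratio than the body.
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()

    keep_ratios = {"2.weight": 0.1, "2.bias": 1.0}  # assumed per-layer policy
    compressed = compress_model_gradients(model, keep_ratios, default_ratio=0.01)

    # A worker would transmit only (indices, values); here we rebuild locally
    # to show that the dense gradient can be recovered for the optimizer step.
    for name, p in model.named_parameters():
        if name in compressed:
            idx, vals, shape = compressed[name]
            p.grad = desparsify(idx, vals, shape, p.grad.numel())
```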