Computer science
Scalability
Compression ratio
Overhead (engineering)
Data compression
Sketch
Compression (physics)
Convergence (economics)
Data compression ratio
Computer engineering
Algorithm
Artificial intelligence
Image compression
Image (mathematics)
Database
Image processing
Operating system
Engineering
Composite material
Economics
Automotive engineering
Materials science
Internal combustion engine
Economic growth
Authors
Lingfei Dai, Luqi Gong, Zhulin An, Yongjun Xu, Boyu Diao
Identifier
DOI:10.1016/j.jpdc.2023.104811
Abstract
Gradient compression is an effective technique for improving the efficiency of distributed training. However, introducing gradient compression can reduce model accuracy and training efficiency. Furthermore, we find that layer-wise gradient compression algorithms incur significant compression and communication overhead, which can negatively impact the scaling efficiency of a distributed training system. To address these issues, we propose a new method called Sketch-Fusion SGD, which leverages the Count-Sketch data structure to enhance the scalability and training speed of distributed deep learning systems. Moreover, our method employs LayerFusion to optimize the scalability and convergence efficiency of gradient compression algorithms by formulating an optimal multi-layer fusion strategy without introducing extra hyperparameters. We evaluate our method on a cluster of 16 GPUs and demonstrate that it can improve training efficiency by up to 18.6% without compromising the model's accuracy. In addition, we find that applying our LayerFusion algorithm to other gradient compression methods improves their scalability by up to 2.87×.
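The abstract describes the method only at a high level. As a concrete illustration, the following is a minimal Python/NumPy sketch of the two ideas it names: the Count-Sketch data structure used for gradient compression, and fusing per-layer gradients into a single buffer before compression. This is not the authors' implementation; the class name, table dimensions, and example layer sizes are illustrative assumptions.

```python
import numpy as np


class CountSketch:
    """Count-Sketch: each coordinate of a large vector is hashed into a small
    table of counters with a random sign, so the vector can be summarized
    (and its large entries later estimated) with far fewer counters than
    coordinates. Illustrative sketch only, not the paper's implementation."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.rows, self.cols, self.dim = rows, cols, dim
        # One bucket index and one random sign per (row, coordinate) pair.
        self.buckets = rng.integers(0, cols, size=(rows, dim))
        self.signs = rng.choice(np.array([-1.0, 1.0]), size=(rows, dim))
        self.table = np.zeros((rows, cols))

    def insert(self, vec):
        # Accumulate the signed values into each row's buckets.
        for r in range(self.rows):
            np.add.at(self.table[r], self.buckets[r], self.signs[r] * vec)

    def query(self):
        # The median over rows of the signed counters estimates each coordinate.
        est = self.signs * self.table[np.arange(self.rows)[:, None], self.buckets]
        return np.median(est, axis=0)


if __name__ == "__main__":
    # Hypothetical per-layer gradients; fusing them into one flat buffer means
    # the sketch (and the associated communication) runs once per step rather
    # than once per layer, which is the overhead the abstract targets.
    layer_grads = [np.random.randn(n) * 0.01 for n in (4_000, 16_000, 60_000)]
    fused = np.concatenate(layer_grads)
    fused[:20] += 5.0  # a few large "heavy hitter" coordinates

    sketch = CountSketch(rows=5, cols=4096, dim=fused.size)
    sketch.insert(fused)    # compress: 5 * 4096 counters instead of 80,000 floats
    approx = sketch.query() # approximate recovery of the fused gradient
    print("max error on heavy hitters:",
          np.max(np.abs(approx[:20] - fused[:20])))
```

In this toy setup the sketch preserves the large gradient coordinates with small error while using only a fixed-size table, which is the property that makes Count-Sketch attractive for compressing fused gradients before communication.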