Computer science
Training (meteorology)
Artificial intelligence
Deep learning
Distributed learning
Psychology
Pedagogy
Physics
Meteorology
Authors
Jiayu Zhang, Sen Cheng, Feng Dong, Ke Chen, Yu Qiao, Zhigang Mao, Jianfei Jiang
Identifiers
DOI:10.1109/mwscas57524.2023.10405843
Abstract
Distributed deep learning training has become an important workload on data center GPU clusters. However, in some cases the inter-node bandwidth is limited (e.g., 20 Gbps) and thus becomes a performance bottleneck for existing deep learning systems when scaling training across multiple nodes. Motivated by this observation, we propose a hierarchical communication algorithm, named AS-SGD, that combines Asynchronous SGD and Synchronous SGD to make full use of both inter-node and intra-node network bandwidth. Moreover, a set of system optimization techniques, such as quantization and decentralization, is applied to further reduce communication costs. Finally, we present a performance evaluation of our algorithm on a 4-node cluster (each node with 8 Nvidia Tesla V100 GPUs). Experiments show that our algorithm achieves up to 4.95X speedup over existing state-of-the-art systems on popular deep learning models and datasets.
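The abstract only sketches the idea of the hierarchical scheme: gradients are first reduced synchronously over the fast intra-node links, then exchanged asynchronously (and in compressed form) over the slow inter-node links. Below is a minimal, single-process Python sketch of that idea, simulated with NumPy; the names (Node, hierarchical_step, quantize/dequantize), the staleness model, and the 8-bit quantizer are illustrative assumptions, not the paper's actual AS-SGD implementation.

```python
# Hierarchical "sync inside a node, async across nodes" sketch with
# simple uniform gradient quantization. Single-process simulation only.
import numpy as np

def quantize(grad, num_bits=8):
    """Uniformly quantize a gradient tensor to signed integers plus a scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(grad))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(grad / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

class Node:
    """One machine holding several GPU workers (here: gradient vectors)."""
    def __init__(self, num_gpus, dim, rng):
        self.rng = rng
        self.num_gpus = num_gpus
        self.params = np.zeros(dim, dtype=np.float32)

    def local_gradients(self):
        # Stand-in for per-GPU backprop: random gradients for illustration.
        return [self.rng.normal(size=self.params.shape).astype(np.float32)
                for _ in range(self.num_gpus)]

def hierarchical_step(nodes, shared_params, lr=0.01, num_bits=8):
    """Intra-node synchronous averaging, inter-node asynchronous push/pull."""
    for node in nodes:
        # 1) Intra-node: synchronous all-reduce (here a plain average).
        node_grad = np.mean(node.local_gradients(), axis=0)
        # 2) Compress before crossing the slow inter-node link.
        q, scale = quantize(node_grad, num_bits)
        # 3) Inter-node: apply the update asynchronously, one node at a time.
        shared_params -= lr * dequantize(q, scale)
        # 4) The node pulls back possibly stale parameters.
        node.params = shared_params.copy()
    return shared_params

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    nodes = [Node(num_gpus=8, dim=dim, rng=rng) for _ in range(4)]  # 4 x 8 GPUs
    shared = np.zeros(dim, dtype=np.float32)
    for _ in range(5):
        shared = hierarchical_step(nodes, shared)
    print("parameter norm after 5 steps:", np.linalg.norm(shared))
```

In a real multi-node deployment the intra-node average would be a GPU all-reduce (e.g., NCCL) and the inter-node exchange would overlap with computation; this toy loop only shows how the two levels of the hierarchy and the quantization step fit together.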