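Computer science
Convergence (economics)
Symbol
Network packet
Point (geometry)
Algorithm
Aggregate (composite)
Theoretical computer science
Mathematics
Computer network
Arithmetic
Materials science
Geometry
Economy
Composite material
Economic growth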
Authors
Hochan Lee,Jaewook Lee,Heewon Kim,Sangheon Pack
Source
Journal:IEEE Transactions on Services Computing
[Institute of Electrical and Electronics Engineers]
Date:2023-09-12
Volume/Issue:16 (6): 4198-4204
Citations:2
Identifier
DOI:10.1109/tsc.2023.3309318
Abstract
In-network aggregation accelerates distributed deep learning by using a programmable switch to aggregate gradient packets. However, the straggler problem must be addressed to avoid longer training times. In this paper, we propose a straggler-aware in-network aggregation (SAINA) scheme that mitigates the straggler problem while preventing accuracy degradation. In SAINA, the programmable switch aggregates the local gradients of the fastest $k$ workers to exclude stragglers, and changes $k$ adaptively to balance the tradeoff between training speed and accuracy. To this end, we design a switch-friendly convergence detection (SFCD) algorithm, which detects a convergence point and determines $k$ at that point. SAINA is implemented on a software programmable switch, and experimental results show that SAINA reaches a target accuracy up to 2.84x faster than the existing in-network aggregation scheme.
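The fastest-$k$ aggregation step mentioned in the abstract can be illustrated with a minimal host-side sketch. It is not the authors' switch implementation. The function name `aggregate_fastest_k`, the `(arrival_time, worker_id, gradient)` tuple layout, and the choice to average rather than sum are all assumptions made for illustration. The SFCD algorithm that chooses $k$ is not described in the abstract and is therefore not modeled here.

```python
from typing import List, Tuple

# Hypothetical record layout: (arrival_time, worker_id, gradient vector).
Arrival = Tuple[float, int, List[float]]


def aggregate_fastest_k(arrivals: List[Arrival], k: int) -> Tuple[List[float], List[int]]:
    """Average the gradients of the k earliest-arriving workers.

    Workers whose gradients arrive after the first k are treated as
    stragglers and excluded from this round's aggregate.
    Averaging (rather than summing) is an assumption of this sketch.
    """
    if not 1 <= k <= len(arrivals):
        raise ValueError("k must be between 1 and the number of workers")

    # Keep only the k earliest arrivals.
    fastest = sorted(arrivals, key=lambda a: a[0])[:k]

    dim = len(fastest[0][2])
    total = [0.0] * dim
    for _, _, grad in fastest:
        for i, g in enumerate(grad):
            total[i] += g

    mean = [t / k for t in total]
    used = [worker_id for _, worker_id, _ in fastest]
    return mean, used


if __name__ == "__main__":
    arrivals = [
        (0.3, 0, [1.0, 2.0]),
        (0.1, 1, [3.0, 4.0]),
        (0.9, 2, [100.0, 100.0]),  # straggler
        (0.2, 3, [5.0, 6.0]),
    ]
    print(aggregate_fastest_k(arrivals, k=2))
```

A smaller `k` shortens each round because fewer workers are waited for, but it discards more gradients. This is the speed/accuracy tradeoff that the abstract says SAINA balances by adapting $k$.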