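Computer science
Load balancing (electrical power)
Distributed computing
Process (computing)
Algorithm
Message passing
Algorithm design
Parallel computing
Mathematics
Geometry
Grid
Operating system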
Authors
Guangyao Zhang,Wei Wan,Junhong Li
Abstract
MPI (Message Passing Interface) plays a crucial role in parallel computing. The Allreduce implementation in the OpenMPI communication library handles non-power-of-two process counts poorly. Two existing algorithms, recursive_doubling and reduce_scatter_allgather, deal with such counts by excluding some processes so that a power-of-two number of processes remains. However, the choice of excluded processes considers too few factors, which leaves the participating processes unevenly distributed across nodes and greatly reduces communication efficiency. To address this problem, the layout of processes on nodes is taken into account and the range of excluded processes is redefined. Both algorithms receive generic load-balancing optimizations as well as adaptations for domestic architectures, resulting in improved load balance. Experimental results show that, at a communication scale of 16 nodes, the recursive_doubling algorithm improves performance by up to 30%, and the reduce_scatter_allgather algorithm by up to 21%.
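The imbalance described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a block layout (consecutive ranks fill one node before the next) and a process count divisible by the processes per node. `excluded_prefix` models the common recursive-doubling convention in which, among the first 2 x extra ranks, the even ranks hand their data to the odd neighbour and sit out. `excluded_spread` is a hypothetical layout-aware alternative, introduced here only for comparison, that takes excluded ranks round-robin across nodes. All function names are illustrative.

```python
from collections import Counter


def largest_power_of_two_at_most(n: int) -> int:
    """Return the largest power of two that does not exceed n."""
    p = 1
    while p * 2 <= n:
        p *= 2
    return p


def excluded_prefix(nprocs: int) -> list[int]:
    """Layout-unaware choice: even ranks among the first 2 * extra ranks."""
    extra = nprocs - largest_power_of_two_at_most(nprocs)
    return [r for r in range(2 * extra) if r % 2 == 0]


def excluded_spread(nprocs: int, ppn: int) -> list[int]:
    """Hypothetical layout-aware choice: round-robin over nodes.

    Each excluded rank keeps its partner (rank + 1) on the same node.
    Assumes block layout and nprocs divisible by ppn.
    """
    extra = nprocs - largest_power_of_two_at_most(nprocs)
    nnodes = nprocs // ppn
    ranks = []
    for i in range(extra):
        node, slot = i % nnodes, i // nnodes
        local = 2 * slot  # local index of the excluded rank on its node
        if local + 1 >= ppn:
            raise ValueError("no room on node for an excluded rank and its partner")
        ranks.append(node * ppn + local)
    return sorted(ranks)


def active_per_node(nprocs: int, ppn: int, excluded: list[int]) -> list[int]:
    """Count the processes on each node that stay in the power-of-two phase."""
    sitting_out = Counter(r // ppn for r in excluded)
    return [ppn - sitting_out[n] for n in range(nprocs // ppn)]


if __name__ == "__main__":
    nprocs, ppn = 96, 6  # 16 nodes; 6 processes per node is an assumed value
    print(active_per_node(nprocs, ppn, excluded_prefix(nprocs)))
    print(active_per_node(nprocs, ppn, excluded_spread(nprocs, ppn)))
```

With 96 processes on 16 nodes, the prefix rule leaves 3 active processes on each of the first ten nodes, 4 on the eleventh, and 6 on the last five, whereas the round-robin variant leaves 4 on every node. The sketch only counts participants per node and does not model communication cost.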