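Stencil
Computer science
Parallel computing
Acceleration
Computation
Partition (number theory)
Algorithm
Computational science
Mathematics
Combinatorics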
Authors
Xin You,Hailong Yang,Zhonghui Jiang,Zhongzhi Luan,Depei Qian
Identifier
DOI:10.1109/hpcc-dss-smartcity-dependsys53884.2021.00036
Abstract
Stencil computation is one of the most important computation patterns in scientific applications. Although various optimizations have been proposed to accelerate stencil computation, low-order stencils still suffer from limited performance on GPUs due to their low computation intensity. In this paper, we propose fusion-partition optimization techniques to accelerate low-order stencil computation and implement DRStencil, an effective code generation framework that automatically generates optimized stencil codes with fusion-partition applied. Specifically, we adopt a four-stage optimization workflow: time-fusion, partition, forward computation, and backward computation. We also propose an auto-tuning method to determine the optimal parameter settings of the generated stencil codes. We evaluate DRStencil with representative low-order stencils on Nvidia P100, V100, and A100 GPUs. DRStencil achieves average speedups of 1.46x, 1.59x, and 1.10x over state-of-the-art implementations for widely used low-order stencils on P100, V100, and A100 GPUs, respectively.
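To make the time-fusion idea concrete, the following is a minimal sketch in plain Python, not DRStencil's generated GPU code. It assumes a hypothetical 1D 3-point stencil with coefficients `A` and `B` (chosen for illustration only) and shows that two consecutive time steps are equivalent, away from the boundary, to a single pass with a wider 5-point stencil. The fused pass reads the input once while doing more arithmetic per point, which is the general sense in which fusion raises computation intensity.

```python
import math

A, B = 0.25, 0.5  # hypothetical coefficients of a 1D 3-point stencil


def sweep(u):
    """One time step: update interior points, keep the two boundary values fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = A * u[i - 1] + B * u[i] + A * u[i + 1]
    return out


def fused_two_steps(u):
    """Two time steps in one pass, using the equivalent 5-point stencil.

    Only points at least two cells from the boundary are updated; the rest
    are returned unchanged and would need separate boundary handling.
    """
    c2 = A * A              # weight of u[i-2] and u[i+2]
    c1 = 2 * A * B          # weight of u[i-1] and u[i+1]
    c0 = 2 * A * A + B * B  # weight of u[i]
    out = list(u)
    for i in range(2, len(u) - 2):
        out[i] = (c2 * u[i - 2] + c1 * u[i - 1] + c0 * u[i]
                  + c1 * u[i + 1] + c2 * u[i + 2])
    return out


def fused_matches_reference(u):
    """Compare the fused pass against two separate sweeps on the fused region."""
    ref = sweep(sweep(u))
    fused = fused_two_steps(u)
    return all(math.isclose(ref[i], fused[i], abs_tol=1e-12)
               for i in range(2, len(u) - 2))
```

For example, `fused_matches_reference([float(i * i) for i in range(10)])` checks the equivalence on a small grid. The partition, forward, and backward stages named in the abstract are not modeled here.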