Computer science
Code (set theory)
Double-precision floating-point format
Single-precision floating-point format
Code generation
Speedup
Overhead (engineering)
Affine transformation
Algorithm
Parallel computing
Generator (circuit theory)
Computer engineering
Floating point
Set (abstract data type)
Mathematics
Programming language
Key (lock)
Power (physics)
Pure mathematics
Physics
Quantum mechanics
Computer security
Authors
Jinchen Xu, Guanghui Song, Bei Zhou, Li Fei, Jiangwei Hao, Jie Zhao
Identifier
DOI:10.1145/3627535.3638484
Abstract
Reducing floating-point (FP) precision is used to trade the quality of a numerical program's output for performance, but this optimization also introduces type casts, whose overhead remains unknown until a mixed-precision code version is generated. This uncertainty forced prior work to implement mixed-precision code generation and autotuning as decoupled stages. In this paper, we present a holistic approach called PrecTuner that consolidates the mixed-precision code generator and the autotuner by defining a single parameter. This parameter is first initialized with automatically sampled values and used to generate several code variants, with various loop transformations also taken into account. The generated code variants are then profiled to solve a performance model formulated in terms of this parameter, possibly under a pre-defined quality-degradation budget. The best-performing value of the parameter is finally predicted without evaluating all code variants. Experimental results on the PolyBench benchmarks show that PrecTuner outperforms LuIs by 3.28× on CPU while achieving smaller errors, and we also validate its effectiveness in optimizing a real-life large-scale application. In addition, PrecTuner obtains a mean speedup of 1.81× and 1.52×-1.73× over Pluto on single- and multi-core CPU, respectively, and 1.71× over PPCG on GPU.
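To make the trade-off concrete, below is a minimal C sketch of the idea the abstract describes; it is not PrecTuner's actual code. The parameter `cut` is a hypothetical stand-in for the single tuning parameter the paper defines: iterations before `cut` run in double precision, the rest in single precision, so the explicit (float) casts only appear, and can only be costed, once the mixed-precision variant is generated.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 1000000

/* Mixed-precision sum: `cut` selects where the kernel switches from
   double to float arithmetic. cut = N is all-double (baseline quality);
   cut = 0 is all-float (fastest, largest error). */
double mixed_sum(const double *x, int cut) {
    double acc_d = 0.0;
    float  acc_f = 0.0f;
    for (int i = 0; i < cut; i++)
        acc_d += x[i];            /* high-precision region */
    for (int i = cut; i < N; i++)
        acc_f += (float)x[i];     /* low-precision region; the (float)
                                     cast is the overhead that stays
                                     hidden until code is generated */
    return acc_d + (double)acc_f;
}

int main(void) {
    double *x = malloc(N * sizeof *x);
    if (!x) return 1;
    for (int i = 0; i < N; i++)
        x[i] = 1.0 / (i + 1);

    double ref = mixed_sum(x, N);  /* all-double reference output */
    /* An autotuner in the paper's spirit would profile a few sampled
       values of `cut`, fit a performance model over them, and predict
       the best value whose error stays within the quality budget,
       instead of evaluating every variant. */
    for (int cut = 0; cut <= N; cut += N / 4) {
        double err = fabs(ref - mixed_sum(x, cut));
        printf("cut=%7d  |error| vs all-double: %.3e\n", cut, err);
    }
    free(x);
    return 0;
}

Profiling a handful of `cut` values and interpolating, rather than timing every variant, mirrors the paper's strategy of solving a performance model over one parameter instead of exhaustive evaluation.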