计算机科学
并行计算
快速傅里叶变换
内存层次结构
多核处理器
共享内存
地点
隐藏物
算法
语言学
哲学
作者
Tun Chen,Haipeng Jia,Yunquan Zhang,Kun Li,Zhihao Li,Xiang Zhao,Jun Yao,Chendi Li
标识
DOI:10.1145/3577193.3593735
摘要
The sophisticated hierarchy and shared characteristics of cache in multicore CPU architectures bring challenges to the performance improvement of fundamental algorithms, especially in implementing and optimizing 3D FFT. 3D FFT is a memory-bounded algorithm that contains many highly discretized memory accesses. With the working set scaling, the data locality becomes poor, which is prone to cause serious memory access overhead, especially for high-dimensional data transposition. This paper proposes a 3D FFT optimization framework named OpenFFT. This framework optimizes the memory access of 3D FFT by the following methods, including 1) A novel tiling algorithm, Z-OpenFFT, based on the column-order algorithm for high-dimensional vectorization to improve data locality and eliminate transposition; 2) An efficient search algorithm Section-cache-aware algorithm to optimize the memory access of butterfly network of 1D FFT; 3) A multi-thread allocation model by analyzing the characteristics of cache hierarchy and task size to allocate threads adaptively. Experiments demonstrate that OpenFFT could obtain a more competitive performance than the best configuration of FFTW and ARMPL on ARM CPUs.
科研通智能强力驱动
Strongly Powered by AbleSci AI