计算机科学
并行计算
超级计算机
移植
库达
巨量平行
GPU群集
计算科学
软件
操作系统
作者
A. Gorobets,F. Xavier Trias,A. Oliva
标识
DOI:10.1016/j.compfluid.2013.05.021
摘要
The work is devoted to the development of efficient parallel algorithms for large-scale simulations of incompressible flows on hybrid supercomputers based on massively-parallel accelerators. The governing equations are discretized using a high-order finite-volume scheme for Cartesian staggered meshes with the only restriction that, at least, one direction is periodic. Its "classical" MPI + OpenMP parallel implementation for CPUs was designed to scale till 100,000 CPU cores. The new hybrid algorithm is developed on a base of a multi-level parallel model that exploits several layers of parallelism of a modern hybrid supercomputer. In this model, MPI and OpenMP are used on the first two levels to couple nodes of a supercomputer and to engage its CPU cores. Then, computing accelerators are further used by means of the hardware independent OpenCL computing standard. In this way, the implementation is adapted to a general computing model with central processors and math co-processors. In this paper the work is focused on adapting the basic operations of the algorithm to architectures of Graphics Processing Units (GPU) without considering the multi-GPU communication scheme. Technology of porting the code to OpenCL is described, certain optimization approaches are presented and relevant performance results obtaining up to 80–90 GFLOPS on a GPU accelerator are demonstrated. Moreover, the experience with different GPU architectures is summarized and a comparison based on the particular application is given for AMD and NVIDIA GPUs as well as for CUDA and OpenCL frameworks.
科研通智能强力驱动
Strongly Powered by AbleSci AI