Keywords
Parallel computing, Computer science, Throughput, Sparse matrix, Multiplication, Matrix multiplication, Linear algebra, Multicore processor, CUDA, Matrix (mathematics), Memory bandwidth, Computational science, Wireless, Mathematics, Operating system, Materials science, Combinatorics, Gaussian distribution, Composite material, Physics, Quantum, Quantum mechanics, Geometry
Authors
Nathan Bell, Michael Garland
Identifier
DOI: 10.1145/1654059.1654078
Abstract
Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns. We explore SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes. The techniques we propose are efficient, successfully utilizing large percentages of peak bandwidth. Furthermore, they deliver excellent total throughput, averaging 16 GFLOP/s and 10 GFLOP/s in double precision for structured grid and unstructured mesh matrices, respectively, on a GeForce GTX 285. This is roughly 2.8 times the throughput previously achieved on Cell BE and more than 10 times that of a quad-core Intel Clovertown system.
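To make the abstract's notion of fine-grained parallelism concrete, the following is a minimal sketch of an SpMV kernel in CUDA over a CSR-format matrix, assigning one thread per matrix row. It is an illustrative baseline in the "scalar CSR" style, not the authors' tuned implementation; the kernel name and parameters are hypothetical.

// Minimal sketch (hypothetical): computes y = A*x for an m-row CSR matrix
// stored as row_ptr[m+1], col_idx[nnz], vals[nnz]. One thread per row.
__global__ void csr_spmv_scalar(int m,
                                const int*    __restrict__ row_ptr,
                                const int*    __restrict__ col_idx,
                                const double* __restrict__ vals,
                                const double* __restrict__ x,
                                double*       __restrict__ y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m) {
        double dot = 0.0;
        // Each thread walks the nonzeros of its own row.
        for (int jj = row_ptr[row]; jj < row_ptr[row + 1]; ++jj)
            dot += vals[jj] * x[col_idx[jj]];
        y[row] = dot;
    }
}

// Example launch: one 256-thread block per 256 rows.
// csr_spmv_scalar<<<(m + 255) / 256, 256>>>(m, row_ptr, col_idx, vals, x, y);

Because adjacent threads traverse different rows, this baseline's reads of vals and col_idx are not coalesced on irregular matrices. Imposing the regularity the abstract calls for, e.g., by assigning a warp per row or by switching to more regular storage formats such as ELL or a hybrid, is what lets an SpMV kernel sustain a large fraction of peak memory bandwidth.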