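Computer science
Parallel computing
Cache-oblivious algorithm
Cache (computing)
Matrix multiplication
Memory hierarchy
Cache algorithms
Cache coloring
Cache pollution
CPU cache
Multiplication (music)
Set (abstract data type)
Cache-only memory architecture
Associative property
Theoretical computer science
Programming language
Mathematics
Physics
Combinatorics
Quantum
Pure mathematics
Quantum mechanics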
Authors
Leonid Djinevski,Sime Arsenovski,Sasko Ristov,Marjan Gušev
Source
Journal: International Convention on Information and Communication Technology, Electronics and Microelectronics
Date: 2013-05-20
Volume/Issue: 193-198
Citations: 10
Abstract
The performance of shared-memory processors shows negative performance impulses (drawbacks) in certain regions when executing the basic matrix multiplication algorithm. In this paper we continue with an analysis of the GPU memory hierarchy and the corresponding cache memory organization. We give a theoretical analysis of why a negative performance impulse appears for specific problem sizes. The main reason is the cache storage organization: the negative performance peak is caused by the mapping of matrix elements onto a single cache set instead of the whole cache. The experimental results obtained confirm our theoretical analysis. We also propose a method to avoid situations in which performance drawbacks appear.
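
The set-mapping effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or their proposed method. It assumes a hypothetical set-associative cache with 64-byte lines and 64 sets, a row-major matrix of 8-byte elements starting at address 0, and a simple modulo set-index function. The function names and all parameter values are illustrative assumptions, not figures from the paper. The sketch counts how many distinct cache sets are touched when walking down one column of an n x n matrix, as the basic algorithm does for the second operand.

```python
def cache_set_index(address, line_size, num_sets):
    """Set index under a simple modulo mapping (an assumed cache model)."""
    return (address // line_size) % num_sets


def distinct_sets_for_column(n, elem_size=8, line_size=64, num_sets=64):
    """Count the cache sets touched by one column of a row-major n x n matrix.

    Consecutive column elements are n * elem_size bytes apart, so the stride
    decides whether the accesses spread over the cache or pile onto one set.
    All default parameters are hypothetical, chosen only for illustration.
    """
    stride = n * elem_size  # byte distance between vertically adjacent elements
    sets = {cache_set_index(row * stride, line_size, num_sets) for row in range(n)}
    return len(sets)


if __name__ == "__main__":
    for n in (512, 513):
        print(n, distinct_sets_for_column(n))
```

Under these assumed parameters, n = 512 gives a stride of 4096 bytes, which is exactly line_size * num_sets, so every column element falls into the same set. With n = 513 the accesses spread across all 64 sets. Changing the problem size or leading dimension by one element is a commonly used way to break such a pattern; whether it matches the method proposed in the paper is not stated in the abstract.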