Computer science
Systolic array
Field-programmable gate array
Matrix multiplication
Scalability
Multiplication (music)
Parallel computing
Architecture
Convolutional neural network
Kernel (algebra)
Workload
Computer engineering
Computational science
Computer architecture
Computer hardware
Embedded systems
Artificial intelligence
Mathematics
Very-large-scale integration
Combinatorics
Operating systems
Physics
Quantum
Art
Databases
Visual arts
Quantum mechanics
Authors
Junzhong Shen,Yuran Qiao,You Huang,Mei Wen,Chunyuan Zhang
Source
Journal: Cornell University - arXiv
Date: 2018-05-27
Citations: 21
Identifier
DOI:10.1109/iscas.2018.8351474
Abstract
Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work focuses on accelerating matrix multiplication on FPGAs by adopting a linear systolic array. This paper extends that architecture by proposing a scalable and highly configurable multi-array design. In addition, we propose a work-stealing scheme to balance the workload partition among the multiple linear arrays. Furthermore, an analytical model is developed to determine the optimal design parameters. Experiments on a real-life convolutional neural network (CNN) show that we can obtain the optimal extension of the linear array architecture.
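The work-stealing idea in the abstract can be illustrated with a minimal scheduling sketch. The sketch below is an assumption-laden simplification, not the paper's actual hardware scheme: matrix tiles with uneven per-tile costs are first split round-robin across several linear arrays, and an array whose queue runs dry steals a tile from the array holding the most remaining work. All names (`work_stealing_schedule`, `tile_costs`) are hypothetical.

```python
from collections import deque

def work_stealing_schedule(tile_costs, num_arrays):
    """Simulate balancing matrix tiles across linear arrays.

    tile_costs: cycles each tile takes to process (hypothetical units).
    Returns (finish, assignment): simulated finish time per array and
    the list of tile indices each array processed.
    """
    # Static round-robin seed partition of tiles across the arrays.
    queues = [deque() for _ in range(num_arrays)]
    for i in range(len(tile_costs)):
        queues[i % num_arrays].append(i)

    finish = [0] * num_arrays                    # cycles consumed per array
    assignment = [[] for _ in range(num_arrays)]

    def pending(q):
        # Total remaining work in one array's queue.
        return sum(tile_costs[t] for t in q)

    while any(queues):
        # The array that becomes free earliest takes the next tile.
        a = min(range(num_arrays), key=lambda i: finish[i])
        if not queues[a]:
            # Idle array steals a tile from the most-loaded queue.
            victim = max(range(num_arrays), key=lambda i: pending(queues[i]))
            queues[a].append(queues[victim].pop())
        t = queues[a].popleft()
        finish[a] += tile_costs[t]
        assignment[a].append(t)
    return finish, assignment
```

With one expensive tile among cheap ones (e.g. costs `[5, 1, 1, 1, 1, 1]` on two arrays), stealing lets the lightly loaded array drain the other's queue, so both finish at the same simulated time instead of one idling.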