Computer science
Memory footprint
Stream processing
Pipeline (software)
Energy consumption
Acceleration
Efficient energy use
Parallel computing
Scheduling (production processes)
Sequence (biology)
Image processing
Real-time computing
Artificial intelligence
Image (mathematics)
Biology
Operating system
Electrical engineering
Physics
Engineering
Economics
Classical mechanics
Programming language
Genetics
Operations management
Ecology
Authors
Minxuan Zhou,Yunhui Guo,Weihong Xu,Bin Li,Kevin W. Eliceiri,Tajana Rosing
Source
Venue: Design Automation Conference (DAC)
Date: 2021-12-05
Cited by: 1
Identifiers
DOI:10.1109/dac18074.2021.9586212
Abstract
Attention-based machine learning is used to model long-term dependencies in sequential data. Processing these models on long sequences can be prohibitively costly because of their large memory consumption. In this work, we propose MAT, a processing in-memory (PIM) framework, to accelerate long-sequence attention models. MAT adopts a memory-efficient processing flow for attention models, processing sub-sequences in a pipeline with a much smaller memory footprint. MAT utilizes a reuse-driven data layout and optimal sample scheduling to optimize the performance of PIM-based attention. We evaluate the efficiency of MAT on two emerging long-sequence tasks: natural language processing and medical image processing. Our experiments show that MAT is $2.7 \times$ faster and $3.4 \times$ more energy efficient than the state-of-the-art PIM acceleration. Compared to a TPU and a GPU, MAT is $5.1 \times$ and $16.4 \times$ faster, respectively, while consuming $27.5 \times$ and $41.0 \times$ less energy.
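To make the memory argument concrete, below is a minimal NumPy sketch of attention computed over key/value sub-sequences with a streaming (numerically stable) softmax, so the full L x L score matrix is never materialized; only an L x c block is held at a time. This illustrates the general chunked-processing idea the abstract refers to, not MAT's actual PIM dataflow, data layout, or sample scheduling; the function name, `chunk_size`, and all other details are illustrative assumptions.

```python
# Illustrative sketch only: chunked single-head attention with a streaming
# softmax over key/value sub-sequences. Not the MAT PIM implementation.
import numpy as np

def chunked_attention(q, k, v, chunk_size=256):
    """Attention over (L, d) arrays, one key/value chunk at a time."""
    L, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)            # running unnormalized weighted sum of values
    row_max = np.full(L, -np.inf)     # running max of logits per query row
    row_sum = np.zeros(L)             # running softmax denominator per row

    for start in range(0, L, chunk_size):
        k_blk = k[start:start + chunk_size]     # (c, d) chunk of keys
        v_blk = v[start:start + chunk_size]     # (c, d) chunk of values
        logits = (q @ k_blk.T) * scale          # (L, c) scores for this chunk only

        new_max = np.maximum(row_max, logits.max(axis=1))
        correction = np.exp(row_max - new_max)  # rescale previous accumulators
        p = np.exp(logits - new_max[:, None])   # (L, c) partial softmax numerators

        out = out * correction[:, None] + p @ v_blk
        row_sum = row_sum * correction + p.sum(axis=1)
        row_max = new_max

    return out / row_sum[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d = 2048, 64
    q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
    # Reference: materialize the full L x L attention matrix
    # (the memory cost that becomes prohibitive for long sequences).
    s = (q @ k.T) / np.sqrt(d)
    w = np.exp(s - s.max(axis=1, keepdims=True))
    ref = (w / w.sum(axis=1, keepdims=True)) @ v
    assert np.allclose(chunked_attention(q, k, v), ref, atol=1e-6)
```

With chunking, peak intermediate storage per step is O(L * c) rather than O(L^2), which is what makes pipelined sub-sequence processing attractive when memory is the bottleneck.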