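Computer science
Field-programmable gate array
Application-specific integrated circuit
Embedded system
Computer architecture
Transformer
Computer hardware
Computer engineering
Electrical engineering
Engineering
Voltage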
Authors
Beom Jin Kang, Hae In Lee, Seok Kyu Yoon, Young Chan Kim, Sang Beom Jeong, Seong Jun O, Hyun Kim
Identifier
DOI:10.1016/j.sysarc.2024.103247
Abstract
Recently, transformer-based models have achieved remarkable success in various fields, such as computer vision, speech recognition, and natural language processing. However, transformer models require substantially more parameters and computational operations than conventional neural networks (e.g., recurrent neural networks, long short-term memory networks, and convolutional neural networks). Transformer models are typically run on graphics processing unit (GPU) platforms, which are specialized for high-performance memory and parallel processing. However, the high power consumption of GPUs poses significant challenges for deployment on edge devices with limited battery capacity. To address these issues, research is underway on using field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) to run transformer models with low power consumption. FPGAs offer a high level of flexibility, whereas ASICs are beneficial for optimizing throughput and power. Both platforms are therefore well suited to efficiently optimizing matrix multiplication operations, which constitute a significant portion of the computation in transformer models. In addition, FPGAs and ASICs consume less power than GPUs, making them ideal energy-efficient platforms. This study investigates and analyzes the model compression methods, optimization techniques, and accelerator architectures related to FPGA- and ASIC-based transformer designs. We expect this study to serve as a valuable guide for hardware research in the transformer field.