An Efficient FPGA-Based Accelerator for Swin Transformer

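Field-programmable gate array, Computation, Computer science, Softmax function, Hardware acceleration, Acceleration, Transformer, Normalization (sociology), Inference, Parallel computing, Lookup table, Convolutional neural network, Computational science, Computer engineering, Efficient energy use, Computer hardware, Algorithm, Artificial intelligence, Voltage, Electrical engineering, Engineering, Sociology, Anthropology, Programming language
Authors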
Zhiyang Liu,Pengyu Yin,Zhenhua Ren
Source
Journal: Cornell University - arXiv Citations: 3
Identifier
DOI:10.48550/arxiv.2308.13922
Abstract

Since its introduction, Swin Transformer has achieved remarkable results in computer vision, sparking the need for dedicated hardware accelerators, particularly for edge computing. Owing to their flexibility and low power consumption, FPGAs have been widely employed to accelerate the inference of convolutional neural networks (CNNs) and show potential for Transformer-based models. Unlike CNNs, which mainly involve multiply-accumulate (MAC) operations, Transformers also involve nonlinear computations such as Layer Normalization (LN), Softmax, and GELU. These nonlinear computations pose challenges for accelerator design. In this paper, we propose an efficient FPGA-based hardware accelerator for Swin Transformer, focusing on strategies for handling these nonlinear computations and on processing MAC computations efficiently to achieve the best acceleration results. We replace LN with Batch Normalization (BN), since BN can be fused with linear layers during inference to improve inference efficiency. The modified Swin-T, Swin-S, and Swin-B achieve Top-1 accuracies of 80.7%, 82.7%, and 82.8% on ImageNet, respectively. Furthermore, we employ approximate computation strategies to design hardware-friendly architectures for Softmax and GELU. We also design an efficient Matrix Multiplication Unit to handle all linear computations in Swin Transformer. In conclusion, compared with a CPU (AMD Ryzen 5700X), our accelerator achieves 1.76x, 1.66x, and 1.25x speedups and 20.45x, 18.60x, and 14.63x improvements in energy efficiency (FPS/power consumption) on Swin-T, Swin-S, and Swin-B, respectively. Compared with a GPU (Nvidia RTX 2080 Ti), it achieves 5.05x, 4.42x, and 3.00x energy efficiency improvements, respectively. To the best of our knowledge, the proposed accelerator is the fastest FPGA-based accelerator for Swin Transformer.
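
The abstract's remark that BN can be fused with linear layers during inference can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy values, and the assumption that BN directly follows the linear layer are all chosen here for illustration (a BN placed before the linear layer folds analogously, by scaling the weight columns instead of the rows). Pure Python is used to keep the example free of dependencies.

```python
import math

def linear(x, weight, bias):
    """y = W x + b for one input vector x (weight is out_features x in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time BN: a fixed per-feature affine map built from running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]

def fuse_linear_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding linear layer (hypothetical helper, assumed layout)."""
    # Per-output-feature scale applied by BN at inference time.
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    # Each weight row is scaled by its feature's BN scale.
    fused_weight = [[sc * w for w in row] for row, sc in zip(weight, scale)]
    # The bias absorbs the BN mean shift and the BN offset.
    fused_bias = [sc * (b - m) + bt
                  for sc, b, m, bt in zip(scale, bias, mean, beta)]
    return fused_weight, fused_bias

# Toy parameters, chosen only for illustration (not taken from the paper).
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, -1.0]

# Two-step reference path versus the single fused linear layer.
reference = batch_norm(linear(x, weight, bias), gamma, beta, mean, var)
fused_weight, fused_bias = fuse_linear_bn(weight, bias, gamma, beta, mean, var)
fused = linear(x, fused_weight, fused_bias)
max_abs_diff = max(abs(a - b) for a, b in zip(reference, fused))
```

Because the fused layer reproduces the two-step result up to floating-point rounding, the normalization adds no separate computation at inference time. LN cannot be folded this way, since its statistics are computed from each input at run time.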