A Reconfigurable Floating-Point Division and Square Root Architecture for High-Precision Softmax

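Softmax function, Pipeline (software), Division (mathematics), Computer science, Reduction (mathematics), Floating point, Subtractor, Square root, CMOS chip, Embedded system, Computer hardware, Electronic engineering, Engineering, Adder, Algorithm, Mathematics, Arithmetic, Deep learning, Artificial intelligence, Geometry, Programming language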
Authors
Xiwei Fang, Yuhan Wang, Lei Chen, Fengwei An
Source
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers [Institute of Electrical and Electronics Engineers]
Volume/Issue: pages 1-14
Identifier
DOI: 10.1109/tcsi.2024.3524307
Abstract

With the advancement of deep learning models, the Softmax function with self-attention has become pervasive in everyday applications. As components of the Softmax function and its inputs, both division and square root operations impact its accuracy. However, these two non-linear operations incur significant area and power costs in hardware implementations. To address these challenges, this paper proposes a reconfigurable floating-point division and square root (FDSR) architecture that achieves low resource consumption and high accuracy for general-purpose computation. The FDSR enhances the traditional non-restoring algorithm by using shift registers and optimizing the leading-one detection and shift operations, reducing hardware resource usage while maintaining high accuracy (0.5 ULP). In the mantissa calculation, the division can be converted to a square root operation by simply switching the input to the subtractor through multiplexers. Additionally, a triple-mode reconfigurable iteration unit is introduced, featuring a multi-layer variable pipeline architecture to improve adaptability for different applications. By redesigning the pipeline depth and reusing logical units, the FDSR effectively addresses the issue of lengthy iteration cycles in the non-restoring method. Implementation results using 40 nm CMOS technology demonstrate that the proposed design achieves a 76.49% power reduction and a 14.69% area reduction for floating-point division compared to Synopsys DesignWare, and an 88.05% power reduction and a 90.57% area reduction for floating-point square root. With 28 nm CMOS technology, the FDSR reduces power consumption by 91.55% and area by 64.39% for floating-point division compared to Synopsys DesignWare. On the FPGA platform, the FDSR significantly reduces hardware resource consumption, achieving an 85.23% reduction for floating-point division and 87.81% for floating-point square root, outperforming state-of-the-art designs.
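
To make the abstract's remark about sharing one subtractor between division and square root more concrete, here is a minimal Python sketch of the textbook non-restoring recurrences on unsigned integers. It is not the FDSR design: the function names, the integer (rather than floating-point mantissa) operands, and the bit-serial loops are assumptions made purely for illustration. What it shows is that both operations run the same add-or-subtract step and differ only in the operand fed to it, which is the kind of selection a multiplexer can perform in hardware.

```python
def nonrestoring_divide(dividend: int, divisor: int, bits: int) -> tuple[int, int]:
    """Unsigned division by the textbook non-restoring recurrence (illustrative)."""
    if divisor <= 0 or not 0 <= dividend < (1 << bits):
        raise ValueError("operands out of range")
    quotient, rem = 0, 0
    for i in range(bits - 1, -1, -1):
        bit = (dividend >> i) & 1
        # Operand selection: subtract the divisor if the partial remainder
        # is non-negative, otherwise add it back while shifting.
        if rem >= 0:
            rem = 2 * rem + bit - divisor
        else:
            rem = 2 * rem + bit + divisor
        quotient = (quotient << 1) | (1 if rem >= 0 else 0)
    if rem < 0:  # single final correction instead of restoring every step
        rem += divisor
    return quotient, rem


def nonrestoring_sqrt(radicand: int, bits: int) -> tuple[int, int]:
    """Unsigned integer square root by the non-restoring recurrence (illustrative)."""
    if bits % 2 or not 0 <= radicand < (1 << bits):
        raise ValueError("bits must be even and radicand must fit in it")
    root, rem = 0, 0
    for i in range(bits // 2 - 1, -1, -1):
        pair = (radicand >> (2 * i)) & 3  # consume two radicand bits per step
        # Same add/subtract step as division; only the operand changes:
        # it is built from the partial root instead of a fixed divisor.
        if rem >= 0:
            rem = 4 * rem + pair - ((root << 2) | 1)
        else:
            rem = 4 * rem + pair + ((root << 2) | 3)
        root = (root << 1) | (1 if rem >= 0 else 0)
    if rem < 0:  # final correction
        rem += (root << 1) | 1
    return root, rem


print(nonrestoring_divide(7, 2, 3))  # (3, 1): 7 = 3 * 2 + 1
print(nonrestoring_sqrt(10, 4))      # (3, 1): 10 = 3 * 3 + 1
```

In this sketch the remainder is allowed to go negative and is corrected once at the end, which is what distinguishes the non-restoring method from the restoring one. Each loop iteration produces one result bit, which illustrates the lengthy iteration cycles the abstract mentions.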
