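Softmax function
Pipeline (software)
Division (mathematics)
Computer science
Reduction (mathematics)
Floating point
Subtractor
Square root
CMOS
Embedded system
Computer hardware
Electronic engineering
Engineering
Adder
Algorithm
Mathematics
Arithmetic
Deep learning
Artificial intelligence
Geometry
Programming language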
Authors
Xiwei Fang,Yuhan Wang,Lei Chen,Fengwei An
Source
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
[Institute of Electrical and Electronics Engineers]
Date: 2025-01-01
Volume/Issue: : 1-14
Identifier
DOI:10.1109/tcsi.2024.3524307
Abstract
With the advancement of deep learning models, the Softmax function with self-attention has become pervasive in everyday applications. As components of the Softmax function and its inputs, both division and square root operations impact its accuracy. However, these two non-linear operations incur significant area and power costs in hardware implementations. To address these challenges, this paper proposes a reconfigurable floating-point division and square root (FDSR) architecture that achieves low resource consumption and high accuracy for general-purpose computation. The FDSR enhances the traditional non-restoring algorithm by using shift registers and optimizing the leading-one detection and shift operations, reducing hardware resource usage while maintaining high accuracy (0.5 ULP). In the mantissa calculation, the division can be converted to a square root operation by simply switching the input to the subtractor through multiplexers. Additionally, a triple-mode reconfigurable iteration unit is introduced, featuring a multi-layer variable pipeline architecture to improve adaptability for different applications. By redesigning the pipeline depth and reusing logic units, the FDSR effectively addresses the issue of lengthy iteration cycles in the non-restoring method. Implementation results using 40 nm CMOS technology demonstrate that the proposed design achieves a 76.49% power reduction and a 14.69% area reduction for floating-point division compared to Synopsys DesignWare, and an 88.05% power reduction and a 90.57% area reduction for floating-point square root. With 28 nm CMOS technology, the FDSR reduces power consumption by 91.55% and area by 64.39% for floating-point division compared to Synopsys DesignWare. On the FPGA platform, the FDSR significantly reduces hardware resource consumption, achieving an 85.23% reduction for floating-point division and 87.81% for floating-point square root, outperforming state-of-the-art designs.
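The abstract's remark that division can be turned into a square root by switching the subtractor input can be illustrated with the textbook non-restoring algorithms on unsigned integers. The sketch below is not the FDSR architecture itself: it models neither floating-point handling, leading-one detection, rounding to 0.5 ULP, nor the pipeline, and the function names `nonrestoring_divide` and `nonrestoring_sqrt` are hypothetical. It only shows that both loops keep a signed partial remainder and differ mainly in the operand that is added or subtracted at each step.

```python
def nonrestoring_divide(dividend: int, divisor: int, n_bits: int) -> tuple[int, int]:
    """Textbook non-restoring division of an n_bits-wide unsigned dividend.

    Returns (quotient, remainder). Illustrative only; names are hypothetical.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << n_bits):
        raise ValueError("dividend does not fit in n_bits")
    rem, quo = 0, 0
    for i in range(n_bits - 1, -1, -1):
        bit = (dividend >> i) & 1
        # Shift in the next dividend bit, then subtract or add the divisor
        # depending on the sign of the previous partial remainder.
        if rem >= 0:
            rem = 2 * rem + bit - divisor
        else:
            rem = 2 * rem + bit + divisor
        quo = (quo << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        rem += divisor  # single final correction step
    return quo, rem


def nonrestoring_sqrt(radicand: int, n_bits: int) -> tuple[int, int]:
    """Textbook non-restoring square root of a (2 * n_bits)-wide unsigned radicand.

    Returns (root, remainder) with root**2 + remainder == radicand.
    """
    if not 0 <= radicand < (1 << (2 * n_bits)):
        raise ValueError("radicand does not fit in 2 * n_bits")
    rem, root = 0, 0
    for i in range(n_bits - 1, -1, -1):
        pair = (radicand >> (2 * i)) & 3
        # Same shift-and-add/subtract structure as division; only the operand
        # changes: it is derived from the partial root instead of a divisor.
        if rem >= 0:
            rem = 4 * rem + pair - ((root << 2) | 1)
        else:
            rem = 4 * rem + pair + ((root << 2) | 3)
        root = (root << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        rem += (root << 1) | 1  # single final correction step
    return root, rem


if __name__ == "__main__":
    print(nonrestoring_divide(7, 2, 3))
    print(nonrestoring_sqrt(10, 2))
```

In hardware terms, the two `if rem >= 0` branches correspond to one adder/subtractor whose second input is selected per mode, which is the general idea behind sharing a datapath between the two operations.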