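Field-programmable gate array
Computer science
Decomposition
Software
Hardware acceleration
Computer hardware
Object (grammar)
Object detection
Embedded system
Resource (disambiguation)
Artificial neural network
Artificial intelligence
Pattern recognition (psychology)
Operating system
Biology
Computer network
Ecology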
Authors
Mingshuo Liu,Shiyi Luo,Kevin Han,Bo Yuan,Ronald F. DeMara,Yu Bai
Identifier
DOI:10.1109/asap52443.2021.00020
Abstract
The fast development of object detection techniques has drawn attention to the design of efficient Deep Neural Networks (DNNs). However, current state-of-the-art DNN models cannot provide a balanced solution among accuracy, speed, and model size. This paper proposes an efficient real-time object detection framework for resource-constrained hardware devices through hardware and software co-design. Tensor Train (TT) decomposition is used to compress the YOLOv5 model. By utilizing the unique characteristics of the TT decomposition, we develop an efficient hardware accelerator based on FPGA devices. Experimental results show that the proposed method can significantly reduce both model size and execution time.
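To give a rough sense of why TT decomposition shrinks a model, the sketch below compares the parameter count of a dense weight matrix with that of the same matrix stored in TT-matrix format, where core k has shape (r_{k-1}, m_k, n_k, r_k) and the boundary ranks are 1. The abstract does not state which layers are decomposed, how they are factored, or which ranks are used, so the layer size, mode factorization, and ranks here are hypothetical, as are the function names. A fully connected layer is used for simplicity; this is not the paper's implementation.

```python
from math import prod


def dense_param_count(in_modes, out_modes):
    """Parameters in the dense matrix of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_param_count(in_modes, out_modes, ranks):
    """Parameters in a TT-matrix whose k-th core has shape
    (ranks[k], in_modes[k], out_modes[k], ranks[k + 1])."""
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(ranks) != len(in_modes) + 1:
        raise ValueError("ranks must have one more entry than the number of cores")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Hypothetical layer: 1024 x 1024 weights, each side factored as 4 * 8 * 8 * 4.
in_modes = [4, 8, 8, 4]
out_modes = [4, 8, 8, 4]
ranks = [1, 8, 8, 8, 1]  # assumed TT ranks, not taken from the paper

dense = dense_param_count(in_modes, out_modes)  # 1,048,576 parameters
tt = tt_param_count(in_modes, out_modes, ranks)  # 8,448 parameters
print(f"dense={dense}, tt={tt}, compression={dense / tt:.1f}x")
```

Smaller ranks give higher compression at the cost of approximation error, which is the accuracy versus model size trade-off the abstract refers to.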