Keywords
Computer science
Coprocessor
Compiler
Cache
Embedded system
Computer architecture
Parallel computing
Computer hardware
Operating system
Authors
SungWon Chung, Jiemi Wang
Source
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems
[Institute of Electrical and Electronics Engineers]
Date: 2019-08-12
Volume/Issue: 9 (3): 544-561
Citations: 5
Identifier
DOI: 10.1109/jetcas.2019.2934929
Abstract
Low-profile mobile computing platforms often need to execute a variety of machine learning algorithms with limited memory and processing power. To address this challenge, this work presents Coara, an instruction-level processor acceleration architecture that efficiently integrates an approximate analog in-memory computing coprocessor for accelerating general machine learning applications by exploiting an analog register file cache. Instruction-level acceleration offers true programmability beyond the degree of freedom provided by reconfigurable machine learning accelerators, and also allows the code generation stage of a compiler back end to control coprocessor execution and data flow, so that applications do not need high-level machine learning software frameworks with a large memory footprint. Conventional analog and mixed-signal accelerators suffer from the overhead of frequent data conversion between analog and digital signals. To solve this classical problem, Coara uses an analog register file cache, which interfaces the analog in-memory computing coprocessor with the digital register file of the processor core. As a result, more than 90% of the ADC and DAC data conversion overhead can be eliminated by temporarily storing the result of analog computation in a switched-capacitor analog memory cell until a data dependency occurs. A cycle-accurate Verilog RTL model of the proposed architecture is evaluated with 45 nm CMOS technology parameters while executing machine learning benchmark codes generated by a customized cross-compiler, without using machine learning software frameworks.
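The mechanism described above is easiest to see as a bookkeeping problem: analog results stay parked in switched-capacitor cells and only cross the ADC when a digital instruction actually depends on them. Below is a minimal, hypothetical Python sketch of that conversion-avoidance accounting; the class, method names, and cost model are illustrative assumptions, not the paper's RTL or instruction set.

class AnalogRegFileCache:
    """Tracks values parked in the analog domain and counts the
    DAC/ADC conversions that the cache performs versus avoids."""

    def __init__(self, digital_regs):
        self.digital = digital_regs  # register name -> digital value
        self.analog = {}             # register name -> value held in the analog domain
        self.conversions = 0         # DAC/ADC conversions actually performed
        self.avoided = 0             # conversions skipped thanks to the cache

    def _operand(self, reg):
        # An operand already resident in the analog cache needs no DAC.
        if reg in self.analog:
            self.avoided += 1
            return self.analog[reg]
        self.conversions += 1        # DAC: digital operand enters the analog domain
        return self.digital[reg]

    def analog_mac(self, dst, a, b, acc):
        # Analog multiply-accumulate; the result stays in a
        # switched-capacitor cell, with no ADC conversion yet.
        self.analog[dst] = self._operand(a) * self._operand(b) + self._operand(acc)

    def digital_read(self, reg):
        # A digital instruction consumes reg: the data dependency
        # finally forces one ADC conversion.
        if reg in self.analog:
            self.conversions += 1    # ADC: analog result returns to digital
            self.digital[reg] = self.analog.pop(reg)
        return self.digital[reg]

# Example: a two-term dot product keeps the accumulator analog between MACs.
cache = AnalogRegFileCache({"x0": 1.0, "x1": 2.0, "w0": 0.5, "w1": 0.25, "acc": 0.0})
cache.analog_mac("acc", "x0", "w0", "acc")   # acc is parked in the analog cache
cache.analog_mac("acc", "x1", "w1", "acc")   # reuses analog acc: one conversion avoided
print(cache.digital_read("acc"))             # 1.0, after exactly one ADC at the end

In a long multiply-accumulate chain this accumulator crosses the converter once instead of once per term, which is consistent with the abstract's claim that most conversion overhead disappears when intermediate results never leave the analog domain.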