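Transistor
Materials science
Reinforcement learning
Neuromorphic engineering
Synaptic weight
Artificial neural network
Computer science
Optoelectronics
Topology (electrical circuits)
Electrical engineering
Voltage
Artificial intelligence
Engineering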
Authors
Yue Zhou, Yasai Wang, Fuwei Zhuge, Jianmiao Guo, Sijie Ma, Jingli Wang, Zijian Tang, Yi Li, Xiangshui Miao, Yuhui He, Yang Chai
Identifier
DOI: 10.1002/adma.202107754
Abstract
Reward-modulated spike-timing-dependent plasticity (R-STDP) is a brain-inspired reinforcement learning (RL) rule that shows potential for decision-making tasks and artificial general intelligence. However, hardware implementation of the reward-modulation process in R-STDP usually requires complicated Si complementary metal-oxide-semiconductor (CMOS) circuit design, which leads to high power consumption and a large footprint. Here, a design with two synaptic transistors (2T) connected in a parallel structure is experimentally demonstrated. The 2T unit, based on WSe₂ ferroelectric transistors, exhibits reconfigurable polarity behavior: one channel can be tuned as n-type and the other as p-type through nonvolatile ferroelectric polarization. In this way, opposite synaptic weight update behaviors are realized with multilevel (>6 bit) conductance states, ultralow nonlinearity (0.56/-1.23), and a large Gmax/Gmin ratio of 30. By applying a positive/negative reward to the (anti-)STDP component of the 2T cell, R-STDP learning rules are realized for training a spiking neural network and are demonstrated to solve the classical cart-pole problem, offering a route to a low-power (32 pJ per forward process) and highly area-efficient (100 µm²) hardware chip for reinforcement learning.
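The reward-modulated update described in the abstract can be illustrated with a minimal software sketch. This is not the authors' device model or training code: the exponential STDP kernel, the time constant `tau`, the learning rate `lr`, and all function names are assumptions chosen for illustration. Only the idea of routing a positive reward to an STDP component and a negative reward to an anti-STDP component, and the conductance ratio of 30, come from the abstract.

```python
import math

# Arbitrary conductance units; only the Gmax/Gmin ratio of 30 is taken from the abstract.
G_MIN, G_MAX = 1.0, 30.0


def stdp(dt, amplitude=1.0, tau=20.0):
    """Assumed pair-based STDP kernel, with dt = t_post - t_pre.

    Pre-before-post (dt >= 0) potentiates; post-before-pre depresses.
    """
    if dt >= 0:
        return amplitude * math.exp(-dt / tau)
    return -amplitude * math.exp(dt / tau)


def anti_stdp(dt, amplitude=1.0, tau=20.0):
    """Anti-STDP: the same kernel with the opposite sign."""
    return -stdp(dt, amplitude, tau)


def r_stdp_step(g, dt, reward, lr=0.5):
    """One reward-modulated update of a conductance g (hypothetical helper).

    A positive reward applies the STDP component, a negative reward applies
    the anti-STDP component, and a zero reward leaves g unchanged.
    The result is clipped to the [G_MIN, G_MAX] window.
    """
    if reward > 0:
        dg = lr * reward * stdp(dt)
    elif reward < 0:
        dg = lr * abs(reward) * anti_stdp(dt)
    else:
        dg = 0.0
    return min(G_MAX, max(G_MIN, g + dg))


if __name__ == "__main__":
    g0 = 10.0
    # Same spike timing, opposite rewards: the weight moves in opposite directions.
    print(r_stdp_step(g0, dt=5.0, reward=+1.0))
    print(r_stdp_step(g0, dt=5.0, reward=-1.0))
```

In this sketch the sign of the reward selects which component acts on the weight, so the same spike pair strengthens the synapse after a rewarded action and weakens it after a penalized one.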