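Transistor
Materials science
Reinforcement learning
Neuromorphic engineering
Synaptic weight
Artificial neural network
Computer science
Spike-timing-dependent plasticity
Optoelectronics
Synaptic plasticity
Electrical engineering
Voltage
Artificial intelligence
Engineering
Chemistry
Receptor
Biochemistry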
Authors
Yue Zhou,Yasai Wang,Fuwei Zhuge,Jianmiao Guo,Sijie Ma,Jingli Wang,Zijian Tang,Yi Li,Xiangshui Miao,Yuhui He,Yang Chai
Identifier
DOI:10.1002/adma.202107754
Abstract
Reward-modulated spike-timing-dependent plasticity (R-STDP) is a brain-inspired reinforcement learning (RL) rule, exhibiting potential for decision-making tasks and artificial general intelligence. However, the hardware implementation of the reward-modulation process in R-STDP usually requires complicated Si complementary metal-oxide-semiconductor (CMOS) circuit design that causes high power consumption and large footprint. Here, a design with two synaptic transistors (2T) connected in a parallel structure is experimentally demonstrated. The 2T unit based on WSe₂ ferroelectric transistors exhibits reconfigurable polarity behavior, where one channel can be tuned as n-type and the other as p-type due to nonvolatile ferroelectric polarization. In this way, opposite synaptic weight update behaviors with multilevel (>6 bit) conductance states, ultralow nonlinearity (0.56/-1.23), and large Gmax/Gmin ratio of 30 are realized. By applying positive/negative reward to (anti-)STDP component of 2T cell, R-STDP learning rules are realized for training the spiking neural network and demonstrated to solve the classical cart-pole problem, exhibiting a way for realizing low-power (32 pJ per forward process) and highly area-efficient (100 µm²) hardware chip for reinforcement learning.
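As a rough software illustration of the rule the abstract describes, the sketch below applies a reward-signed STDP update to a single normalized weight. It is not the authors' device model: the exponential window shape, the time constant `tau`, the amplitudes `a_plus` and `a_minus`, the learning rate `lr`, and the function names are all assumptions chosen for illustration. A positive reward plays the role of addressing the STDP branch of the 2T cell, and a negative reward the anti-STDP branch.

```python
import math


def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Exponential STDP window (assumed shape); dt = t_post - t_pre in ms."""
    if dt > 0:
        # Pre-synaptic spike precedes post-synaptic spike: potentiation.
        return a_plus * math.exp(-dt / tau)
    if dt < 0:
        # Post-synaptic spike precedes pre-synaptic spike: depression.
        return -a_minus * math.exp(dt / tau)
    return 0.0


def r_stdp_update(w, dt, reward, lr=0.1, w_min=0.0, w_max=1.0):
    """Reward-modulated update: dw = lr * reward * STDP(dt), clipped to [w_min, w_max].

    A negative reward flips the sign of the update, which corresponds to
    using an anti-STDP rule instead of the STDP rule.
    """
    dw = lr * reward * stdp_window(dt)
    return min(w_max, max(w_min, w + dw))


if __name__ == "__main__":
    w0 = 0.5
    print(r_stdp_update(w0, dt=10.0, reward=+1.0))  # weight increases
    print(r_stdp_update(w0, dt=10.0, reward=-1.0))  # weight decreases
```

Clipping to `[w_min, w_max]` stands in for the finite conductance range of a physical synapse; the bounds used here are normalized placeholders, not the measured Gmax/Gmin values.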