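Dopamine
Reinforcement learning
Temporal difference learning
Neuroscience
Psychology
Cognitive psychology
Psychomotor learning
Computer science
Artificial intelligence
Cognition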
Authors
Sham M. Kakade, Peter Dayan
Abstract
Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al.'s non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.
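As a rough illustration of the bonus idea in the abstract, the sketch below adds a potential-based shaping bonus of the standard form gamma * Phi(s') - Phi(s) to a TD error. It is not taken from the paper: the function names, the potential values, and the discount factor are all hypothetical and chosen only for illustration. It shows two things: a stimulus with zero reward can still produce a positive error signal, and the discounted bonuses telescope along a trajectory, which is why such bonuses are called non-distorting.

```python
# Minimal sketch (assumption: potential-based shaping, F = gamma*Phi(s') - Phi(s)).
# All names and numbers are hypothetical, not from the paper.

def shaping_bonus(phi_s, phi_next, gamma):
    """Potential-based shaping bonus for a transition s -> s'."""
    return gamma * phi_next - phi_s


def td_error(reward, v_s, v_next, gamma, bonus=0.0):
    """TD error with an optional additive reward bonus."""
    return reward + bonus + gamma * v_next - v_s


def discounted_bonus_sum(phis, gamma):
    """Sum of discounted shaping bonuses along a trajectory of potentials."""
    total = 0.0
    for t in range(len(phis) - 1):
        total += gamma ** t * shaping_bonus(phis[t], phis[t + 1], gamma)
    return total


if __name__ == "__main__":
    gamma = 0.9
    # A non-rewarding transition into a state with higher potential.
    bonus = shaping_bonus(phi_s=0.0, phi_next=1.0, gamma=gamma)
    delta = td_error(reward=0.0, v_s=0.0, v_next=0.0, gamma=gamma, bonus=bonus)
    print("TD error with bonus:", delta)

    # The discounted bonuses telescope to gamma^T * Phi(s_T) - Phi(s_0).
    phis = [0.0, 1.0, 0.5, 2.0]
    print("Discounted bonus sum:", discounted_bonus_sum(phis, gamma))
    print("Telescoped value:    ", gamma ** 3 * phis[-1] - phis[0])
```

Because the summed bonus depends only on the first and last potentials, it shifts the error signal during learning without changing which policy is ultimately preferred.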