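Reinforcement learning
Incentive
Computer science
Trajectory
Convergence (economics)
Process (computing)
Function (biology)
Artificial intelligence
Theory (learning stability)
Control theory (sociology)
Mathematical optimization
Machine learning
Mathematics
Control (management)
Economics
Microeconomics
Biology
Evolutionary biology
Operating system
Economic growth
Physics
Astronomy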
Authors
Gang Peng, Jin Yang, Xinde Li, Mohammad Omar Khyam
Source
Journal: IEEE Transactions on Systems, Man, and Cybernetics
[Institute of Electrical and Electronics Engineers]
Date: 2023-06-01
Volume (issue), pages: 53 (6): 3566-3573
Citations: 2
Identifiers
DOI:10.1109/tsmc.2022.3228901
Abstract
To improve the efficiency of deep reinforcement learning (DRL)-based methods for robot manipulator trajectory planning in random working environments, we present three dense reward functions that differ from the traditional sparse reward. First, a posture reward function is proposed to speed up the learning process and produce a more reasonable trajectory by modeling the distance and direction constraints, which reduces the blindness of exploration. Second, a stride reward function is proposed to improve the stability of the learning process by modeling the distance and joint movement-distance constraints. Finally, to further improve learning efficiency, we draw on the cognitive process of human behavior and propose a stage incentive mechanism comprising a hard-stage incentive reward function and a soft-stage incentive reward function. Extensive experiments show that the soft-stage incentive reward function improves the convergence rate and achieves a higher mean reward and a lower standard deviation after convergence.
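The abstract names the reward terms but does not give their formulas, so the following Python sketch is illustrative only and is not the authors' implementation. It shows one plausible way a distance-plus-direction posture term, a joint-movement stride term, and hard versus soft stage switching could be combined. Every function name, functional form, and parameter (`threshold`, `sharpness`, the cosine direction term, the blending rule) is an assumption introduced here.

```python
import math


def _norm(v):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))


def posture_reward(prev_pos, pos, target):
    """Assumed form: negative distance to target plus cosine of the
    angle between the last end-effector step and the target direction."""
    step = [b - a for a, b in zip(prev_pos, pos)]
    to_target = [t - a for a, t in zip(prev_pos, target)]
    distance = _norm([t - p for p, t in zip(pos, target)])
    denom = _norm(step) * _norm(to_target)
    if denom > 1e-12:
        cosine = sum(s * t for s, t in zip(step, to_target)) / denom
    else:
        cosine = 0.0  # no motion or already at target: no direction signal
    return -distance + cosine


def stride_reward(joint_deltas, distance):
    """Assumed form: penalize total joint movement, more strongly
    when the end effector is close to the target."""
    return -sum(abs(d) for d in joint_deltas) / (1.0 + distance)


def hard_stage_weight(distance, threshold=0.1):
    """Hard stage: switch abruptly once inside a hypothetical threshold."""
    return 1.0 if distance < threshold else 0.0


def soft_stage_weight(distance, threshold=0.1, sharpness=50.0):
    """Soft stage: logistic blend around the same hypothetical threshold."""
    # Clamp the exponent so math.exp cannot overflow for large distances.
    z = max(-60.0, min(60.0, sharpness * (distance - threshold)))
    return 1.0 / (1.0 + math.exp(z))


def staged_reward(posture, stride, weight):
    """Blend the two terms: weight 0 uses posture only, weight 1 stride only."""
    return (1.0 - weight) * posture + weight * stride
```

In this sketch the hard-stage weight makes the reward jump at the threshold, while the soft-stage weight changes it continuously. That is one possible reading of the hard/soft distinction in the abstract; the paper itself should be consulted for the actual definitions.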