Deep reinforcement learning trajectory planning for robotic manipulator based on simulation-efficient training

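Reinforcement learning, Computer science, Trajectory, Training (meteorology), Artificial intelligence, Robotic arm, Manipulator (device), Simulation, Robot, Physics, Astronomy, Meteorology
Authors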
Bin Zhao,Yao Wu,Chengdong Wu,Ruohuai Sun
Source
Journal: Scientific Reports [Springer Nature]
Volume (issue): 15 (1)
Identifier
DOI:10.1038/s41598-025-93175-2
Abstract

The paper proposes a new M2ACD (Multi-Actor-Critic Deep Deterministic Policy Gradient) algorithm for trajectory planning of a robotic manipulator in complex environments. First, the paper presents a general inverse kinematics algorithm that recasts the inverse kinematics problem as a general Newton-MP iterative method. The M2ACD algorithm is structured around multiple actors and critics. The dual-actor network reduces the overestimation of action values, minimizes the correlation between the actor and value networks, and mitigates instability in the actor's action selection caused by excessively high Q-values. The dual-critic network reduces the estimation bias of Q-values, making action selection more reliable and Q-value estimation more stable. Second, a TSR (two-stage reward) strategy for the robotic manipulator is designed and divided into an approach phase and a close phase. Rewards in the approach phase focus on safely and efficiently approaching the target, and rewards in the close phase involve final adjustments before contact is made with the target. Third, to solve the position-hopping jitter problem in traditional reinforcement learning trajectory planning, a NURBS (Non-Uniform Rational B-Splines) curve is used to smooth the hopping trajectory generated by M2ACD. Finally, experiments verify the correctness of M2ACD and the kinematics algorithm. The M2ACD algorithm demonstrated superior curve smoothing, convergence stability, and convergence speed compared with the TD3, DARC, and DDPG algorithms. The M2ACD algorithm can be effectively applied to trajectory planning for collaborative robots, establishing a foundation for subsequent research.
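
The abstract names the two phases of the TSR strategy but gives no formulas, so the following is only a minimal sketch of how a two-stage reward could be organized, not the authors' reward function. The function name `two_stage_reward`, the constants `CLOSE_RADIUS` and `SAFE_CLEARANCE`, and the specific reward terms are all hypothetical choices made for illustration.

```python
import math

# Hypothetical constants chosen for illustration; they are not values from the paper.
CLOSE_RADIUS = 0.05    # distance (m) at which the close phase is assumed to begin
SAFE_CLEARANCE = 0.10  # obstacle clearance (m) below which a penalty is applied


def two_stage_reward(ee_pos, target_pos, obstacle_clearance, prev_distance):
    """Return (reward, phase) for one step of a hypothetical two-stage reward.

    ee_pos, target_pos: end-effector and target positions as (x, y, z) tuples.
    obstacle_clearance: current distance to the nearest obstacle.
    prev_distance: end-effector-to-target distance at the previous step.
    """
    distance = math.dist(ee_pos, target_pos)

    if distance > CLOSE_RADIUS:
        # Approach phase: reward progress toward the target (efficiency)
        # and penalize getting too close to obstacles (safety).
        progress = prev_distance - distance
        penalty = max(0.0, SAFE_CLEARANCE - obstacle_clearance)
        return progress - penalty, "approach"

    # Close phase: dense reward for fine adjustment, largest at the target.
    return 1.0 - distance / CLOSE_RADIUS, "close"
```

Under these assumptions, a step that moves the end effector from 0.6 m to 0.5 m away from the target with ample clearance falls in the approach phase and earns a reward of about 0.1, while a step that lands exactly on the target falls in the close phase and earns 1.0.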
