Keywords
Reinforcement learning
Process (computing)
Computer science
Artificial intelligence
Stability (learning theory)
Underwater
Obstacle avoidance
Convergence (economics)
Robot
Control (management)
State space
Robot learning
Control engineering
Machine learning
Engineering
Mobile robot
Oceanography
Geology
Statistics
Mathematics
Economic growth
Economics
Operating system
Authors
Hai Huang,Tao Jiang,Zongyu Zhang,Yize Sun,Hongde Qin,Xinyang Li,Xu Yang
Identifier
DOI: 10.1016/j.jfranklin.2024.106773
Abstract
Autonomous manipulation represents highly intelligent coordination between robotic vision and control, and it is a hallmark of advancing robotic intelligence. The limitations of visual sensing and increasingly complex experimental conditions make autonomous manipulation difficult, particularly for deep reinforcement learning methods, which can enhance robotic control intelligence but require extensive training. Because underwater operations are characterized by a high-dimensional continuous state space and a continuous action space, this paper adopts policy-based reinforcement learning as its foundational approach. To address the instability and low convergence efficiency of traditional policy-based reinforcement learning algorithms, this paper proposes a novel policy learning method that adopts the Proximal Policy Optimization algorithm (PPO-Clip) and optimizes it through an actor-critic network, aiming to improve the stability and effectiveness of convergence during learning. In the underwater training environment, a new reward shaping scheme is designed to address reward sparsity: a manually crafted dense reward function serves as attractive and repulsive potential functions for goal manipulation and obstacle avoidance, respectively. For the highly complex underwater manipulation and training environment, a transfer learning algorithm is established to reduce the amount of training required and to compensate for the differences between simulation and experiment. Simulations and tank experiments verify the performance of the proposed policy learning method.
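The abstract names two standard ingredients: the PPO clipped surrogate objective and a dense reward built from attractive/repulsive potentials. The following is a minimal sketch of what these typically look like, assuming PyTorch/NumPy and illustrative gain values; the paper's actual network architecture, clipping range, and reward terms are not given here.

```python
import numpy as np
import torch


def ppo_clip_loss(ratio: torch.Tensor, advantage: torch.Tensor,
                  eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective of PPO-Clip.

    ratio:     pi_theta(a|s) / pi_theta_old(a|s) for each sampled step
    advantage: advantage estimates A_t from the critic
    eps:       clipping range; 0.2 is the common default (an assumption,
               not taken from the paper)
    """
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum; negate it to get a loss.
    return -torch.min(unclipped, clipped).mean()


def shaped_reward(pos: np.ndarray, goal: np.ndarray,
                  obstacles: list[np.ndarray],
                  k_att: float = 1.0, k_rep: float = 1.0,
                  d0: float = 0.5) -> float:
    """Dense reward from attractive/repulsive potential functions.

    The gains k_att, k_rep and the repulsion cutoff d0 are hypothetical;
    the paper hand-crafts its own dense reward for the underwater task.
    """
    # Attractive term: quadratic pull of the end-effector toward the goal.
    r = -k_att * float(np.linalg.norm(pos - goal)) ** 2
    # Repulsive term: penalize proximity to each obstacle inside radius d0.
    for obs in obstacles:
        d = float(np.linalg.norm(pos - obs))
        if d < d0:
            r -= k_rep * (1.0 / d - 1.0 / d0) ** 2
    return r
```

In a full training loop, the critic of the actor-critic network would supply `advantage`, and a dense reward of this kind would replace the sparse task reward so the agent receives a learning signal toward the goal, away from obstacles, at every step.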