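Computer science
Artificial intelligence
Reinforcement learning
Markov decision process
Robot
Robustness (evolution)
Imitation
Task (project management)
Markov process
Engineering
Psychology
Statistics
Systems engineering
Chemistry
Gene
Social psychology
Biochemistry
Mathematics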
Authors
Tianhao Zhang,Yao Lu,Chen Wang,Jinan Sun,Shikun Zhang,Airong Wei,Guangming Xie
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2024-03-01
Volume (issue): 35 (3): 4232-4245
Citations: 5
Identifier
DOI:10.1109/tnnls.2022.3202075
Abstract
In this article, the pose regulation control problem of a robotic fish is investigated by formulating it as a Markov decision process (MDP). This task, which requires the robot to arrive at the desired position with the desired orientation, remains a challenge because the two objectives (position and orientation) may conflict during optimization. To handle this challenge, we adopt a sparse reward scheme, i.e., the robot is rewarded if and only if it completes the pose regulation task. Although deep reinforcement learning (DRL) can solve such an MDP with sparse rewards, the absence of immediate rewards prevents the robot from learning efficiently. To this end, we propose a novel imitation learning (IL) method that learns DRL-based policies from demonstrations with inverse reward shaping to overcome the challenge raised by extremely sparse rewards. Moreover, we design a demonstrator that generates various trajectory demonstrations from one simple example provided by a nonexpert helper, which greatly reduces the time spent collecting robot samples. The simulation results demonstrate the effectiveness of our proposed demonstrator and the state-of-the-art (SOTA) performance of our proposed IL method. Furthermore, we deploy the trained IL policy on a physical robotic fish to perform pose regulation in a swimming tank with and without external disturbances. The experimental results verify the effectiveness and robustness of our proposed methods in the real world. Therefore, we believe this article is a step forward in the field of biomimetic underwater robot learning.
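The sparse reward scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the planar pose representation (x, y, yaw), the function names, the tolerance values, and the reward magnitude of 1.0 are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def sparse_pose_reward(pose, goal, pos_tol=0.05, yaw_tol=math.radians(10.0)):
    """Return 1.0 only when both position and orientation are within tolerance.

    pose, goal: (x, y, yaw) tuples; yaw is in radians.
    pos_tol, yaw_tol: hypothetical tolerances, not values from the paper.
    """
    x, y, yaw = pose
    gx, gy, gyaw = goal
    pos_err = math.hypot(x - gx, y - gy)      # Euclidean distance to the goal
    yaw_err = abs(wrap_angle(yaw - gyaw))     # smallest heading difference
    # Reward if and only if the full pose regulation task is complete.
    return 1.0 if (pos_err <= pos_tol and yaw_err <= yaw_tol) else 0.0
```

Because the reward is zero everywhere except at task completion, an agent receives no signal about partial progress, which is the learning difficulty the abstract says the proposed IL method is meant to address.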