Keywords: Reinforcement learning; Imitation learning; Machine learning; Artificial neural networks; Robotics; Computer vision; Continuous control; Sampling efficiency; Artificial intelligence; Computer science
Authors
Guofei Xiang, Jianbo Su
Source
Journal: IEEE Transactions on Cybernetics (Institute of Electrical and Electronics Engineers)
Date: 2019-11-12
Volume/Issue: 51 (2): 1056-1069
Citations: 33
Identifier
DOI: 10.1109/tcyb.2019.2949596
Abstract
Reinforcement learning (RL) and imitation learning (IL), especially when equipped with deep neural networks, have been widely studied for autonomous robotic skill acquisition and control tasks. However, these methods and their extensions require extensive environmental interaction during training, which largely prevents them from being applied to real-world robots. To alleviate this problem, we present an efficient model-free off-policy actor-critic algorithm for robotic skill acquisition and continuous control that fuses the task reward with a task-oriented guiding reward, formulated by leveraging a few imperfect expert demonstrations. In this framework, the agent explores the environment more intentionally, which improves sampling efficiency; it also exploits its experience more effectively, which substantially improves performance at the same time. Empirical results on robotic locomotion tasks show that the proposed scheme lowers sample complexity by a factor of 2-10 compared with state-of-the-art baseline deep RL (DRL) algorithms, while achieving performance better than that of the expert. Furthermore, the proposed algorithm achieves significant improvement in both sampling efficiency and asymptotic performance on tasks with sparse and delayed rewards, where those baseline DRL algorithms struggle to make progress. This is a substantial step toward applying these methods to autonomous skill acquisition on real robots.
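The abstract's central idea is fusing the environment's task reward with a guiding reward derived from a few imperfect expert demonstrations. The paper's exact formulation is not given in the abstract; the sketch below is a minimal, hypothetical illustration in which the guiding reward is the negative distance from the agent's current state-action pair to the nearest demonstrated pair, and `lam` is an assumed weighting coefficient.

```python
import numpy as np

def guiding_reward(state, action, demo_states, demo_actions, scale=1.0):
    """Hypothetical task-oriented guiding reward: negative Euclidean
    distance from the agent's (state, action) to the nearest expert
    demonstration pair. Zero when the agent exactly matches a demo."""
    pairs = np.hstack([demo_states, demo_actions])
    query = np.concatenate([state, action])
    dists = np.linalg.norm(pairs - query, axis=1)
    return -scale * dists.min()

def fused_reward(task_reward, state, action, demo_states, demo_actions, lam=0.1):
    # Fuse the environment's task reward with the demonstration-based
    # guiding reward; the weighting lam is an illustrative assumption,
    # not the paper's formulation.
    return task_reward + lam * guiding_reward(
        state, action, demo_states, demo_actions
    )
```

Under this kind of shaping, the fused reward stays dense even when the task reward is sparse or delayed, which is consistent with the abstract's claim of improved progress on sparse-reward tasks.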