Keywords
Adversarial system, Task (project management), Reinforcement learning, Imitation, Popularity, Computer science, Artificial intelligence, Baseline (sea), Process (computing), Function (biology), Machine learning, Psychology, Social psychology, Engineering, Oceanography, Systems engineering, Evolutionary biology, Biology, Geology, Operating system
Authors
G. M. Xiang,Shaodong Li,Feng Shuang,Fang Gao,Xiaogang Yuan
Source
Journal: IEEE Robotics and Automation Letters
Date: 2024-04-01
Volume/Issue: 9 (4): 3179-3186
Identifier
DOI: 10.1109/lra.2024.3366023
Abstract
Adversarial Inverse Reinforcement Learning (AIRL) has gained popularity as an alternative to supervised imitation learning because it addresses the latter's distributional-bias issue. However, it still faces significant challenges in long-horizon tasks due to a lack of effective exploration. In our study, we demonstrate that standard AIRL strategies end exploration prematurely during online reinforcement learning and fail to learn the entire task because they cannot fully conform to the expert distribution, which is particularly detrimental for real-world robots. To address these challenges, we introduce the SC-AIRL approach. It decomposes long-horizon tasks into logical subtasks, which reduces the agent's need for rich exploration. SC-AIRL utilizes expert demonstrations for multiple subtasks and shares a single critic and an identical reward function across the training of different subtasks. Additionally, we incorporate a human intervention mechanism into the subtask learning process to keep exploration from ending prematurely. Our experiments on challenging robot manipulation tasks demonstrate that SC-AIRL significantly outperforms our baselines. Furthermore, we conduct an exploratory experiment and an empirical analysis, highlighting the model's potential to handle complex tasks and the advantages of SC-AIRL over the baseline, respectively.
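The abstract's central mechanism, one learned reward (the AIRL discriminator) and one critic shared across the training of every subtask, can be illustrated with a minimal sketch. The PyTorch snippet below is an assumption-laden illustration, not the authors' implementation: the network sizes, the toy random batches standing in for expert and policy transitions, and the simplified one-step critic regression (a real system would run a full actor-critic update with a policy and environment rollouts) are all hypothetical.

```python
import torch
import torch.nn as nn

# Minimal sketch of the shared-critic idea: one AIRL-style discriminator
# (whose logit serves as the learned reward) and one critic are reused
# across sequential subtask phases instead of being re-initialized.
# Shapes and data below are illustrative assumptions only.

OBS_DIM, ACT_DIM = 8, 2

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def reward(self, obs, act):
        # AIRL-style reward: the logit equals log D(s,a) - log(1 - D(s,a)).
        return self.net(torch.cat([obs, act], dim=-1))

critic = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
disc = Discriminator()
opt = torch.optim.Adam(list(critic.parameters()) + list(disc.parameters()),
                       lr=3e-4)
bce = nn.BCEWithLogitsLoss()

# Two toy "subtasks", each a batch of (expert_obs, expert_act,
# policy_obs, policy_act) transitions; random data for illustration.
subtasks = [(torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM),
             torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM))
            for _ in range(2)]

for exp_s, exp_a, pol_s, pol_a in subtasks:  # sequential subtask training
    # Discriminator classifies expert (1) vs. policy (0) transitions.
    logits = torch.cat([disc.reward(exp_s, exp_a), disc.reward(pol_s, pol_a)])
    labels = torch.cat([torch.ones(32, 1), torch.zeros(32, 1)])
    d_loss = bce(logits, labels)
    # Simplified stand-in for a critic update: regress the shared critic
    # toward the (frozen) learned reward rather than a full TD target.
    v_loss = ((critic(pol_s) - disc.reward(pol_s, pol_a).detach()) ** 2).mean()
    opt.zero_grad()
    (d_loss + v_loss).backward()
    opt.step()
    print("subtask losses:", d_loss.item(), v_loss.item())
```

The point of the sketch is structural: `disc` and `critic` persist across the loop over subtasks, so reward and value estimates accumulate across phases rather than restarting per subtask; the human-intervention check described in the abstract would gate when each phase's exploration is allowed to end.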