Reinforcement learning
Computer science
Artificial intelligence
Generative grammar
Adversarial system
Deep learning
Imitation
Function (biology)
Robot
Machine learning
Psychology
Social psychology
Evolutionary biology
Biology
Authors
Yoshihisa Tsurumine, Yunduan Cui, Kimitoshi Yamazaki, Takamitsu Matsubara
Identifier
DOI:10.1109/humanoids43949.2019.9034991
Abstract
Although deep Reinforcement Learning (RL) has been successfully applied to a variety of tasks, manually designing appropriate reward functions for complex tasks such as robotic cloth manipulation remains challenging and costly. In this paper, we explore a Generative Adversarial Imitation Learning (GAIL) approach for robotic cloth manipulation tasks, which allows an agent to learn near-optimal behaviors from expert demonstrations and self-exploration without explicit reward-function design. Building on the recent success of value-function based RL with a discrete action set for robotic cloth manipulation tasks [1], we develop a novel value-function based imitation learning framework, P-GAIL. P-GAIL employs a modified value-function based deep RL method, the Entropy-maximizing Deep P-Network, which accounts for both smoothness and causal entropy in the policy update. After investigating its effectiveness on a toy problem in simulation, P-GAIL is applied to a dual-arm humanoid robot tasked with flipping a handkerchief, and it successfully learns a policy close to the human demonstration with limited exploration and few demonstrations. Experimental results suggest that P-GAIL achieves fast, stable imitation learning and high sample efficiency in robotic cloth manipulation.
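The core idea the abstract relies on, inherited from GAIL, is that no reward function is hand-designed: a discriminator is trained to tell expert state-action pairs from the agent's, and its output is turned into a surrogate reward that the RL component (here, the Entropy-maximizing Deep P-Network) maximizes. The minimal sketch below illustrates only that discriminator-to-reward mechanism on a toy one-dimensional task with two discrete actions; the logistic-regression discriminator, the feature map, and the toy "expert" are all illustrative assumptions, not the paper's actual networks or robot setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumption, not from the paper): 1-D state in [0, 1],
# two discrete actions; the "expert" takes action 1 iff s > 0.5.
def expert_policy(s):
    return int(s > 0.5)

def featurize(s, a):
    # Simple hand-picked state-action features: [1, s, a, s*a].
    return np.array([1.0, s, float(a), s * float(a)])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Discriminator D(s, a): probability the pair came from the expert,
# here a logistic-regression stand-in for GAIL's discriminator network.
w = np.zeros(4)
lr = 0.5

for step in range(200):
    # Expert pairs: states with the expert's action.
    s_e = rng.random(32)
    a_e = np.array([expert_policy(s) for s in s_e])
    # Agent pairs: here a uniformly random policy for illustration.
    s_g = rng.random(32)
    a_g = rng.integers(0, 2, size=32)

    X_e = np.stack([featurize(s, a) for s, a in zip(s_e, a_e)])
    X_g = np.stack([featurize(s, a) for s, a in zip(s_g, a_g)])

    # Gradient ascent on the logistic log-likelihood:
    # expert pairs labeled 1, agent pairs labeled 0.
    p_e = sigmoid(X_e @ w)
    p_g = sigmoid(X_g @ w)
    grad = X_e.T @ (1.0 - p_e) / len(X_e) - X_g.T @ p_g / len(X_g)
    w += lr * grad

# Surrogate reward r(s, a) = -log(1 - D(s, a)): large where the pair
# looks expert-like, so the policy is pushed toward expert behavior
# without any hand-designed reward.
def reward(s, a):
    d = sigmoid(featurize(s, a) @ w)
    return -np.log(1.0 - d + 1e-8)
```

After training, expert-consistent pairs such as (s = 0.9, a = 1) receive a higher surrogate reward than inconsistent ones such as (s = 0.9, a = 0); in the full method this reward would drive the value-function based RL update rather than a hand-crafted cost.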