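Refining (metallurgy)
Steel reinforcement (rebar)
Reinforcement learning
Computer science
Artificial intelligence
Psychology
Materials science
Social psychology
Metallurgy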
Authors
Ce Hao,Catherine Weaver,Chen Tang,Kiyosumi Kawamoto,Masayoshi Tomizuka,Wei Zhan
Source
Journal: IEEE Robotics and Automation Letters
Date: 2024-02-21
Volume (issue), pages: 9 (4): 3625-3632
Citations: 1
Identifier
DOI: 10.1109/lra.2024.3368231
Abstract
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance.
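Illustrative note: the two-level structure described in the abstract can be sketched as a rollout in which a high-level policy picks a latent skill every few steps and a low-level policy turns that skill into primitive actions. The sketch below is not the Skill-Critic implementation. The names `rollout`, `gaussian_kl`, `skill_len`, and the toy environment are hypothetical, and using a Gaussian KL divergence as the regularizer toward a demonstration-learned prior is an assumption that the abstract does not state.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for scalar Gaussians.

    Assumed here as one possible penalty that keeps a policy close to a
    prior learned from offline demonstrations.
    """
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)


def rollout(env_step, high_policy, low_policy, state, horizon, skill_len):
    """Run one episode with temporal abstraction.

    The high-level policy selects a new skill z every `skill_len` steps;
    the low-level policy maps (state, z) to a primitive action each step.
    """
    trajectory = []
    z = None
    for t in range(horizon):
        if t % skill_len == 0:
            z = high_policy(state)           # high-level skill selection
        action = low_policy(state, z)        # low-level primitive action
        state, reward = env_step(state, action)
        trajectory.append((t, z, action, reward))
    return trajectory


if __name__ == "__main__":
    # Toy 1-D task with a sparse reward: reward 1.0 only once state >= 2.0.
    def env_step(s, a):
        s_next = s + a
        return s_next, (1.0 if s_next >= 2.0 else 0.0)

    traj = rollout(env_step,
                   high_policy=lambda s: 1.0,         # always "move right"
                   low_policy=lambda s, z: 0.5 * z,   # small step along z
                   state=0.0, horizon=6, skill_len=3)
    for step in traj:
        print(step)
    # Penalty for a policy whose mean has drifted one unit from the prior.
    print(gaussian_kl(1.0, 1.0, 0.0, 1.0))
```

In this toy setup the high-level policy is queried only twice in six steps, which is the temporal abstraction the abstract refers to; fine-tuning both levels would mean updating `high_policy` and `low_policy` while a penalty such as `gaussian_kl` discourages drift from the offline prior.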