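Computer science
Task (project management)
Reinforcement learning
Metric (unit)
Set (abstract data type)
Artificial intelligence
Robot
Humanoid robot
Machine learning
Obstacle
Transferability
Sample (material)
Programming language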
Logit
Political science
Economics
Chromatography
Chemistry
Management
Law
Operations management
Authors
Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman
Source
Journal: Cornell University - arXiv
Date: 2017-01-01
Citations: 118
Identifier
DOI:10.48550/arxiv.1710.09767
Abstract
We develop a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives: policies that are executed for large numbers of timesteps. Specifically, a set of primitives is shared within a distribution of tasks, and the primitives are switched between by task-specific policies. We provide a concrete metric for measuring the strength of such hierarchies, leading to an optimization problem for quickly reaching high reward on unseen tasks. We then present an algorithm to solve this problem end-to-end through the use of any off-the-shelf reinforcement learning method, by repeatedly sampling new tasks and resetting task-specific policies. We successfully discover meaningful motor primitives for the directional movement of four-legged robots, solely by interacting with distributions of mazes. We also demonstrate the transferability of primitives to solve long-timescale sparse-reward obstacle courses, and we enable 3D humanoid robots to robustly walk and crawl with the same policy.
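The training loop described in the abstract (sample a new task, reset the task-specific policy, then train) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a bandit-like task in which the goal is a single target action, uses plain REINFORCE with a fixed baseline as the "off-the-shelf" learner, and splits each task into a warmup phase (task-specific policy only) followed by a joint phase. All names and hyperparameters (`train_shared_primitives`, `hold_steps`, `warmup_episodes`, and so on) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_step(logits, choice, advantage, lr):
    """In-place REINFORCE update for a softmax policy.

    Uses d log pi(choice) / d logit_i = 1[i == choice] - pi_i.
    """
    probs = softmax(logits)
    for i in range(len(logits)):
        grad = (1.0 if i == choice else 0.0) - probs[i]
        logits[i] += lr * advantage * grad


def train_shared_primitives(num_tasks=200, num_primitives=4, num_actions=4,
                            warmup_episodes=20, joint_episodes=20,
                            hold_steps=5, lr=0.5, seed=0):
    rng = random.Random(seed)
    baseline = 0.5  # fixed baseline, an assumption made for simplicity

    # Shared primitives: one softmax over actions each, kept across all tasks.
    primitives = [[rng.gauss(0.0, 0.1) for _ in range(num_actions)]
                  for _ in range(num_primitives)]

    for _ in range(num_tasks):
        goal = rng.randrange(num_actions)   # sample a new task
        master = [0.0] * num_primitives     # reset the task-specific policy

        for episode in range(warmup_episodes + joint_episodes):
            # The task-specific policy picks a primitive, which then acts
            # for hold_steps consecutive timesteps.
            k = sample(softmax(master), rng)
            actions = [sample(softmax(primitives[k]), rng)
                       for _ in range(hold_steps)]
            rewards = [1.0 if a == goal else 0.0 for a in actions]
            mean_reward = sum(rewards) / hold_steps

            # The task-specific policy is updated in both phases.
            reinforce_step(master, k, mean_reward - baseline, lr)

            # The shared primitives are updated only after the warmup phase.
            if episode >= warmup_episodes:
                for a, r in zip(actions, rewards):
                    reinforce_step(primitives[k], a, r - baseline,
                                   lr / hold_steps)

    return primitives
```

Calling `train_shared_primitives()` returns the primitive logits after training. In this toy setting one would hope to see different primitives come to favor different actions, but the sketch makes no claim about how closely that mirrors the paper's results.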