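Reinforcement learning
Computer science
Regularization (linguistics)
Artificial intelligence
Optimal control
Task (project management)
Key (lock)
Machine learning
Artificial neural network
Deep learning
Feature engineering
Mathematical optimization
Mathematics
Engineering
Computer security
Systems engineering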
Authors
Chelsea Finn, Sergey Levine, Pieter Abbeel
Source
Venue: International Conference on Machine Learning
Date: 2016-06-19
Pages: 49-58
Citations: 373
Abstract
Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.
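The sample-based approximation for MaxEnt IOC mentioned in the abstract can be illustrated with a small sketch. Under the maximum-entropy model, a trajectory is assumed to have probability proportional to exp(-cost), so the negative log-likelihood of the demonstrations is their mean cost plus the log of a partition function, which is estimated here by importance sampling from a sampling distribution q. This is a minimal illustration under those assumptions, not the authors' implementation: the function names, the use of precomputed per-trajectory costs, and the availability of log q for each sample are all hypothetical choices made for the example.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def maxent_ioc_loss(demo_costs, sample_costs, sample_log_q):
    """Sample-based MaxEnt IOC negative log-likelihood (illustrative sketch).

    demo_costs:   cost of each demonstrated trajectory under the current cost
    sample_costs: cost of each trajectory drawn from the sampler q
    sample_log_q: log-density of each sampled trajectory under q (assumed known)
    """
    if len(sample_costs) != len(sample_log_q):
        raise ValueError("sample_costs and sample_log_q must have equal length")

    mean_demo_cost = sum(demo_costs) / len(demo_costs)

    # Importance weights exp(-cost) / q, kept in log space for stability.
    log_weights = [-c - lq for c, lq in zip(sample_costs, sample_log_q)]

    # log of the Monte Carlo estimate of the partition function Z.
    log_z = logsumexp(log_weights) - math.log(len(log_weights))

    return mean_demo_cost + log_z
```

In a learned-cost setting, the costs would come from a parameterized function such as a neural network, and this scalar would be minimized with respect to its parameters; the sketch only evaluates the objective for fixed cost values.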