Reinforcement Learning with Convex Constraints
Topics
Reinforcement learning, Computer science, Set (abstract data type), Limit (mathematics), Key (lock), Convex combination, Regular polygon, Mathematical optimization, Action (physics), Class (philosophy), Diversity (politics), Artificial intelligence, Convex optimization, Machine learning, Mathematics, Programming language, Physics, Sociology, Mathematical analysis, Geometry, Quantum mechanics, Computer security, Anthropology
Authors
Sobhan Miryoosefi, Kianté Brantley, Hal Daumé, Miroslav Dudík, Robert E. Schapire
Source
Journal: Cornell University - arXiv
Date: 2019-01-01
Citations: 23
Identifiers
DOI: 10.48550/arxiv.1906.09323
Abstract
In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks: specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve standard RL tasks. As a result, it can be easily adapted to work with any model-free or model-based RL algorithm. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms do not incorporate, such as diversity.
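The scheme the abstract describes reduces constrained RL to repeated calls to a standard RL solver: a direction λ steers the policy's expected measurement vector toward the convex target set, the RL oracle best-responds to the scalarized reward -λ·z, and the returned policy is the mixture of the per-round policies. Below is a minimal, self-contained sketch of one classical (Blackwell-approachability-style) instantiation of that recipe, not a reproduction of the authors' exact algorithm; the toy measurement matrix Z, the box target set C, and rl_oracle are all hypothetical stand-ins for illustration.

```python
import numpy as np

# Toy one-state "MDP": each of three actions yields a fixed 2-d measurement
# vector z(a) = (reward, cost). A policy mixture induces an expected
# measurement in the convex hull of the rows of Z. (Illustrative numbers.)
Z = np.array([[1.0, 0.9],   # high reward, high cost
              [0.6, 0.3],   # medium reward, low cost
              [0.0, 0.0]])  # safe no-op

# Hypothetical convex target set C (a box): expected reward >= 0.5, cost <= 0.4.
LO = np.array([0.5, -np.inf])
HI = np.array([np.inf, 0.4])

def project_onto_C(z):
    """Euclidean projection onto the box C = [LO, HI]."""
    return np.clip(z, LO, HI)

def rl_oracle(lam):
    """Stand-in for a standard RL solver: best response to the scalarized
    reward -lam . z. In this toy problem the best response is exact."""
    a = int(np.argmin(Z @ lam))
    return a, Z[a]

def constrained_policy(rounds=500):
    z_bar = np.zeros(2)        # running average measurement of the mixture
    counts = np.zeros(len(Z))  # how often each per-round policy was played
    for t in range(1, rounds + 1):
        gap = z_bar - project_onto_C(z_bar)
        norm = np.linalg.norm(gap)
        # Steering direction; zero once the average is already inside C.
        lam = gap / norm if norm > 1e-9 else np.zeros(2)
        a, z = rl_oracle(lam)  # one call to the standard RL oracle
        counts[a] += 1
        z_bar += (z - z_bar) / t  # Blackwell-style running average
    return counts / rounds, z_bar

mixture, z_bar = constrained_policy()
print("policy mixture over actions:", mixture)
print("expected measurement:", z_bar)  # approaches the target box C
```

Swapping rl_oracle for any approximate model-free or model-based RL algorithm trained on the scalar reward -λ·z is what makes this an oracle-based reduction, which is the adaptability the abstract claims for the approach.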