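Reinforcement learning
Constraint (computer-aided design)
Mathematical optimization
Consistency (knowledge bases)
Computer science
Heuristic
Function (biology)
Mathematics
Artificial intelligence
Geometry
Evolutionary biology
Biology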
Authors
Yujie Yang, Yuhang Zhang, Wenjun Zou, Jianyu Chen, Yuming Yin, Shengbo Eben Li
Source
Journal: IEEE Transactions on Automatic Control
[Institute of Electrical and Electronics Engineers]
Date: 2023-11-23
Volume (issue): 69 (4): 2713-2720
Identifier
DOI: 10.1109/tac.2023.3336263
Abstract
Safety is a critical concern when applying reinforcement learning (RL) to real-world control problems. A widely used method for ensuring safety is to learn a control barrier function with heuristic feasibility labels that come from expert demonstrations [1] or constraint functions [2]. However, because these labels are inaccurate, the resulting forward invariant sets fall short of the maximum feasible region. This paper proposes an algorithm called feasible region iteration (FRI) that learns the maximum feasible region in order to generate accurate feasibility labels. The core of FRI is a constraint decay function (CDF), which comes with a self-consistency condition and naturally leads to the constraint Bellman equation. The optimal CDF, which represents the maximum feasible region, is learned by alternating between feasible region identification and feasible region expansion. Experimental results show that the algorithm achieves near-zero constraint violations and comparable or higher performance than the baselines.
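The abstract does not spell out the CDF or the constraint Bellman equation, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the CDF of a state is gamma raised to the number of steps until the first constraint violation (0 if no violation ever occurs), which gives a self-consistency condition of the form F(x) = c(x) + (1 - c(x)) * gamma * F(x'), where c(x) is 1 in violating states and 0 elsewhere. It also replaces FRI's alternation of identification and expansion with plain tabular value iteration on the minimizing version of that equation. The 8-state chain, its drift, and all names (`violates`, `step`, `optimal_cdf`) are hypothetical.

```python
# Hypothetical toy problem: states 0..7 on a line, state 7 violates the constraint.
# For x >= 4 a drift of +2 overpowers every action, so violation is unavoidable.
GAMMA = 0.9
N_STATES = 8
ACTIONS = (-1, 0, 1)


def violates(x):
    """Constraint indicator c(x): True only in the last state (assumed)."""
    return x == N_STATES - 1


def step(x, u):
    """Deterministic toy dynamics with a state-dependent drift (assumed)."""
    drift = 2 if x >= 4 else 0
    return min(max(x + drift + u, 0), N_STATES - 1)


def optimal_cdf(n_iters=100):
    """Value iteration on the assumed constraint Bellman equation:
    F(x) = c(x) + (1 - c(x)) * gamma * min_u F(step(x, u)).
    """
    F = [1.0 if violates(x) else 0.0 for x in range(N_STATES)]
    for _ in range(n_iters):
        F = [
            1.0 if violates(x)
            else GAMMA * min(F[step(x, u)] for u in ACTIONS)
            for x in range(N_STATES)
        ]
    return F


F_star = optimal_cdf()
# A state is feasible when some policy never violates the constraint,
# i.e. when its optimal CDF value is zero.
feasible = [x for x in range(N_STATES) if F_star[x] < 1e-9]
print("optimal CDF:", F_star)
print("maximum feasible region:", feasible)
```

In this toy problem, states 4 to 6 satisfy the constraint but still lie outside the maximum feasible region, because every action sequence from them eventually reaches state 7. A label derived from the constraint function alone would mark them as safe, which is the kind of inaccurate label the abstract refers to.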