Reinforcement learning
Computer science
Robotics
Process (computing)
Software deployment
Artificial intelligence
Convergence (economics)
State (computer science)
Software engineering
Algorithm
Economic growth
Operating system
Economics
Authors
Weiqiang Wang, Xu Zhou, Benlian Xu, Mingli Lu, Yuxin Zhang, Yuhang Gu
Identifier
DOI: 10.23919/ccc55666.2022.9901669
Abstract
Reinforcement learning (RL) holds promise for autonomous robots because it can adapt to dynamic or unknown environments by automatically learning optimal control policies from the interactions between a robot and its environment. However, these interactions can be unsafe for both the robot and the environment during the learning phase, which hinders the practical deployment of RL. Safe RL methods have been proposed that improve learning safety by using external or prior knowledge to guide safe actions, but such knowledge is difficult to obtain in practical applications, especially in unknown environments. More importantly, failures are unavoidable in practice, and current safe RL methods lack the ability to recover to a safe state after a failure, so learning cannot be continued to completion. To address these problems, we propose a safe and self-recoverable reinforcement learning framework that, during exploration, predicts and prohibits further unsafe actions based on unsafe actions already explored, and that self-recovers to a safe state when a failure occurs. Maze navigation simulations show that our approach not only significantly reduces the number of failures but also accelerates the convergence of reinforcement learning.
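As a rough illustration of the two ideas named in the abstract (prohibiting explored unsafe actions and self-recovering from failures), the following minimal Python sketch runs tabular Q-learning on a toy maze. It is a toy under invented assumptions, not the authors' algorithm: the grid, reward values, and the simple blacklist standing in for the paper's unsafe-action prediction are all made up for illustration.

```python
# Hypothetical sketch: tabular Q-learning on a small maze with
# (1) a blacklist of explored unsafe state-action pairs that masks
#     them out of future action selection, and
# (2) self-recovery: on failure the agent is moved back to its last
#     safe state instead of the episode being aborted.
import random
from collections import defaultdict

GRID = [
    "S....",
    ".##..",
    ".#...",
    "...#G",
]  # S: start, G: goal, #: unsafe cell (entering one is a "failure")
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
START = (0, 0)
GOAL = (3, 4)

def step(state, action):
    """Apply an action; return (next_state, reward, failed, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])):
        return state, -1.0, False, False          # bumped into a wall
    if GRID[r][c] == "#":
        return (r, c), -10.0, True, False         # unsafe cell: failure
    if (r, c) == GOAL:
        return (r, c), 10.0, False, True
    return (r, c), -0.1, False, False

q = defaultdict(float)   # Q-values keyed by (state, action)
unsafe = set()           # explored unsafe (state, action) pairs

def allowed(state):
    """Mask out actions already known (explored) to be unsafe here."""
    acts = [a for a in ACTIONS if (state, a) not in unsafe]
    return acts or ACTIONS  # never leave the agent with no action

def choose(state, eps=0.2):
    acts = allowed(state)
    if random.random() < eps:
        return random.choice(acts)
    return max(acts, key=lambda a: q[(state, a)])

for episode in range(300):
    state, last_safe = START, START
    for _ in range(200):  # step limit per episode
        action = choose(state)
        nxt, reward, failed, done = step(state, action)
        target = reward if done else reward + 0.95 * max(
            q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += 0.1 * (target - q[(state, action)])
        if failed:
            unsafe.add((state, action))  # remember and prohibit this action
            nxt = last_safe              # self-recover instead of terminating
        else:
            last_safe = state
        state = nxt
        if done:
            break

print(f"learned {len(unsafe)} unsafe state-action pairs")
```

In this sketch the blacklist only blocks exact state-action pairs that have already failed; the paper's framework additionally predicts other unsafe actions from the explored ones, which a richer safety model would have to supply.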