Reinforcement learning
Robotics
Computer science
Control (management)
Artificial intelligence
Software deployment
Autonomy
Software engineering
Political science
Law
Identifier
DOI:10.23919/ccc52363.2021.9549723
Abstract
Reinforcement learning (RL) has been widely used for robot autonomy because it can adapt to dynamic or unknown environments by automatically learning optimal control policies from the interactions between robots and their environments. However, the practical deployment of RL can endanger the safety of both robots and environments because many RL methods must experience failures during the training phase. These failures can be reduced or avoided by assuming prior knowledge about the states and environments during training, but this assumption is easily violated in practical applications, especially in unknown environments. In addition, restarting a training episode can be difficult in practice because the robot may be stuck in a failure state. To solve these problems, we propose an operational safe control framework that can automatically recover from failures and reduce failure risks without any prior knowledge. Our framework consists of three steps: (1) detect failures and revert to safe actions; (2) collect correction samples to learn a potential that provides internal environment information to the robot; (3) use the potential to shape a safe reward that biases exploration toward safe behaviors. A maze navigation example demonstrates that our method outperforms traditional reinforcement learning with significantly fewer failures.
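To make the three-step framework concrete, below is a minimal sketch, not the authors' implementation: a tabular Q-learning agent in a hypothetical grid maze (the `MAZE`, `phi`, `update_potential`, and reward values are all illustrative assumptions). Step 1 is modeled by detecting a failure cell and reverting to the previous safe state, Step 2 by lowering a potential table `phi` around detected failures (standing in for the learned potential), and Step 3 by potential-based reward shaping that biases exploration away from failure regions.

```python
import numpy as np

# Hypothetical 5x5 maze: 0 = free cell, 1 = failure cell (e.g., collision zone).
MAZE = np.array([
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
])
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

def step(state, a):
    """Propose the next state; movement is clipped at the maze borders."""
    r, c = state
    dr, dc = ACTIONS[a]
    return (min(max(r + dr, 0), MAZE.shape[0] - 1),
            min(max(c + dc, 0), MAZE.shape[1] - 1))

def is_failure(state):
    return MAZE[state] == 1

# Step 2 (sketch): a safety potential phi built from correction samples.
phi = np.zeros(MAZE.shape)

def update_potential(failed_state, penalty=1.0, spread=0.5):
    """Record a correction sample: lower the potential at and around a failure."""
    phi[failed_state] -= penalty
    for a in range(len(ACTIONS)):
        phi[step(failed_state, a)] -= spread * penalty

Q = np.zeros(MAZE.shape + (len(ACTIONS),))

for episode in range(300):
    s = START
    for t in range(100):
        a = np.random.randint(len(ACTIONS)) if np.random.rand() < EPS else int(np.argmax(Q[s]))
        s_next = step(s, a)

        # Step 1 (sketch): detect the failure and revert to the last safe state
        # instead of terminating the episode inside the failure.
        if is_failure(s_next):
            update_potential(s_next)      # correction sample for Step 2
            reward, s_next = -1.0, s
        elif s_next == GOAL:
            reward = 1.0
        else:
            reward = -0.01                # small per-step cost

        # Step 3 (sketch): potential-based shaping biases exploration toward
        # regions with higher (less penalized) potential.
        shaped = reward + GAMMA * phi[s_next] - phi[s]
        Q[s][a] += ALPHA * (shaped + GAMMA * np.max(Q[s_next]) - Q[s][a])

        s = s_next
        if s == GOAL:
            break
```

The shaping term GAMMA * phi(s') - phi(s) follows standard potential-based reward shaping, so under these assumptions the optimal policy of the unshaped task is preserved while visits to states near recorded failures are discouraged.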