Journal: International Journal of Automation and Control [Inderscience Enterprises Ltd.] Date: 2024-01-01 Volume/Issue: 18(1): 30-52
Identifier
DOI:10.1504/ijaac.2024.135093
Abstract
Unsafe exploration during the training phase hinders the practical deployment of reinforcement learning (RL) on autonomous robots. Some safe RL methods use safety constraints drawn from prior or external knowledge to reduce or avoid unsafe exploration, but such knowledge is usually unavailable in practice, especially in unknown environments. In this work, we propose a few-shot reasoning-based safe reinforcement learning framework that includes a new few-shot learning method with a dynamic support set to reason about the safety of unexplored actions and thereby guide safer action selection. Additionally, it endows robots with the capability to revert to previous safe states and reflect on failures, updating the dynamic support set and further improving the accuracy of safety reasoning. Experimental results show that our new few-shot learning method is more accurate, and that our proposed framework significantly reduces the number of failures during learning, especially for long-term autonomy.
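The abstract's core mechanism, reasoning about the safety of an unexplored action from a small, dynamically updated support set of labelled experiences, can be sketched roughly as follows. This is a minimal illustrative sketch only: the class name, the k-nearest-neighbour voting rule, and the feature encoding are assumptions for illustration, not the paper's actual method.

```python
from dataclasses import dataclass, field

@dataclass
class SafetyReasoner:
    """Hypothetical few-shot safety reasoner over a dynamic support set
    of labelled (feature_vector, is_safe) examples. On a failure, the
    agent would revert to a previous safe state and call add_example()
    with the failed action's features, refining later reasoning."""
    k: int = 3                                   # illustrative neighbour count
    support: list = field(default_factory=list)  # [(features, is_safe), ...]

    def add_example(self, features, is_safe):
        """Grow the dynamic support set with a new labelled experience."""
        self.support.append((tuple(features), is_safe))

    def is_safe(self, features):
        """Majority vote among the k nearest support examples;
        optimistic (returns True) while the support set is empty."""
        if not self.support:
            return True
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        nearest = sorted(self.support,
                         key=lambda ex: dist(ex[0], features))[: self.k]
        votes = sum(1 if safe else -1 for _, safe in nearest)
        return votes >= 0
```

A candidate action whose features fall near previously observed failures would be vetoed before execution, while actions resembling safe experiences are allowed; any real instantiation of the paper's method would use its learned few-shot model in place of this distance-based vote.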