Concepts
Reinforcement learning
Computer science
Natural language
Constraint (computer-aided design)
Set (abstract data type)
Function (biology)
Artificial intelligence
Domain (mathematical analysis)
Comprehension
Machine learning
Engineering
Programming language
Mathematical analysis
Biology
Mechanical engineering
Evolutionary biology
Mathematics
Authors
Xingzhou Lou, Junge Zhang, Ziyan Wang, Kaiqi Huang, Yuanyuan Du
Source
Journal: Cornell University - arXiv
Date: 2024-01-01
Identifiers
DOI: 10.48550/arxiv.2401.07553
Abstract
Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed in easily understandable human language offers considerable potential for real-world applications due to its accessibility and non-reliance on domain expertise. Previous safe RL methods with natural language constraints typically adopt a recurrent neural network, which leads to limited capabilities when dealing with various forms of human language input. Furthermore, these methods often require a ground-truth cost function, necessitating domain expertise to convert language constraints into a well-defined cost function that determines constraint violation. To address these issues, we propose using pre-trained language models (LMs) to facilitate RL agents' comprehension of natural language constraints and to allow them to infer costs for safe policy learning. Through the use of pre-trained LMs and the elimination of the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints. Experiments on grid-world navigation and robot control show that the proposed method can achieve strong performance while adhering to the given constraints. The use of pre-trained LMs allows our method to comprehend complicated constraints and learn safe policies without a ground-truth cost at any stage of training or evaluation. Extensive ablation studies demonstrate the efficacy of each part of our method.
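The abstract describes two pieces: a pre-trained LM that reads a free-form natural-language constraint, and a learned cost predictor that stands in for the ground-truth cost function during safe policy learning. The sketch below is a minimal illustration of that idea, not the authors' implementation: the encoder name (`bert-base-uncased`), the linear cost head, the `describe`-style state text, and the example strings are all assumptions made for the sake of a runnable example.

```python
# Minimal sketch (not the paper's code): a frozen pre-trained LM embeds a
# free-form natural-language constraint together with a textual description
# of the agent's state; a small learned head maps the embedding to a cost
# in [0, 1], which a safe-RL learner can use instead of a ground-truth cost.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class LMCostModel(nn.Module):
    def __init__(self, lm_name: str = "bert-base-uncased"):  # assumed encoder
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(lm_name)
        self.lm = AutoModel.from_pretrained(lm_name)
        for p in self.lm.parameters():        # keep the pre-trained LM frozen
            p.requires_grad = False
        hidden = self.lm.config.hidden_size
        self.cost_head = nn.Linear(hidden, 1)  # learned cost predictor

    def forward(self, constraint: str, state_text: str) -> torch.Tensor:
        # Encode constraint and state description as a sentence pair.
        inputs = self.tokenizer(constraint, state_text,
                                return_tensors="pt", truncation=True)
        cls = self.lm(**inputs).last_hidden_state[:, 0]  # [CLS] embedding
        return torch.sigmoid(self.cost_head(cls))        # predicted cost


cost_model = LMCostModel()
cost = cost_model("Do not enter the water.",            # hypothetical inputs
                  "The agent stands at the edge of a lake.")
# A standard way to use such a cost in safe RL is a Lagrangian penalty:
# loss = -reward_objective + lambda_ * cost
```

In practice the cost head would need to be trained (for example, on labeled trajectories) and the predicted cost fed into a constrained policy update such as the Lagrangian term noted in the final comment; that surrounding machinery is outside the scope of this sketch.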