Reinforcement learning
Safer
Monte Carlo tree search
Computer science
Tree (set theory)
Convergence (economics)
Artificial intelligence
Machine learning
Decision tree
Reinforcement
State (computer science)
Temporal difference learning
Monte Carlo method
Engineering
Computer security
Mathematics
Algorithm
Economic growth
Structural engineering
Statistics
Mathematical analysis
Economics
Authors
Shuojie Mo, Xiaofei Pei, Chaoxian Wu
Source
Journal: IEEE Transactions on Intelligent Transportation Systems
[Institute of Electrical and Electronics Engineers]
Date: 2022-07-01
Volume/Issue: 23 (7): 6766-6773
Citations: 16
Identifier
DOI: 10.1109/tits.2021.3061627
Abstract
Reinforcement learning has gradually demonstrated its decision-making ability in autonomous driving. Reinforcement learning learns how to map states to actions by interacting with the environment so as to maximize the long-term reward. Within a limited number of interactions, the learner obtains a suitable driving policy according to the designed reward function. However, traditional reinforcement learning produces many unsafe behaviors during training. This paper proposes an RL-based method that combines an RL agent with a Monte Carlo tree search (MCTS) algorithm to reduce unsafe behaviors. The proposed safe reinforcement learning framework mainly consists of two modules: a risk state estimation module and a safe policy search module. When the risk state estimation module, using the current state information and the action output by the RL agent, predicts that the future state will be risky, the MCTS-based safe policy search module is activated to guarantee safer exploration by adding an additional reward for risky actions. We test the approach in several random overtaking scenarios, and it yields faster convergence and safer behaviors than traditional reinforcement learning.
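The decision loop described in the abstract can be summarized in a short sketch. The code below is a minimal illustration, not the authors' implementation: the action set, the one-step dynamics model, the gap-based risk check, the reward terms, and every function name are hypothetical placeholders standing in for the paper's risk state estimation module and MCTS-based safe policy search module, and the tree search itself is replaced here by a flat Monte Carlo scoring of each candidate action for brevity.

```python
# Illustrative sketch of the two-module structure from the abstract
# (risk state estimation + safe policy search). All classes, thresholds,
# and dynamics below are hypothetical placeholders, not the paper's code.
import random
from dataclasses import dataclass

ACTIONS = ["keep_lane", "overtake_left", "overtake_right", "brake"]
RISK_PENALTY = -10.0   # assumed extra reward applied to risky actions

@dataclass
class State:
    gap_to_lead: float   # distance to the leading vehicle (m), assumed feature
    ego_speed: float     # ego vehicle speed (m/s), assumed feature

def predict_next_state(state: State, action: str) -> State:
    """Toy one-step dynamics model standing in for the state predictor."""
    closing = 2.0 if action in ("overtake_left", "overtake_right") else 1.0
    if action == "brake":
        closing = -1.0
    return State(state.gap_to_lead - closing, state.ego_speed)

def is_risky(next_state: State, min_gap: float = 5.0) -> bool:
    """Risk state estimation: flag the predicted state if the gap is too small."""
    return next_state.gap_to_lead < min_gap

def safe_policy_search(state: State, n_rollouts: int = 50) -> str:
    """Stand-in for the MCTS-based safe policy search: score each action by
    simulated return, adding RISK_PENALTY whenever a rollout hits a risky state."""
    scores = {a: 0.0 for a in ACTIONS}
    for action in ACTIONS:
        for _ in range(n_rollouts):
            nxt = predict_next_state(state, action)
            reward = nxt.ego_speed * 0.1 + random.uniform(-0.1, 0.1)  # toy progress reward
            if is_risky(nxt):
                reward += RISK_PENALTY   # additional reward (penalty) for risky actions
            scores[action] += reward / n_rollouts
    return max(scores, key=scores.get)

def rl_agent_action(state: State) -> str:
    """Placeholder for the RL agent's policy (e.g. an epsilon-greedy Q-network)."""
    return random.choice(ACTIONS)

def safe_step(state: State) -> str:
    """One decision step: accept the agent's action unless the predicted
    next state is risky, in which case defer to the safe policy search."""
    proposed = rl_agent_action(state)
    if is_risky(predict_next_state(state, proposed)):
        return safe_policy_search(state)
    return proposed

if __name__ == "__main__":
    print(safe_step(State(gap_to_lead=6.0, ego_speed=20.0)))
```

The element carried over from the abstract is the control flow: the RL agent's proposed action is executed as-is unless the risk state estimation step flags the predicted next state, in which case the search module, biased by an extra reward term on risky actions, chooses the action instead.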