Computer science
Software deployment
Reliability engineering
Engineering
Operating systems
Authors
Xuesong Wang,Jiazhi Zhang,Diyuan Hou,Yuhu Cheng
Source
Journal: IEEE Transactions on Intelligent Transportation Systems
[Institute of Electrical and Electronics Engineers]
Date: 2023-07-17
Volume (issue): 24 (12): 14320-14328
Citations: 6
Identifier
DOI:10.1109/tits.2023.3292253
Abstract
Safety concerns limit the application of traditional reinforcement learning (RL) methods to autonomous driving. To address the challenge of safe exploration in autonomous driving tasks, this paper proposes a novel safe RL method called Twin Delayed Deep Deterministic Policy Gradient based on Approximate Safe Action (TD3-ASA). In TD3-ASA, the action output by the current policy during exploration is modified to obtain an approximate safe action, and the approximate safe action is then used to train a safe policy for deployment. TD3-ASA offers several advantages: 1) it is sample efficient and does not need any prior knowledge; 2) it enhances safety during both training and deployment; 3) it introduces an adjustable safety correction factor that enables a tradeoff between exploration and safety. Experiments conducted on both the MetaDrive and SpeedLimit autonomous driving test platforms demonstrate the effectiveness of TD3-ASA. On MetaDrive, TD3-ASA is more than three times as safe during training as the current state-of-the-art RL method, while achieving a high success rate and low deployment risk.
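The abstract does not state how the policy's action is modified, so the following is only a minimal illustrative sketch of the general idea of correcting an exploratory action with an adjustable factor. It assumes, purely for illustration, that the correction is a step against the gradient of an estimated cost; the names `approximate_safe_action`, `toy_cost`, `beta`, and `safe_center` are hypothetical and are not taken from the paper.

```python
import numpy as np

def approximate_safe_action(action, cost_grad, beta, low=-1.0, high=1.0):
    """Shift `action` against an estimated cost gradient (assumed correction rule).

    beta is a hypothetical safety correction factor: beta = 0 leaves the
    exploratory action unchanged, larger values trade exploration for safety.
    """
    corrected = action - beta * cost_grad
    return np.clip(corrected, low, high)

# Toy cost, assumed for this example only: squared distance from a "safe" action.
safe_center = np.array([0.0, 0.3])

def toy_cost(action):
    return 0.5 * float(np.sum((action - safe_center) ** 2))

action = np.array([0.2, 0.9])      # action proposed by the current policy
cost_grad = action - safe_center   # gradient of toy_cost with respect to the action
corrected = approximate_safe_action(action, cost_grad, beta=0.5)
```

In this toy setting, the corrected action has a lower estimated cost than the original one, and setting `beta` to 0 recovers the unmodified exploratory action. Following the abstract's description, it is the corrected action that would then be used to train the policy.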