Multi-Agent Constrained Policy Optimization for Conflict-Free Management of Connected Autonomous Vehicles at Unsignalized Intersections

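Reinforcement learning; Intersection (aeronautics); Computer science; Mathematical optimization; Constraint (computer-aided design); Dynamic programming; Markov decision process; Markov process; Operations research; Engineering; Artificial intelligence; Transportation engineering; Mathematics; Algorithm; Mechanical engineering; Statistics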

Authors

Rui Zhao, Yun Li, Fei Gao, Zhenhai Gao, Tianyao Zhang

Source

Journal: IEEE Transactions on Intelligent Transportation Systems [Institute of Electrical and Electronics Engineers]
Date: 2023-11-20 Volume (Issue): 25 (6): 5374-5388 Citations: 3

Identifier

DOI: 10.1109/tits.2023.3331723

Abstract

Autonomous Intersection Management (AIM) systems present a new paradigm for conflict-free cooperation of connected autonomous vehicles (CAVs) at road intersections, with the aim of eliminating collisions and improving traffic efficiency and ride comfort. Given the difficulty current centralized coordination methods have in balancing high computational efficiency with robust safety assurance, this paper proposes a conflict-free management scheme for CAVs at unsignalized intersections that leverages safe multi-agent deep reinforcement learning (MADRL). First, we formulate the safe MADRL problem as a constrained Markov game (CMG) and then transform the AIM problem into a CMG by carefully designing the state, action, reward, and cost functions. Next, we propose Multi-Agent Constrained Policy Optimization (MACPO), specifically tailored to solve the CMG problem. MACPO incorporates safety constraints that further restrict the trust region formed by the Kullback-Leibler (KL) divergence, enabling reinforcement learning policy updates that maximize performance while keeping constraint costs within their limits. On this basis, we introduce the MACPO-based AIM algorithm. Finally, we train an AIM policy and compare its computation time, ride comfort, traffic efficiency, and safety with management schemes based on Model Predictive Control (MPC), Mixed Integer Programming (MIP), and non-safety-aware reinforcement learning. According to the results, compared with the MPC and MIP methods, our method increases computational efficiency by 65.22 times and 731.52 times, respectively, and improves traffic efficiency by 2.41 times and 1.80 times, respectively. In contrast to the non-safety-aware RL methods, our method achieves a zero collision rate for the first time while also enhancing ride comfort, highlighting the advantages of using MACPO.
