Reinforcement learning
Computer science
Randomness
Artificial intelligence
Principle of maximum entropy
Entropy (arrow of time)
Mathematical optimization
Machine learning
Mathematics
Statistics
Physics
Quantum mechanics
Authors
Lan Wu, Yuanming Wu, Cong Qin, Ye Tian
Source
Journal: Journal of Transportation Engineering
[American Society of Civil Engineers]
Date: 2023-02-01
Volume/Issue: 149 (2)
Identifier
DOI: 10.1061/jtepbs.0000774
Abstract
Deep reinforcement learning has strong perception and decision-making capabilities, can effectively handle continuous high-dimensional state-action spaces, and has become the mainstream method for traffic light timing. However, due to structural defects or differing strategy mechanisms, most deep reinforcement learning models suffer from problems such as failure to converge, divergence, or poor exploration capability. Therefore, this paper proposes a multi-agent Soft Actor-Critic (SAC) for traffic light timing. Multi-agent SAC adds an entropy term, which measures the randomness of the policy, to the objective function of traditional reinforcement learning and maximizes the sum of the expected reward and the entropy term to improve the model's exploration ability. The system model can learn multiple optimal timing schemes, avoiding the repeated selection of a single timing scheme that leads to a local optimum or failure to converge. Meanwhile, it discards low-reward strategies to reduce data storage and sampling complexity, accelerate training, and improve the stability of the system. Comparative experiments show that the multi-agent SAC traffic light timing method can resolve these problems of deep reinforcement learning and improve vehicle throughput in different traffic scenarios.
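The entropy-augmented objective described in the abstract can be illustrated with a minimal sketch. The functions, reward values, and temperature `alpha` below are illustrative assumptions, not the paper's implementation; the sketch only shows how adding an entropy bonus H(π(·|s)) to the per-step reward rewards more random policies, which is the exploration mechanism SAC relies on:

```python
import math

def entropy(probs):
    # Shannon entropy H(pi(.|s)) of a discrete action distribution
    # (e.g., a distribution over traffic-light phases).
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_objective(rewards, policy_probs, alpha=0.2):
    # Entropy-augmented return: sum_t (r_t + alpha * H(pi(.|s_t))).
    # alpha is the temperature weighting exploration against reward.
    return sum(r + alpha * entropy(p) for r, p in zip(rewards, policy_probs))

# Hypothetical two-step episode over four signal phases.
rewards = [1.0, 0.5]
uniform = [0.25, 0.25, 0.25, 0.25]   # maximally random policy
greedy = [1.0, 0.0, 0.0, 0.0]        # deterministic policy

print(soft_objective(rewards, [uniform, uniform]))  # earns an entropy bonus
print(soft_objective(rewards, [greedy, greedy]))    # no entropy bonus
```

Under this objective the uniform policy scores higher than the deterministic one for the same rewards, so the agent is not pushed to collapse onto a single timing scheme prematurely.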