Causal inference multi-agent reinforcement learning for traffic signal control

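Reinforcement learning, Computer science, Inference, Feature (linguistics), Representation (politics), Signal (programming language), Artificial intelligence, Machine learning, Political science, Linguistics, Political philosophy, Programming language, Law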

Authors

Shantian Yang,Bo Yang,Zheng Zeng,Zhongfeng Kang

Source

Journal: Information Fusion [Elsevier BV]
Date: 2023-02-08 Volume: 94, pp. 243-256 Cited by: 29

Links

forskningsportal.dk, doi.org

Identifier

DOI: 10.1016/j.inffus.2023.02.009

Abstract

A primary challenge in multi-agent reinforcement learning for traffic signal control is to produce effective cooperative traffic-signal policies in non-stationary multi-agent traffic environments. Each agent faces a non-stationary local traffic environment caused by the time-varying traffic-signal policies of adjacent agents. At the same time, all agents produce time-varying traffic-signal policies, which makes the whole traffic environment non-stationary, so the resulting traffic-signal policies may be ineffective. In this work, we propose a Causal Inference Multi-Agent reinforcement learning (CI-MA) algorithm, which alleviates the non-stationarity of multi-agent traffic environments in terms of both feature representation and optimization, and thereby helps to produce effective cooperative traffic-signal policies. Specifically, a Causal-Inference (CI) model is first designed to reason about and tackle the non-stationarity of multi-agent traffic environments by both acquiring feature representation distributions and deriving variational lower bounds (i.e., objective functions). Then, based on the designed CI model, we propose the CI-MA algorithm, in which feature representations are acquired from the non-stationarity of multi-agent traffic environments at both the task level and the timestep level, and are used to produce cooperative traffic-signal policies and Q-values for multiple agents. Finally, the corresponding objective functions optimize the whole algorithm from both the causal-inference and the multi-agent reinforcement learning perspectives. Experiments are conducted in different non-stationary multi-agent traffic environments. Results show that the CI-MA algorithm outperforms other state-of-the-art algorithms, and demonstrate that the proposed algorithm, trained in synthetic-traffic environments, can be effectively transferred to both synthetic- and real-traffic environments with non-stationarity.
