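Reinforcement learning
Computer science
Counterfactual thinking
Scalability
Profit (economics)
Distributed computing
Operator (biology)
Artificial intelligence
Operations research
Engineering
Economy
Chemistry
Microeconomics
Repressor
Philosophy
Epistemology
Gene
Transcription factor
Database
Biochemistry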
Authors
Heiko Hoppe, Tobias Enders, Quentin Cappart, Maximilian Schiffer
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Identifier
DOI: 10.48550/arxiv.2312.08884
Abstract
We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these requests with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but they train agents on local rewards, which distorts the reward signal with respect to the system-wide profit and leads to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves the existing goal conflicts between the trained agents and the operator by assigning rewards to agents using a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the use of global rewards can improve implicit vehicle balancing and demand forecasting abilities. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
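
The abstract does not spell out how the counterfactual baseline is computed, so the following is only a minimal sketch of the general idea, in the style commonly used for counterfactual credit assignment in multi-agent reinforcement learning. It assumes a critic that can estimate the global (system-wide) return for each alternative action of one agent while the other agents' actions stay fixed. The function name `counterfactual_advantage`, its parameters, and the toy numbers are hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def counterfactual_advantage(
    q_values: Sequence[float],
    policy_probs: Sequence[float],
    chosen_action: int,
) -> float:
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values[k]     : assumed critic estimate of the global return if this
                      agent took action k while all other agents' actions
                      stay fixed (hypothetical input, not the paper's API).
    policy_probs[k] : probability the agent's current policy assigns to action k.
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    if not 0 <= chosen_action < len(q_values):
        raise ValueError("chosen_action is out of range")
    # Baseline: expected global return after marginalizing out this agent's action.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    # The agent is credited only for how much its own choice moved the global return.
    return q_values[chosen_action] - baseline


# Toy example (made-up numbers): action 0 rejects a request, action 1 accepts it.
q = [10.0, 14.0]    # estimated system-wide profit for each choice
pi = [0.25, 0.75]   # the agent's current policy
adv_accept = counterfactual_advantage(q, pi, chosen_action=1)
adv_reject = counterfactual_advantage(q, pi, chosen_action=0)
```

In this toy case the baseline is 13.0, so accepting receives a positive signal and rejecting a negative one, even though both are scored against the same global profit estimate. This is the sense in which a counterfactual baseline can keep a global reward informative for each individual agent.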