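Reinforcement learning
Notation
Ratio
Term (time)
Computer science
Dynamic programming
Perspective (graphical)
Artificial intelligence
Mathematical optimization
Algorithm
Mathematics
Arithmetic
Quantum mechanics
Physics
Authors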
Honghao Wei,Zengyan Yang,Xin Liu,Zhiwei Qin,Xiaowei Tang,Lei Ying
Identifier
DOI:10.1109/tits.2023.3312048
Abstract
Existing approaches for vehicle repositioning on large-scale ride-hailing platforms either ignore the real-time spatial-temporal mismatch between supply and demand or overlook the long-term balance of the system. To account for both, we propose a lookahead repositioning policy, a novel approach that repositions idle vehicles from both a dynamic-system and a long-term-performance perspective. Our method consists of two parts. The first part uses linear programming (LP) to formulate the nonstationary system as a time-varying, $T$-step lookahead optimization problem and explicitly models the fraction of drivers who follow repositioning recommendations (called the repositioning rate). The second part incorporates a reinforcement learning (RL) method to maximize the long-term return after the $T$ time slots, based on learned value functions. Extensive studies using a real-world dataset on both small-scale and large-scale simulators show that our method outperforms previous baseline methods and is robust to prediction errors.
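The abstract does not spell out how the repositioning rate enters the system dynamics, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes a single scalar rate shared by all zones, and that drivers who ignore a recommendation simply stay where they are. The names `step_supply`, `idle`, `recommend`, and `follow_rate` are hypothetical, and order matching, trip destinations, and the LP itself are left out.

```python
def step_supply(idle, recommend, follow_rate):
    """Advance idle-vehicle counts by one time slot (illustrative only).

    idle[i]         -- idle vehicles in zone i at the start of the slot
    recommend[i][j] -- fraction of zone i's idle vehicles advised to move
                       to zone j (each row sums to 1; recommend[i][i]
                       means "stay")
    follow_rate     -- assumed scalar repositioning rate in [0, 1]
    """
    n = len(idle)
    next_idle = [0.0] * n
    for i in range(n):
        # Drivers who ignore the recommendation are assumed to stay put.
        next_idle[i] += (1.0 - follow_rate) * idle[i]
        # Drivers who follow it are split according to the recommendation.
        for j in range(n):
            next_idle[j] += follow_rate * idle[i] * recommend[i][j]
    return next_idle


# Two zones: all 10 idle vehicles start in zone 0, and half of them are
# advised to move to zone 1. With a repositioning rate of 0.5, only half
# of the advised moves actually happen.
next_idle = step_supply([10.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], 0.5)
# next_idle == [7.5, 2.5]
```

With a rate of 1.0 the same recommendation would yield an even 5/5 split, so a planner that ignores the rate would overestimate how much supply it can shift. This is one reason to model the rate explicitly in a lookahead problem.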