Dynamic pricing
Reinforcement learning
Markov decision process
Computer science
Scalability
Markov process
Mathematical optimization
Profit (economics)
Revenue management
Operations research
Artificial intelligence
Microeconomics
Economics
Engineering
Mathematics
Revenue
Accounting
Database
Statistics
Authors
Zengxiang Lei,Satish V. Ukkusuri
Identifier
DOI:10.1016/j.trb.2023.102848
Abstract
Dynamic pricing is a strategy widely applied by ride-hailing companies, such as Uber and Lyft, to match trip demand with the availability of drivers. Deciding proper pricing policies is challenging, and existing reinforcement learning (RL)-based solutions are restricted to solving small-scale problems. In this study, we contribute RL-based approaches that can address the dynamic pricing problem in real-world-scale ride-hailing systems. We first characterize the dynamic pricing problem with a clear distinction between historical prices and current prices. We then translate our dynamic pricing problem into a Markov Decision Process (MDP) and prove the existence of a deterministic stationary optimal policy. Our solutions are based on an off-policy reinforcement learning algorithm called twin-delayed deep deterministic policy gradient (TD3), which performs offline learning of the optimal pricing policy using historical data and applies the learned policy to the next time slot, e.g., one week. We enhance TD3 by creating three mechanisms to reduce our model complexity and improve training effectiveness. Extensive numerical experiments are conducted on both small grid networks (16 zones) and the NYC network (242 zones) to demonstrate the performance of the proposed algorithm. The results show that our algorithm can efficiently find the optimal pricing policy for both the small and large networks, and can significantly enhance the platform profit and service efficiency.
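The abstract names TD3 as the underlying algorithm. As background, the sketch below illustrates the two mechanisms TD3 is known for (target policy smoothing and clipped double-Q targets) on a toy 1-D pricing action; the linear critics, the actor, and all numeric values are illustrative assumptions, not the paper's model or its three enhancement mechanisms.

```python
import numpy as np

rng = np.random.default_rng(0)

def td3_target(q1_target, q2_target, actor_target, next_state,
               reward, gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Compute a TD3-style Bellman target for one transition."""
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    next_action = np.clip(actor_target(next_state) + noise, 0.0, 1.0)
    # Clipped double-Q: take the minimum of the two target critics
    # to reduce value overestimation.
    q_min = min(q1_target(next_state, next_action),
                q2_target(next_state, next_action))
    return reward + gamma * q_min

# Toy linear critics and actor over a normalized price action in [0, 1]
# (hypothetical stand-ins for the neural networks used in practice).
q1 = lambda s, a: 0.5 * (s + a)
q2 = lambda s, a: 0.8 * (s + a)
actor = lambda s: 0.5 * s

y = td3_target(q1, q2, actor, next_state=1.0, reward=2.0)
print(round(y, 4))
```

In full TD3 the actor is updated less frequently than the critics (the "delayed" part), and the target networks are slow-moving copies of the learned ones; this sketch only shows how a single critic target is formed.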