Authors
Shaojie Qiao, Nan Han, Jiangtao Huang, Yuzhong Peng, Hongguo Cai, Xiao Qin, Zhengyi Lei
Abstract
The ride-hailing behaviors of customers are often affected by various factors, including time, geographic distance between locations, and weather conditions, which causes an imbalance between supply and demand in on-demand ride-hailing dispatch. An effective dispatching management approach can allocate idle vehicles and traffic resources more reasonably, increase drivers' income, and improve customers' satisfaction and experience. To overcome these disadvantages of on-demand ride-hailing systems, we propose ERPM, a three-in-one multi-agent reinforcement learning based online algorithm for ride-hailing demand prediction, which achieves intelligent prediction in an effective and efficient fashion. After partitioning the areas served by the platform into grids, ERPM tackles the problem that the training phase of traditional reinforcement learning models converges with difficulty due to the high dimensionality of the input and output data, and it uses the Actor–Critic strategy to perform dispatching actions, which are evaluated and optimized to intelligently predict ride-hailing demand in the grid areas. In addition, ERPM updates its parameters intelligently through a newly designed loss function, learning rate, and optimization algorithm, and it provides an accurate demand prediction algorithm built on maximizing the GMV (gross merchandise volume), i.e., the total revenue of all served on-demand ride-hailing orders. Compared with traditional machine learning models, ERPM is shown to capture more complex supply and demand features from high-dimensional data and thus achieve higher prediction accuracy. Empirical studies are performed on real Didi Chuxing data. The results of extensive experiments show that ERPM achieves the highest accuracy of daily GMV prediction and a higher order response rate than commonly used methods: for GMV, ERPM outperforms DQN by 9.7% and the Naive model by 14.8%; the order response rate of ERPM is 1.4% and 4.1% higher than those of DQN and Naive, respectively; the MAPE (Mean Absolute Percentage Error) of DQN and Naive is 1.61 and 5.94 times higher than that of ERPM; and the R² of ERPM is improved by 8.70% and 64.8% compared with that of DQN and Naive, respectively.
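To make the abstract's description of the Actor–Critic strategy over gridded service areas concrete, below is a minimal one-step advantage actor-critic sketch. It is an assumption-laden toy, not ERPM itself: the grid size, feature layout, network widths, and reward (a stand-in for per-order revenue contributing to GMV) are all placeholders chosen for illustration.

```python
# Minimal one-step advantage actor-critic sketch in PyTorch. Everything here
# (grid size, feature layout, reward, network widths) is an assumed
# placeholder for illustration; it is NOT ERPM's actual architecture.
import torch
import torch.nn as nn

GRID_CELLS = 64              # assumed number of grid cells after partitioning
STATE_DIM = GRID_CELLS * 3   # e.g., per-cell supply, demand, and time features
N_ACTIONS = GRID_CELLS       # one dispatch target per grid cell (assumed)

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU())
        self.actor = nn.Linear(128, N_ACTIONS)  # policy logits over actions
        self.critic = nn.Linear(128, 1)         # state-value estimate

    def forward(self, state):
        h = self.shared(state)
        return self.actor(h), self.critic(h)

def select_action(model, state):
    """Sample a dispatching action from the actor's policy."""
    logits, _ = model(state)
    return torch.distributions.Categorical(logits=logits).sample()

def update(model, optimizer, state, action, reward, next_state, gamma=0.99):
    """One advantage actor-critic update on a single transition."""
    logits, value = model(state)
    dist = torch.distributions.Categorical(logits=logits)
    with torch.no_grad():
        _, next_value = model(next_state)
        td_target = reward + gamma * next_value  # bootstrapped return
    advantage = td_target - value.detach()       # critic evaluates the action
    actor_loss = -dist.log_prob(action) * advantage.squeeze(-1)
    critic_loss = (td_target - value).pow(2)
    loss = actor_loss.mean() + critic_loss.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

model = ActorCritic()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
state = torch.randn(1, STATE_DIM)        # stand-in for real grid features
action = select_action(model, state)
reward = torch.tensor([[1.0]])           # stand-in for an order's revenue
next_state = torch.randn(1, STATE_DIM)
update(model, optimizer, state, action, reward, next_state)
```

In the paper's setting, per the abstract, the reward would be tied to order revenue so that the policy maximizes GMV; here a constant stands in.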
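For readers unfamiliar with the reported error metrics, the standard definitions of MAPE and R² are stated below; the notation ($y_t$ the observed demand, $\hat{y}_t$ the predicted demand, $\bar{y}$ the mean of the observations over $n$ evaluation points) is ours, not the paper's.

$$\mathrm{MAPE}=\frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t-\hat{y}_t}{y_t}\right|,\qquad R^2=1-\frac{\sum_{t=1}^{n}\left(y_t-\hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t-\bar{y}\right)^2}$$

Lower MAPE is better, while R² closer to 1 is better, which is consistent with the direction of the comparisons reported above.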