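Reinforcement learning
Computer science
Vehicle routing problem
Base (topology)
Routing (electronic design automation)
Adaptive routing
Artificial neural network
Operations research
Markov decision process
Artificial intelligence
Mathematical optimization
Engineering
Static routing
Markov process
Mathematics
Computer network
Routing protocol
Mathematical analysis
Statistics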
Authors
Chenhao Zhou,Jingxin Ma,Louis Douge,Ek Peng Chew,Loo Hay Lee
Identifier
DOI:10.1016/j.cie.2023.109443
Abstract
This paper studies a dynamic vehicle routing problem under stochastic demands, drawn from a real-world situation. Specifically, a single courier must accomplish two kinds of tasks: deliveries known at the beginning of the operation, and pickups that appear throughout the daily operation following specific patterns. The objective is to maximise the rewards obtained from serving both types of customers during a limited period. Our contribution lies in using a neural network and couriers' historical decisions to learn a base policy that captures human experience for better decision making. A reinforcement learning framework is then used to let the base policy explore new scenarios through simulation and to train it further on the newly generated data. We show that, under some conditions, our approach serves on average 12% more customers in high-density areas and 8% more in low-density areas than the nearest-neighbour policy.
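The nearest-neighbour policy used as the baseline above can be illustrated with a minimal sketch. The abstract does not give the paper's formulation, so everything here is an assumption made for illustration: customers are points in the plane, travel time is Euclidean distance divided by a constant speed, every customer is known up front (no dynamically appearing pickups), and the function name `nearest_neighbour_route` and its parameters are hypothetical.

```python
import math


def nearest_neighbour_route(start, customers, time_budget, speed=1.0):
    """Greedy baseline: repeatedly travel to the closest unserved customer.

    Illustrative assumptions (not taken from the paper): Euclidean travel,
    constant speed, all customers known in advance, zero service time.
    Returns the visiting order and the total travel time used.
    """
    position = start
    remaining = list(customers)
    elapsed = 0.0
    route = []
    while remaining:
        # Pick the unserved customer closest to the current position.
        nearest = min(remaining, key=lambda c: math.dist(position, c))
        travel_time = math.dist(position, nearest) / speed
        # Stop once the closest customer no longer fits in the time budget.
        if elapsed + travel_time > time_budget:
            break
        elapsed += travel_time
        position = nearest
        route.append(nearest)
        remaining.remove(nearest)
    return route, elapsed


route, elapsed = nearest_neighbour_route(
    start=(0.0, 0.0),
    customers=[(3.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    time_budget=3.5,
)
print(route, elapsed)
```

Under these assumptions the courier serves the two nearby customers and stops before the third, which would exceed the time budget. A learned policy of the kind the abstract describes would replace the `min(...)` selection rule with a decision informed by historical data and simulated experience.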