Computer science
Reinforcement learning
Window (computing)
Routing (electronic design automation)
Reinforcement
Artificial intelligence
Vehicle routing problem
Machine learning
Computer network
World Wide Web
Psychology
Social psychology
Authors
Zefang Zong, Xia Tong, Meng Zheng, Yong Li
Source
Journal: ACM Transactions on Intelligent Systems and Technology
[Association for Computing Machinery]
Date: 2024-01-25
Volume (Issue): 15 (2): 1-19
Abstract
The vehicle routing problem with time windows (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-quality solutions within limited inference time, and there are three major challenges: (a) directly optimizing the goal under several practical constraints; (b) efficiently handling individual time-window limits; and (c) modeling the cooperation among the vehicle fleet. In this article, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input and applies a harsh policy to the output when generating deterministic results. Second, we design a time-penalty-augmented reward to model the time-window limits during gradient propagation. Third, we design a task handler to enable cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves performance by up to 11.7% compared to other RL baselines and can generate solutions for instances within seconds while maintaining solution quality, whereas existing heuristic baselines take minutes. Moreover, we thoroughly analyze our solution and discuss the meaningful implications of its real-time response ability.
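The abstract does not give the exact form of the time-penalty-augmented reward, so the following is only a minimal sketch of the general idea, assuming the reward is the negative travel cost minus a weighted lateness term. The names `Visit`, `time_penalty_reward`, and `penalty_weight` are hypothetical and do not come from the article.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One customer stop on a route (all fields are illustrative assumptions)."""
    arrival: float      # time at which the vehicle reaches the customer
    window_end: float   # latest allowed arrival time for this customer
    travel_cost: float  # cost of the leg leading to this stop


def time_penalty_reward(routes, penalty_weight=1.0):
    """Return a scalar reward: negative travel cost minus weighted lateness.

    Assumed form, not the article's definition: late arrivals are penalized
    linearly, so a time-window violation lowers the reward instead of making
    the solution infeasible.
    """
    total_cost = 0.0
    total_lateness = 0.0
    for route in routes:            # one route per vehicle
        for visit in route:
            total_cost += visit.travel_cost
            # Lateness is zero when the vehicle arrives inside the window.
            total_lateness += max(0.0, visit.arrival - visit.window_end)
    return -(total_cost + penalty_weight * total_lateness)
```

Under this assumption, a larger `penalty_weight` pushes a learned policy toward respecting time windows at the expense of longer routes, while a weight of zero reduces the reward to plain negative travel cost.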