Computer science
Reinforcement learning
Artificial intelligence
Scheduling (production processes)
Job-shop scheduling
Markov decision process
Graph
Job shop
Benchmark
Mathematical optimization
Machine learning
Timetable
Markov process
Flow-shop scheduling
Theoretical computer science
Mathematics
Operating system
Statistics
Geodesy
Geography
Authors
Kun Lei, Peng Guo, Wenchao Zhao, Yi Wang, Linmao Qian, Xiangyin Meng, Liansheng Tang
Identifier
DOI:10.1016/j.eswa.2022.117796
Abstract
This paper presents an end-to-end deep reinforcement learning framework that automatically learns a policy for solving the flexible job-shop scheduling problem (FJSP) using a graph neural network. In the FJSP environment, the reinforcement learning agent must, at each timestep, schedule an operation belonging to a job on an eligible machine chosen from a set of compatible machines, so the agent controls multiple actions simultaneously. Such a multi-action problem is formulated as a multiple Markov decision process (MMDP). To solve MMDPs, we propose a multi-pointer graph network (MPGN) architecture and a training algorithm called multi-Proximal Policy Optimization (multi-PPO) that learns two sub-policies: a job operation action policy and a machine action policy for assigning a job operation to a machine. The MPGN architecture consists of two encoder-decoder components, which define the job operation action policy and the machine action policy for predicting probability distributions over operations and machines, respectively. We introduce a disjunctive graph representation of the FJSP and use a graph neural network to embed the local state encountered during scheduling. Computational experiments show that the agent learns a high-quality dispatching policy that outperforms handcrafted heuristic dispatching rules in solution quality and meta-heuristic algorithms in running time. Moreover, results on random and benchmark instances demonstrate that the learned policies generalize well to real-world instances and to significantly larger instances with up to 2000 operations.
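As a rough illustration of the multi-action decision described in the abstract, the Python sketch below (using only NumPy; the function names, toy scoring rules, and random features are hypothetical stand-ins, not the paper's actual MPGN) first samples a schedulable job operation and then a compatible machine from masked probability distributions, mirroring the two sub-policies.

# A minimal, illustrative sketch (not the authors' MPGN implementation) of the
# two-sub-policy decoding step: at each timestep the agent first picks a
# schedulable job operation, then an eligible machine for it.
import numpy as np

def masked_softmax(scores, mask):
    # Infeasible choices receive probability zero.
    scores = np.where(mask, scores, -np.inf)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def select_action(op_emb, mach_emb, op_mask, compat, rng):
    """One (operation, machine) decision of the hierarchical policy.

    op_emb   : (n_ops, d) operation embeddings, e.g. from a GNN over the disjunctive graph
    mach_emb : (n_machines, d) machine embeddings
    op_mask  : (n_ops,) bool, True for operations that can be scheduled now
    compat   : (n_ops, n_machines) bool, True if the machine can process the operation
    """
    # Sub-policy 1: pointer-style distribution over schedulable operations.
    op_scores = op_emb @ op_emb.mean(axis=0)          # toy scoring function
    op_probs = masked_softmax(op_scores, op_mask)
    op = rng.choice(len(op_probs), p=op_probs)

    # Sub-policy 2: distribution over machines compatible with the chosen operation.
    m_scores = mach_emb @ op_emb[op]
    m_probs = masked_softmax(m_scores, compat[op])
    machine = rng.choice(len(m_probs), p=m_probs)

    return op, machine, op_probs[op] * m_probs[machine]   # joint action probability

# Toy usage with random features.
rng = np.random.default_rng(0)
op_emb = rng.normal(size=(6, 8))
mach_emb = rng.normal(size=(3, 8))
op_mask = np.array([True, True, False, True, False, False])
compat = rng.random((6, 3)) > 0.3
compat[:, 0] = True   # ensure every operation has at least one eligible machine
print(select_action(op_emb, mach_emb, op_mask, compat, rng))

In the paper's setting the two distributions would come from the MPGN encoder-decoder pair and be trained jointly with multi-PPO; the joint probability returned above is the quantity such a policy-gradient method would optimize.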