Computer science
Reinforcement learning
Job shop scheduling
Artificial intelligence
Scheduling (production processes)
Job shop
Machine learning
Flow shop scheduling
Mathematical optimization
Metro train timetable
Mathematics
Operating system
Authors
Bao An Han, Jianjun Yang
Source
Journal: IEEE Access
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2020-01-01
Volume/Pages: 8: 186474-186495
Citations: 154
Identifier
DOI: 10.1109/access.2020.3029868
Abstract
Traditional approaches to job shop scheduling problems are ill-suited to complex and changeable production environments because of their limited real-time responsiveness. Building on disjunctive graph dispatching, this work proposes a deep reinforcement learning (DRL) framework that combines the real-time responsiveness and flexibility of a deep convolutional neural network (CNN) with reinforcement learning (RL) and learns behavior policies directly from the input manufacturing states, making it better suited to practical order-oriented manufacturing problems. In this framework, the scheduling process on a disjunctive graph is viewed as a multi-stage sequential decision-making problem, and a deep CNN is used to approximate the state-action value function. Manufacturing states are encoded as multi-channel images and fed into the network, and various heuristic dispatching rules serve as the available actions. By adopting the dueling double Deep Q-network with prioritized replay (DDDQNPR), the RL agent continually interacts with the scheduling environment through trial and error to learn the best combination of actions at each decision step. Static computational experiments are performed on 85 JSSP instances from the well-known OR-Library. The results indicate that the proposed algorithm obtains optimal solutions for small-scale problems and outperforms every single heuristic rule on large-scale problems, with performance comparable to genetic algorithms. To demonstrate the generalization and robustness of the algorithm, instances with random initial states are used as validation sets during training to select the model with the best generalization ability, and the trained policy is then tested on scheduling instances with different initial states. The results show that the agent adaptively obtains better solutions. Further experiments on dynamic instances with random processing times indicate that our method achieves comparable performance in dynamic environments in the short run.
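The abstract's core mechanism, a dueling Q-network whose double-DQN target drives learning from prioritized replay, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the PyTorch framework, the 4-channel state image, the layer sizes, and the count of eight heuristic dispatching rules are all assumed here for concreteness.

import torch
import torch.nn as nn

N_RULES = 8  # assumed number of heuristic dispatching rules (the action set)

class DuelingQNet(nn.Module):
    """Dueling architecture: shared CNN encoder, separate value and advantage heads."""
    def __init__(self, in_channels: int = 4, n_actions: int = N_RULES):
        super().__init__()
        # CNN encoder for the multi-channel "image" of the manufacturing state
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # -> (batch, 64 * 4 * 4)
        )
        self.value = nn.Sequential(nn.Linear(64 * 16, 128), nn.ReLU(), nn.Linear(128, 1))
        self.advantage = nn.Sequential(nn.Linear(64 * 16, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        v, a = self.value(h), self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                      reward: torch.Tensor, next_state: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double DQN: the online net selects the next action, the target net evaluates it."""
    with torch.no_grad():
        next_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q

In full DDDQNPR training, the absolute TD error between this target and the online network's Q(s, a) would set each transition's replay priority, and importance-sampling weights would rescale the per-sample loss to correct the sampling bias.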