Computer science
Reinforcement learning
Job shop scheduling
Artificial intelligence
Scheduling (production processes)
Job shop
Machine learning
Flow shop scheduling
Mathematical optimization
Metro train timetable
Mathematics
Operating system
Authors
Bao An Han, Jianjun Yang
Source
Journal: IEEE Access
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2020-01-01
Volume/Pages: 8: 186474-186495
Citations: 154
Identifier
DOI: 10.1109/access.2020.3029868
Abstract
Traditional approaches to job shop scheduling problems are ill-suited to complex and changeable production environments because of their limited real-time responsiveness. Building on disjunctive graph dispatching, this work proposes a deep reinforcement learning (DRL) framework that combines the real-time responsiveness and flexibility of a deep convolutional neural network (CNN) with reinforcement learning (RL) and learns behavior policies directly from the input manufacturing states, making it better suited to practical order-oriented manufacturing problems. In this framework, the scheduling process on a disjunctive graph is viewed as a multi-stage sequential decision-making problem, and a deep CNN is used to approximate the state-action value function. Manufacturing states are encoded as multi-channel images and fed into the network, and various heuristic dispatching rules serve as the available actions. By adopting the dueling double Deep Q-network with prioritized replay (DDDQNPR), the RL agent continually interacts with the scheduling environment through trial and error to learn the best combination of actions at each decision step. Static computational experiments are performed on 85 JSSP instances from the well-known OR-Library. The results indicate that the proposed algorithm obtains optimal solutions for small-scale problems and outperforms every single heuristic rule on large-scale problems, with performance comparable to genetic algorithms. To demonstrate the generalization and robustness of the algorithm, instances with random initial states are used as validation sets during training to select the model with the best generalization ability, and the trained policy is then tested on scheduling instances with different initial states. The results show that the agent adaptively obtains better solutions. Further experiments on dynamic instances with random processing times indicate that our method achieves comparable performance in dynamic environments in the short run.
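The abstract's core mechanism, a dueling Q-network whose double-DQN target drives learning from prioritized replay, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the PyTorch framework, the 4-channel state image, the layer sizes, and the count of eight heuristic dispatching rules are all assumed here for concreteness.

import torch
import torch.nn as nn

N_RULES = 8  # assumed number of heuristic dispatching rules (the action set)

class DuelingQNet(nn.Module):
    """Dueling architecture: shared CNN encoder, separate value and advantage heads."""
    def __init__(self, in_channels: int = 4, n_actions: int = N_RULES):
        super().__init__()
        # CNN encoder for the multi-channel "image" of the manufacturing state
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # -> (batch, 64 * 4 * 4)
        )
        self.value = nn.Sequential(nn.Linear(64 * 16, 128), nn.ReLU(), nn.Linear(128, 1))
        self.advantage = nn.Sequential(nn.Linear(64 * 16, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        v, a = self.value(h), self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                      reward: torch.Tensor, next_state: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double DQN: the online net selects the next action, the target net evaluates it."""
    with torch.no_grad():
        next_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q

In full DDDQNPR training, the absolute TD error between this target and the online network's Q(s, a) would set each transition's replay priority, and importance-sampling weights would rescale the per-sample loss to correct the sampling bias.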