Keywords
Reinforcement learning; Computer science; Markov decision process; Scheduling (production processes); Artificial intelligence; Action selection; Mathematical optimization; Representation; Machine learning; Markov process; Mathematics; Statistics; Neuroscience; Perception
Authors
Erdong Yuan, Shuli Cheng, Liejun Wang, Shiji Song, Fang Wu
Identifier
DOI:10.1016/j.asoc.2023.110436
Abstract
Deep reinforcement learning (DRL) is a promising new approach to the job shop scheduling problem (JSSP). Although DRL methods are effective for solving the JSSP, deficiencies remain in state representation, action-space definition, and reward-function design, which make it difficult for the agent to learn an effective policy. In this paper, we model the JSSP as a Markov decision process (MDP) and design a new state representation based on the state features of bidirectional scheduling. This representation not only enables the agent to capture more informative state signals, improving its decision-making ability, but also effectively avoids the phenomenon of multiple optimal action selections in the candidate action set. The invalid action masking (IAM) technique is employed to narrow the search space, which helps the agent avoid exploring suboptimal solutions. We evaluate the performance of the policy model on eight public test datasets: ABZ, FT, ORB, YN, SWV, LA, TA, and DMU. Extensive experimental results show that the proposed method, on the whole, has better optimization ability than existing state-of-the-art models and priority dispatching rules.
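The abstract does not show how invalid action masking is implemented in the paper's model, but the standard IAM technique it names can be sketched as follows: logits of actions that are invalid in the current state are forced to negative infinity before the softmax, so the policy assigns them exactly zero probability. This is a minimal pure-Python illustration under that assumption; the function name and the five-job example are hypothetical, not taken from the paper.

```python
import math

def masked_softmax(logits, valid_mask):
    """Invalid action masking (IAM): send logits of invalid actions to -inf
    so they receive zero probability under the softmax, shrinking the
    effective search space the agent explores."""
    masked = [z if ok else -math.inf for z, ok in zip(logits, valid_mask)]
    m = max(v for v in masked if v != -math.inf)   # for numerical stability
    exps = [math.exp(v - m) for v in masked]        # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical state: 5 candidate jobs, only jobs 0, 2, and 3 schedulable now.
logits = [1.0, 2.0, 0.5, 0.0, 3.0]
mask = [True, False, True, True, False]
probs = masked_softmax(logits, mask)  # jobs 1 and 4 get probability 0.0
```

In an actor-critic setting such as PPO, the same mask is applied both when sampling actions and when computing the log-probabilities for the policy-gradient loss, so gradients never flow toward invalid choices.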