Computer Science
Reinforcement Learning
Scheduling (production processes)
Distributed Computing
Artificial Intelligence
Mathematical Optimization
Mathematics
Authors
Lixiang Zhang, Yan Yan, Yaoguang Hu
Identifier
DOI: 10.1016/j.engappai.2024.108699
Abstract
Reinforcement learning-based methods have addressed production scheduling problems with flexible processing constraints. However, delayed rewards arise due to the dynamic arrival of jobs and transportation constraints between two successive operations. The flow time of an operation can only be determined after it has been processed, because the job-sequencing solution may change when new operations are inserted in a dynamic environment. Job sequencing is often overlooked in single-agent-based scheduling methods. Moreover, the lack of information sharing among multiple agents forces researchers to hand-design reward functions that map optimization objectives to rewards, which reduces the accuracy of the learned policies. Thus, this paper proposes a multi-agent scheduling optimization framework that facilitates collaboration between machine agents and job agents to address the dynamic flexible job-shop scheduling problem (DFJSP) with transportation time constraints. It then formulates the problem as a partially observable Markov decision process (POMDP) and constructs a reward-sharing mechanism to tackle the delayed-reward issue and facilitate policy learning. Finally, we develop an improved multi-agent dueling double deep Q-network algorithm to optimize the scheduling policy over long-term training. The results show that, compared with state-of-the-art methods, the proposed method efficiently shortens the weighted flow time in both trained and unseen scenarios. Additionally, the case study results demonstrate its efficiency and responsiveness. These findings indicate that the proposed method efficiently handles production scheduling problems with complex constraints, including job insertion, transportation time constraints, and flexible processing routes.
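The abstract names an improved multi-agent dueling double deep Q-network as the learning algorithm, but does not spell out its architecture or the reward-sharing mechanism. As a point of reference only, below is a minimal PyTorch sketch of the two generic building blocks such an algorithm rests on: a dueling Q-network head and the double-DQN bootstrap target. It is not the authors' improved multi-agent method; all names (`DuelingQNet`, `obs_dim`, `double_dqn_target`, etc.) and the hidden-layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk, separate value and advantage streams.

    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')   (mean-subtraction for identifiability)
    """
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)            # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, a)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        v = self.value(h)                       # shape (B, 1)
        a = self.advantage(h)                   # shape (B, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                      reward: torch.Tensor, next_obs: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double-DQN target: online net picks the next action, target net scores it.

    y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a))
    """
    with torch.no_grad():
        best_a = online(next_obs).argmax(dim=1, keepdim=True)         # action selection
        next_q = target(next_obs).gather(1, best_a).squeeze(1)        # action evaluation
        return reward + gamma * (1.0 - done) * next_q
```

In a multi-agent setting such as the one described, each machine or job agent would typically hold its own copy of these networks over its partial observation, with the paper's reward-sharing mechanism shaping the `reward` term; those details are specific to the paper and are not reproduced here.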