Reinforcement learning
Computer science
Markov decision process
Mathematical optimization
Scheduling (production processes)
Artificial intelligence
Scalability
Job shop scheduling
Metro train timetable
Knapsack problem
Markov process
Algorithm
Mathematics
Database
Statistics
Operating system
Authors
Luona Wei, Yuning Chen, Ming Chen, Yingwu Chen
Identifier
DOI:10.1016/j.asoc.2021.107607
Abstract
The agile earth observation satellite scheduling problem (AEOSSP) consists of selecting and scheduling a number of tasks from a set of user requests in order to optimize one or multiple criteria. In this paper, we consider a multi-objective version of the AEOSSP (called MO-AEOSSP) in which the failure rate and the timeliness of scheduled requests are optimized simultaneously. Because the problem is NP-hard, traditional iterative problem-tailored heuristic methods are sensitive to problem instances and incur massive computational overhead. We therefore propose a deep reinforcement learning and parameter transfer based approach (RLPT) to tackle the MO-AEOSSP in a non-iterative manner. RLPT first decomposes the MO-AEOSSP into a number of scalarized sub-problems via a weighted-sum approach, where each sub-problem can be formulated as a Markov decision process (MDP). RLPT then applies an encoder–decoder neural network (NN), trained by a deep reinforcement learning procedure, to produce a high-quality schedule for each sub-problem. The resulting schedules of all scalarized sub-problems form an approximate Pareto front for the MO-AEOSSP. Once the NN of one sub-problem is trained, RLPT applies a parameter transfer strategy to reduce the training cost of its neighboring sub-problems. Experimental results on a large set of randomly generated instances show that RLPT outperforms three classical multi-objective evolutionary algorithms (MOEAs) in terms of solution quality, solution distribution, and computational efficiency. Results on instances of various sizes also show that RLPT is highly general and scalable. To the best of our knowledge, this study is the first attempt to apply deep reinforcement learning to a satellite scheduling problem with multiple objectives.
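The weighted-sum decomposition described in the abstract can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: the function names, the number of sub-problems, and the assumption that both objectives are normalized to [0, 1] are mine; the paper's actual scalarization and reward design may differ.

```python
def weight_vectors(n_subproblems):
    """Evenly spaced weight pairs (w_fail, w_time) with w_fail + w_time = 1.

    Each pair defines one scalarized sub-problem; in the paper, each
    sub-problem is formulated as an MDP and solved by an encoder-decoder
    neural network trained with deep reinforcement learning.
    """
    return [(i / (n_subproblems - 1), 1 - i / (n_subproblems - 1))
            for i in range(n_subproblems)]


def scalarized_reward(failure_rate, timeliness, weights):
    """Collapse the two objectives into one scalar reward.

    Failure rate is to be minimized and timeliness to be maximized, so the
    failure-rate term enters with a negative sign. Both objectives are
    assumed here to be normalized to [0, 1].
    """
    w_fail, w_time = weights
    return -w_fail * failure_rate + w_time * timeliness


# Five weight pairs yield five scalarized sub-problems; solving each one
# contributes one point to the approximate Pareto front.
for w in weight_vectors(5):
    print(w, scalarized_reward(0.2, 0.8, w))
```

Under this scheme, neighboring weight pairs define similar sub-problems, which is what makes the parameter-transfer strategy plausible: a network trained for one weight pair is a good initialization for the next.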