Keywords
Markov decision process, Computer science, Reinforcement learning, Scheduling (production processes), Task (project management), Decision tree, Partially observable Markov decision process, Markov process, Process (computing), Machine learning, Markov chain, Task analysis, Artificial intelligence, Markov model, Mathematical optimization, Engineering, Statistics, Mathematics, Systems engineering, Operating system
Authors
Paul Rademacher, Kevin Wagner, Leslie N. Smith
Identifiers
DOI: 10.1109/ssp53291.2023.10207940
Abstract
Due to the generally prohibitive computational requirements of optimal task schedulers, much of the field of task scheduling focuses on designing fast suboptimal algorithms. Since the tree search commonly used by sequencing algorithms such as Branch-and-Bound can naturally be framed as a Markov decision process, designing schedulers using imitation and reinforcement learning is a promising and active area of research. This paper demonstrates how policies can be trained on previously solved scheduling problems and successfully generalize to novel ones. Instead of focusing on policy design, however, this work focuses on designing the Markov decision process observation and reward functions to make learning as effective and efficient as possible. This can be of critical importance when training data is limited or when only simple, fast policies are practical. Various Markov decision process designs are introduced, and simulation examples demonstrate the resultant increases in policy performance, even without integration into search algorithms.