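Stochastic game
Markov decision process
Observable
Mathematical optimization
Finite set
Markov process
Optimal control
Piecewise
Mathematics
Function (biology)
State (computer science)
Controller (irrigation)
Markov chain
Partially observable Markov decision process
Computer science
Applied mathematics
Algorithm
Mathematical economics
Mathematical analysis
Statistics
Physics
Quantum mechanics
Evolutionary biology
Agronomy
Biology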
Authors
Richard D. Smallwood, Edward J. Sondik
Source
Journal: Operations Research
[Institute for Operations Research and the Management Sciences]
Date: 1973-10-01
Volume (issue), pages: 21 (5): 1071-1088
Citations: 1348
Identifier
DOI: 10.1287/opre.21.5.1071
Abstract
This paper formulates the optimal control problem for a class of mathematical models in which the system to be controlled is characterized by a finite-state discrete-time Markov process. The states of this internal process are not directly observable by the controller; rather, he has available a set of observable outputs that are only probabilistically related to the internal state of the system. The formulation is illustrated by a simple machine-maintenance example, and other specific application areas are also discussed. The paper demonstrates that, if there are only a finite number of control intervals remaining, then the optimal payoff function is a piecewise-linear, convex function of the current state probabilities of the internal Markov process. In addition, an algorithm for utilizing this property to calculate the optimal control policy and payoff function for any finite horizon is outlined. These results are illustrated by a numerical example for the machine-maintenance problem.
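The two ideas in the abstract, tracking state probabilities from noisy outputs and evaluating a piecewise-linear convex payoff over those probabilities, can be illustrated with a minimal Python sketch. This is not the paper's algorithm or its numerical example: the two-state machine, the matrices `T` and `O`, the linear pieces in `alphas`, and the function names `belief_update` and `pwlc_value` are all hypothetical choices made for illustration.

```python
import numpy as np


def belief_update(belief, T, O, obs):
    """Bayes update of the state probabilities after one transition and one observation.

    belief: shape (S,), current state probabilities
    T:      shape (S, S), T[i, j] = P(next state j | state i) under the chosen control
    O:      shape (S, Z), O[j, z] = P(observation z | next state j)
    obs:    index of the observation actually received
    """
    predicted = belief @ T                 # state probabilities after the transition
    unnormalized = predicted * O[:, obs]   # weight by the observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total


def pwlc_value(belief, alphas):
    """Evaluate a piecewise-linear convex function as the max over its linear pieces."""
    return float(np.max(alphas @ belief))


# Hypothetical two-state machine: state 0 = working, state 1 = broken.
T = np.array([[0.9, 0.1],     # a working machine breaks with probability 0.1
              [0.0, 1.0]])    # a broken machine stays broken without repair
O = np.array([[0.8, 0.2],     # working: observation 0 (good product) is likely
              [0.3, 0.7]])    # broken: observation 1 (defective product) is likely

belief = np.array([1.0, 0.0])                     # start certain the machine works
new_belief = belief_update(belief, T, O, obs=1)   # a defective product was observed

# Hypothetical linear pieces, one row per piece (not taken from the paper).
alphas = np.array([[1.0, 0.0],
                   [0.2, 0.6]])
value = pwlc_value(new_belief, alphas)
```

Because the value is a maximum of linear functions of the state probabilities, it is convex by construction; the paper's contribution is showing that the optimal finite-horizon payoff always has this form and outlining how to compute the pieces.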