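Mathematics
Markov decision process
Markov chain
Average cost
Limit (mathematics)
Markov model
Mathematical optimization
Markov process
Partially observable Markov decision process
Applied mathematics
Statistics
Mathematical economics
Mathematical analysis
Economics
Neoclassical economics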
Source
Journal: Journal of Applied Probability
[Cambridge University Press]
Date: 1970-12-01
Volume (issue): 7 (3): 649-656
Citations: 92
Abstract
The semi-Markov decision model is considered under the criterion of long-run average cost. A new criterion is introduced which, for any policy, considers the limit of the expected cost incurred during the first n transitions divided by the expected length of the first n transitions. Conditions guaranteeing that an optimal stationary (non-randomized) policy exists are then presented. It is also shown that the above criterion is equivalent to the usual one under certain conditions.
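
The two criteria mentioned in the abstract can be written out as follows. This is only a sketch in assumed notation, since the abstract itself gives no symbols: $\pi$ denotes a policy, $Z(t)$ the total cost incurred up to time $t$, $Z_j$ the cost incurred during the $j$-th transition, and $\tau_j$ the length of the $j$-th transition. The limits are assumed to exist here.

```latex
% Usual long-run average cost criterion (assumed notation):
\phi^{1}_{\pi} \;=\; \lim_{t \to \infty} \frac{E_{\pi}\!\left[ Z(t) \right]}{t}

% Criterion described in the abstract: expected cost over the first n
% transitions divided by the expected length of the first n transitions:
\phi^{2}_{\pi} \;=\; \lim_{n \to \infty}
  \frac{E_{\pi}\!\left[ \sum_{j=1}^{n} Z_j \right]}
       {E_{\pi}\!\left[ \sum_{j=1}^{n} \tau_j \right]}
```

In this notation, the final sentence of the abstract says that $\phi^{1}_{\pi}$ and $\phi^{2}_{\pi}$ coincide under certain conditions; the abstract does not state those conditions.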