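Quantile
Markov decision process
Quantile function
Mathematical optimization
Markov chain
Bellman equation
Cumulative prospect theory
Markov process
Mathematics
Computer science
Econometrics
Cumulative distribution function
Statistics
Expected utility hypothesis
Probability density function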
Authors
Xiaocheng Li, Huaiyang Zhong, Margaret L. Brandeau
Source
Journal: Operations Research
Publisher: Institute for Operations Research and the Management Sciences
Date: 2021-11-09
Volume (issue), pages: 70 (3): 1428-1447
Citations: 3
Identifier
DOI: 10.1287/opre.2021.2123
Abstract
Title: Sequential Decision Making Using Quantiles

The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regime for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In “Quantile Markov Decision Processes,” X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
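
To make the distinction between the expectation objective and the quantile objective concrete, the following minimal Python sketch compares two fixed policies on a toy problem. It is not the dynamic programming procedure from the paper: the problem is stateless, and the reward values, the three-step horizon, and the names `SAFE`, `RISKY`, `cumulative_reward_dist`, and `quantile` are all assumptions chosen for illustration. The quantile is taken here as the smallest value x with P(X <= x) >= tau, which is one common convention.

```python
from collections import defaultdict

def cumulative_reward_dist(step_dist, horizon):
    """Exact distribution of the sum of `horizon` i.i.d. per-step rewards."""
    dist = {0: 1.0}
    for _ in range(horizon):
        nxt = defaultdict(float)
        for total, p in dist.items():
            for reward, q in step_dist.items():
                nxt[total + reward] += p * q
        dist = dict(nxt)
    return dist

def mean(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(x * p for x, p in dist.items())

def quantile(dist, tau):
    """Smallest x with P(X <= x) >= tau (assumed convention)."""
    acc = 0.0
    for x in sorted(dist):
        acc += dist[x]
        if acc >= tau:
            return x
    return max(dist)

# Hypothetical per-step reward distributions for two fixed policies.
SAFE = {1: 1.0}             # always +1
RISKY = {4: 0.5, -1: 0.5}   # +4 or -1 with equal probability
HORIZON = 3

safe = cumulative_reward_dist(SAFE, HORIZON)
risky = cumulative_reward_dist(RISKY, HORIZON)

print("mean:          safe", mean(safe), "risky", mean(risky))
print("0.10 quantile: safe", quantile(safe, 0.10), "risky", quantile(risky, 0.10))
```

In this toy setting the risky policy has the higher expected cumulative reward, while the safe policy has the higher 0.10 quantile, so the two objectives rank the policies in opposite order.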