Reinforcement learning
Partially observable Markov decision process
Markov decision process
Drug dosing
Computer science
Task (project management)
Action (physics)
Markov chain
Markov process
Artificial intelligence
Machine learning
Markov model
Medicine
Pharmacology
Mathematics
Physics
Statistics
Economics
Management
Quantum mechanics
Authors
Sumana Basu, Marc-André Legault, Adriana Romero-Soriano, Doina Precup
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Identifier
DOI: 10.48550/arxiv.2301.00512
Abstract
Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process), a subclass of POMDPs in which the Markov assumption does not hold specifically due to prolonged effects of actions. Motivated by the pharmacology literature, we propose a simple and effective approach to converting drug dosing PAE-POMDPs into MDPs, enabling the use of existing RL algorithms to solve such problems. We validate the proposed approach on a toy task and a challenging glucose control task, for which we devise a clinically-inspired reward function. Our results demonstrate that: (1) the proposed method to restore the Markov assumption leads to significant improvements over a vanilla baseline; (2) the approach is competitive with recurrent policies which may inherently capture the prolonged effect of actions; (3) it is remarkably more time and memory efficient than the recurrent baseline and hence more suitable for real-time dosing control systems; and (4) it exhibits favorable qualitative behavior in our policy analysis.
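To make the abstract's core idea concrete, below is a minimal sketch (not the authors' released code) of one way the pharmacology-motivated conversion could look: the lingering effect of past doses is summarized as an exponentially decaying "effective dose" and appended to the observation, so the augmented state is Markov and standard RL algorithms apply. The wrapper class, the decay rate, and the environment interface are all illustrative assumptions.

```python
import numpy as np


class EffectiveDoseWrapper:
    """Augments observations with a decaying trace of past doses.

    Sketch of the PAE-POMDP -> MDP idea: a single scalar summary of
    action history (hypothetical names; decay rate is an assumption).
    """

    def __init__(self, env, decay: float = 0.9):
        self.env = env              # any env exposing reset() / step(action)
        self.decay = decay          # assumed per-step exponential decay of a dose
        self.effective_dose = 0.0   # lingering effect of all previous doses

    def reset(self):
        self.effective_dose = 0.0
        obs = self.env.reset()
        return np.append(obs, self.effective_dose)

    def step(self, action: float):
        # A new dose adds to whatever fraction of earlier doses still persists.
        self.effective_dose = self.decay * self.effective_dose + action
        obs, reward, done, info = self.env.step(action)
        # Appending the history summary restores the Markov property of the state.
        return np.append(obs, self.effective_dose), reward, done, info
```

Because the history summary is a single scalar updated in constant time, a policy over this augmented state avoids the per-step recurrence of an RNN baseline, which is consistent with the time and memory efficiency claim in the abstract.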