Markov decision process
Mathematical optimization
Convergence (economics)
Bellman equation
Computer science
Convex function
Computation
Value (mathematics)
Function (biology)
Regular polygon
Algorithm
Mathematics
Markov process
Statistics
Biology
Machine learning
Evolutionary biology
Economic growth
Economics
Geometry
Authors
Vineet Goyal,Julien Grand-Clément
Source
Journal: Operations Research
Publisher: Institute for Operations Research and the Management Sciences
Date: 2023-03-01
Volume/Issue: 71 (2): 517-535
Cited by: 14
Identifier
DOI:10.1287/opre.2022.2269
Abstract
Markov decision processes (MDPs) are used to model stochastic systems in many applications, but computing good policies becomes hard when the effective horizon becomes very large. In "A First-Order Approach to Accelerated Value Iteration," Goyal and Grand-Clément draw a connection between value iteration (VI) algorithms and gradient descent methods from convex optimization, and use acceleration and momentum to design faster algorithms, with convergence guarantees for computing the value function of a fixed policy on reversible MDP instances. The authors also prove a lower bound on the convergence rate of any first-order algorithm for solving MDPs, showing that in the worst case no such algorithm can converge faster than VI. Finally, the authors introduce safe accelerated value iteration (S-AVI), which alternates between accelerated updates and value iteration updates. S-AVI is worst-case optimal and retains the theoretical convergence guarantees of VI while exhibiting strong empirical performance, providing significant speedups over classical approaches on a large test bed of MDP instances.
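To make the abstract's core idea concrete, the sketch below treats the Bellman update like a gradient step and adds Nesterov-style momentum, then safeguards it with plain VI steps. This is a minimal illustration, not the authors' algorithms: the momentum schedule k/(k+3), the residual-based fallback rule, and the random MDP instance are all illustrative assumptions, and the safeguarded variant is only one plausible reading of the "alternate between accelerated and VI updates" idea behind S-AVI.

```python
# Minimal sketch (assumptions noted in comments): plain VI, a momentum
# variant, and a safeguarded variant, compared on a small random MDP.
import numpy as np

def bellman_update(v, P, r, gamma):
    """One Bellman optimality update: (Tv)(s) = max_a [r(a,s) + gamma * sum_s' P(a,s,s') v(s')]."""
    q = r + gamma * (P @ v)  # shape (A, S): action values under continuation value v
    return q.max(axis=0)

def value_iteration(P, r, gamma, tol=1e-8, max_iter=100_000):
    """Classical VI: iterate the Bellman operator to its fixed point."""
    v = np.zeros(P.shape[1])
    for k in range(max_iter):
        v_new = bellman_update(v, P, r, gamma)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, k + 1
        v = v_new
    return v, max_iter

def momentum_value_iteration(P, r, gamma, tol=1e-8, max_iter=100_000):
    """Momentum applied to VI; the schedule k/(k+3) is a common illustrative choice, not the paper's."""
    v_prev = v = np.zeros(P.shape[1])
    for k in range(max_iter):
        y = v + (k / (k + 3.0)) * (v - v_prev)  # extrapolation (momentum) step
        v_new = bellman_update(y, P, r, gamma)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, k + 1
        v_prev, v = v, v_new
    return v, max_iter

def safeguarded_vi(P, r, gamma, tol=1e-8, max_iter=100_000):
    """Hypothetical safeguard, not the paper's S-AVI: try the momentum step,
    but fall back to a plain VI step whenever it would increase the residual."""
    v_prev = v = np.zeros(P.shape[1])
    residual = np.inf
    for k in range(max_iter):
        y = v + (k / (k + 3.0)) * (v - v_prev)
        v_acc = bellman_update(y, P, r, gamma)
        if np.max(np.abs(v_acc - v)) <= residual:  # accelerated step looks safe
            v_new = v_acc
        else:                                      # otherwise take a plain VI step
            v_new = bellman_update(v, P, r, gamma)
        residual = np.max(np.abs(v_new - v))
        if residual < tol:
            return v_new, k + 1
        v_prev, v = v, v_new
    return v, max_iter

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, gamma = 50, 4, 0.99                # small random MDP; sizes are arbitrary
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)        # make each P[a] row-stochastic
    r = rng.random((A, S))
    for name, solver in [("VI", value_iteration),
                         ("momentum VI", momentum_value_iteration),
                         ("safeguarded VI", safeguarded_vi)]:
        _, iters = solver(P, r, gamma)
        print(f"{name}: {iters} iterations")
```

On easy instances the momentum variant often needs far fewer iterations than VI, but the paper's lower bound says no first-order method can beat VI in the worst case, and naive momentum can lose VI's guarantees outside the reversible, fixed-policy setting; that tension is what motivates an alternation scheme like S-AVI.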