A First-Order Approach to Accelerated Value Iteration

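Markov decision process; Mathematical optimization; Convergence (economics); Bellman equation; Computer science; Convex function; Computation; Value (mathematics); Function (biology); Regular polygon; Algorithm; Mathematics; Markov process; Statistics; Biology; Machine learning; Evolutionary biology; Economic growth; Economics; Geometry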

Authors

Vineet Goyal, Julien Grand-Clément

Source

Journal: Operations Research [Institute for Operations Research and the Management Sciences]
Date: 2023-03-01; Volume (issue), pages: 71 (2): 517-535; Citations: 14

Links

arxiv.org, doi.org

Identifier

DOI: 10.1287/opre.2022.2269

Abstract

Markov decision processes (MDPs) are used to model stochastic systems in many applications, but computing good policies becomes hard when the effective horizon becomes very large. In “A First-Order Approach to Accelerated Value Iteration,” Goyal and Grand-Clément present a connection between value iteration (VI) algorithms and gradient descent methods from convex optimization. They use acceleration and momentum to design faster algorithms, with convergence guarantees for computing the value function of a fixed policy on reversible MDP instances. The authors also provide a lower bound on the convergence rate of any first-order algorithm for solving MDPs, showing that no such algorithm can converge faster than VI in the worst case. Finally, the authors introduce safe accelerated value iteration (S-AVI), which alternates between accelerated updates and value iteration updates. S-AVI is worst-case optimal and retains the theoretical convergence properties of VI, while exhibiting strong empirical performance and providing significant speedups over classical approaches on a large test bed of MDP instances.
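The idea in the abstract can be illustrated with a minimal Python sketch for evaluating a fixed policy. It is not the authors' S-AVI: the momentum weight `beta`, the acceptance test, the function names, and the two-state example are all assumptions made for illustration. Plain VI applies the operator T(v) = r + γPv, which can be read as a gradient-style step v − (v − T(v)) with step size 1. The safeguarded variant adds a momentum term and keeps the accelerated point only if its Bellman residual shrinks by at least a factor γ; otherwise it falls back to the plain VI step.

```python
import numpy as np

def policy_operator(v, P, r, gamma):
    """Bellman operator of a fixed policy: T(v) = r + gamma * P v."""
    return r + gamma * P @ v

def value_iteration(P, r, gamma, tol=1e-10, max_iter=100_000):
    """Plain VI: repeat v <- T(v) until the sup-norm residual is below tol."""
    v = np.zeros_like(r, dtype=float)
    for k in range(1, max_iter + 1):
        v_next = policy_operator(v, P, r, gamma)
        if np.max(np.abs(v_next - v)) < tol:
            return v_next, k
        v = v_next
    return v, max_iter

def safeguarded_momentum_vi(P, r, gamma, beta=0.5, tol=1e-10, max_iter=100_000):
    """Momentum step with a fallback to VI (illustrative safeguard, assumed here)."""
    v_prev = np.zeros_like(r, dtype=float)
    v = np.zeros_like(r, dtype=float)
    for k in range(1, max_iter + 1):
        Tv = policy_operator(v, P, r, gamma)
        residual = np.max(np.abs(Tv - v))
        if residual < tol:
            return Tv, k
        # Accelerated candidate: VI step plus a momentum term.
        candidate = Tv + beta * (v - v_prev)
        cand_residual = np.max(
            np.abs(policy_operator(candidate, P, r, gamma) - candidate)
        )
        # Accept the candidate only if it contracts the residual at least
        # as well as the worst-case guarantee of a VI step; else use Tv.
        v_next = candidate if cand_residual <= gamma * residual else Tv
        v_prev, v = v, v_next
    return v, max_iter

# Hypothetical two-state chain under a fixed policy.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.9

v_exact = np.linalg.solve(np.eye(2) - gamma * P, r)
v_vi, iters_vi = value_iteration(P, r, gamma)
v_acc, iters_acc = safeguarded_momentum_vi(P, r, gamma)
print(iters_vi, iters_acc)
```

Because a VI step always shrinks the residual by at least γ, the fallback keeps the same worst-case rate as VI in this sketch. Each safeguarded iteration applies the operator twice, so iteration counts alone understate its cost.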
