A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

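Mathematics; Exponential function; Mathematical optimization; Algorithm; Exponential growth; Applied mathematics; Mathematical analysis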

Authors

Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant

Source

Journal: Mathematics of Operations Research [Institute for Operations Research and the Management Sciences]
Date: 2024-03-11

Links

arxiv.org, doi.org

Identifier

DOI: 10.1287/moor.2022.0139

Abstract

We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.

Funding: This work was supported by the Office of Naval Research Global [Grant N0001419-1-2566], the Division of Computer and Network Systems [Grant 21-06801], the Army Research Office [Grant W911NF-19-1-0379], and the Division of Computing and Communication Foundations [Grants 17-04970 and 19-34986].
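
For orientation, the risk-sensitive exponential cost named in the abstract is commonly defined in the risk-sensitive control literature as the exponential growth rate of the accumulated cost. The display below is a sketch of that standard definition, not the paper's own statement: the symbols \theta (policy parameter), \alpha > 0 (risk factor), c (one-step cost), and (X_t, A_t) (state-action process under the policy) are notation assumed here, and the paper's normalization may differ (some authors divide by \alpha).

```latex
% Assumed notation: \theta = policy parameter, \alpha > 0 = risk factor,
% c(x, a) = one-step cost, (X_t, A_t) = state-action process under policy \theta.
\[
  \Lambda(\theta)
  \;=\;
  \limsup_{T \to \infty} \frac{1}{T}
  \ln \mathbb{E}_{\theta}\!\left[
    \exp\!\left( \alpha \sum_{t=0}^{T-1} c(X_t, A_t) \right)
  \right]
\]
```

Under this definition, a first-order expansion in small \alpha gives \alpha times the ordinary long-run average cost, so the risk-neutral average cost problem appears as the limiting case.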
