Variance-Reduced Deep Actor-Critic with an Optimally Sub-Sampled Actor Recursion

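Recursion (computer science) Variance (accounting) Computer science Mathematics Algorithm Artificial intelligence Mathematical optimization Economics Accounting
Authors
Lakshmi Mandal, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar
Source
Journal: IEEE Transactions on Artificial Intelligence [Institute of Electrical and Electronics Engineers]
Volume (Issue): 5 (7): 3607-3623 Citations: 1
Identifier
DOI:10.1109/tai.2024.3379109
Abstract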

Reinforcement Learning (RL) algorithms combined with deep learning architectures have achieved tremendous success in many practical applications. However, the policies obtained by many Deep Reinforcement Learning (DRL) algorithms are seen to suffer from high variance, making them less useful in safety-critical applications. In general, it is desirable to have algorithms that give a low iterate variance while providing a high long-term reward. In this work, we consider the Actor-Critic paradigm, where the critic is responsible for evaluating the policy while the feedback from the critic is used by the actor for updating the policy. The updates of both the critic and the actor in the standard Actor-Critic procedure are run concurrently until convergence. It has been previously observed that updating the actor once after every L > 1 steps of the critic reduces the iterate variance. In this paper, we address the question of what optimal L-value to use in the recursions and propose a data-driven L-update rule that runs concurrently with the actor-critic algorithm, with the objective of minimizing the variance of the infinite-horizon discounted reward. This update is based on a random search (discrete) parameter optimization procedure that incorporates smoothed functional (SF) estimates. We prove the convergence of our proposed multi-timescale scheme to the optimal L and optimal policy pair. Subsequently, through numerical evaluations on benchmark RL tasks, we demonstrate the advantages of our proposed algorithm over multiple state-of-the-art algorithms in the literature.
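
The sub-sampled actor recursion described in the abstract (the critic updates at every step, the actor only once every L critic steps) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-state MDP, the tabular softmax policy, the step sizes, and the function names `softmax` and `run_subsampled_actor_critic` are all assumptions chosen for illustration, and L is held fixed here because the abstract does not spell out the data-driven SF-based L-update rule.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_subsampled_actor_critic(L, steps=2000, gamma=0.9,
                                critic_lr=0.1, actor_lr=0.01, seed=0):
    """Tabular actor-critic on a hypothetical 2-state, 2-action MDP.

    The critic (state values V) is updated at every step; the actor
    (softmax preferences theta) is updated once every L critic steps.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    V = [0.0] * n_states
    theta = [[0.0] * n_actions for _ in range(n_states)]
    critic_updates = 0
    actor_updates = 0
    s = 0
    for t in range(1, steps + 1):
        probs = softmax(theta[s])
        a = 0 if rng.random() < probs[0] else 1
        # Toy dynamics (assumption): action 1 moves to the other state,
        # action 0 stays; reward 1 only for staying in state 1.
        s_next = s if a == 0 else 1 - s
        r = 1.0 if (s == 1 and a == 0) else 0.0
        # Critic: TD(0) update at every step.
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += critic_lr * td_error
        critic_updates += 1
        # Actor: policy-gradient step only once every L critic steps.
        if t % L == 0:
            for b in range(n_actions):
                grad_log = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += actor_lr * td_error * grad_log
            actor_updates += 1
        s = s_next
    return V, theta, critic_updates, actor_updates
```

Calling `run_subsampled_actor_critic(L=5, steps=100)` performs 100 critic updates and 20 actor updates; setting L=1 recovers the standard concurrent actor-critic schedule in this toy setting.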